All of lore.kernel.org
 help / color / mirror / Atom feed
* RFC: remove set_fs for m68k
@ 2021-07-09  7:01 Christoph Hellwig
  2021-07-09  7:01 ` [PATCH 1/7] m68k: document that access_ok is broken for !CONFIG_CPU_HAS_ADDRESS_SPACES Christoph Hellwig
                   ` (7 more replies)
  0 siblings, 8 replies; 73+ messages in thread
From: Christoph Hellwig @ 2021-07-09  7:01 UTC (permalink / raw)
  To: Geert Uytterhoeven, Greg Ungerer; +Cc: Michael Schmitz, linux-m68k

Hi all,

this series converts m68k to the new world where set_fs is gone to make
the uaccess routines work on kernel pointers.  Note that it is compile
tested only and thus needs to be handled with care.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 1/7] m68k: document that access_ok is broken for !CONFIG_CPU_HAS_ADDRESS_SPACES
  2021-07-09  7:01 RFC: remove set_fs for m68k Christoph Hellwig
@ 2021-07-09  7:01 ` Christoph Hellwig
  2021-07-09  7:01 ` [PATCH 2/7] m68k: use BUILD_BUG for passing invalid sizes to get_user/put_user Christoph Hellwig
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 73+ messages in thread
From: Christoph Hellwig @ 2021-07-09  7:01 UTC (permalink / raw)
  To: Geert Uytterhoeven, Greg Ungerer; +Cc: Michael Schmitz, linux-m68k

Document that access_ok is completely broken for coldfire and friends at
the moment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/m68k/include/asm/uaccess.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/m68k/include/asm/uaccess.h b/arch/m68k/include/asm/uaccess.h
index f98208ccbbcd..610bfe8d64d5 100644
--- a/arch/m68k/include/asm/uaccess.h
+++ b/arch/m68k/include/asm/uaccess.h
@@ -16,6 +16,10 @@
 static inline int access_ok(const void __user *addr,
 			    unsigned long size)
 {
+	/*
+	 * XXX: for !CONFIG_CPU_HAS_ADDRESS_SPACES this really needs to check
+	 * for TASK_SIZE!
+	 */
 	return 1;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 2/7] m68k: use BUILD_BUG for passing invalid sizes to get_user/put_user
  2021-07-09  7:01 RFC: remove set_fs for m68k Christoph Hellwig
  2021-07-09  7:01 ` [PATCH 1/7] m68k: document that access_ok is broken for !CONFIG_CPU_HAS_ADDRESS_SPACES Christoph Hellwig
@ 2021-07-09  7:01 ` Christoph Hellwig
  2021-07-09  7:01 ` [PATCH 3/7] m68k: remove the inline copy_{from,to}_user variants Christoph Hellwig
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 73+ messages in thread
From: Christoph Hellwig @ 2021-07-09  7:01 UTC (permalink / raw)
  To: Geert Uytterhoeven, Greg Ungerer; +Cc: Michael Schmitz, linux-m68k

Simplify the handling a bit by using the common helper instead of
referencing undefined symbols.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/m68k/include/asm/uaccess.h | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/m68k/include/asm/uaccess.h b/arch/m68k/include/asm/uaccess.h
index 610bfe8d64d5..01334a9658c4 100644
--- a/arch/m68k/include/asm/uaccess.h
+++ b/arch/m68k/include/asm/uaccess.h
@@ -39,9 +39,6 @@ static inline int access_ok(const void __user *addr,
 #define	MOVES	"move"
 #endif
 
-extern int __put_user_bad(void);
-extern int __get_user_bad(void);
-
 #define __put_user_asm(res, x, ptr, bwl, reg, err)	\
 asm volatile ("\n"					\
 	"1:	"MOVES"."#bwl"	%2,%1\n"		\
@@ -105,8 +102,7 @@ asm volatile ("\n"					\
 		break;							\
 	    }								\
 	default:							\
-		__pu_err = __put_user_bad();				\
-		break;							\
+		BUILD_BUG();						\
 	}								\
 	__pu_err;							\
 })
@@ -179,8 +175,7 @@ asm volatile ("\n"					\
 		break;							\
 	}								\
 	default:							\
-		__gu_err = __get_user_bad();				\
-		break;							\
+		BUILD_BUG();						\
 	}								\
 	__gu_err;							\
 })
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 3/7] m68k: remove the inline copy_{from,to}_user variants
  2021-07-09  7:01 RFC: remove set_fs for m68k Christoph Hellwig
  2021-07-09  7:01 ` [PATCH 1/7] m68k: document that access_ok is broken for !CONFIG_CPU_HAS_ADDRESS_SPACES Christoph Hellwig
  2021-07-09  7:01 ` [PATCH 2/7] m68k: use BUILD_BUG for passing invalid sizes to get_user/put_user Christoph Hellwig
@ 2021-07-09  7:01 ` Christoph Hellwig
  2021-07-09  7:01 ` [PATCH 4/7] m68k: remove the err argument to the get_user/put_user assembly helpers Christoph Hellwig
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 73+ messages in thread
From: Christoph Hellwig @ 2021-07-09  7:01 UTC (permalink / raw)
  To: Geert Uytterhoeven, Greg Ungerer; +Cc: Michael Schmitz, linux-m68k

Small access already use get_user/put_user so move the whole larger user
copy machinery out of line.

This saves almost 10k of .text for a defconfig build:

   text	   data	    bss	    dec	    hex	filename
4019751	1344434	 170104	5534289	 547251	vmlinux.pre
4015417	1338962	 170104	5524483	 544c03	vmlinux.post

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/m68k/include/asm/uaccess.h | 201 +-------------------------------
 arch/m68k/lib/uaccess.c         |  12 +-
 2 files changed, 10 insertions(+), 203 deletions(-)

diff --git a/arch/m68k/include/asm/uaccess.h b/arch/m68k/include/asm/uaccess.h
index 01334a9658c4..96aae4d4656b 100644
--- a/arch/m68k/include/asm/uaccess.h
+++ b/arch/m68k/include/asm/uaccess.h
@@ -181,203 +181,10 @@ asm volatile ("\n"					\
 })
 #define get_user(x, ptr) __get_user(x, ptr)
 
-unsigned long __generic_copy_from_user(void *to, const void __user *from, unsigned long n);
-unsigned long __generic_copy_to_user(void __user *to, const void *from, unsigned long n);
-
-#define __suffix0
-#define __suffix1 b
-#define __suffix2 w
-#define __suffix4 l
-
-#define ____constant_copy_from_user_asm(res, to, from, tmp, n1, n2, n3, s1, s2, s3)\
-	asm volatile ("\n"						\
-		"1:	"MOVES"."#s1"	(%2)+,%3\n"			\
-		"	move."#s1"	%3,(%1)+\n"			\
-		"	.ifnc	\""#s2"\",\"\"\n"			\
-		"2:	"MOVES"."#s2"	(%2)+,%3\n"			\
-		"	move."#s2"	%3,(%1)+\n"			\
-		"	.ifnc	\""#s3"\",\"\"\n"			\
-		"3:	"MOVES"."#s3"	(%2)+,%3\n"			\
-		"	move."#s3"	%3,(%1)+\n"			\
-		"	.endif\n"					\
-		"	.endif\n"					\
-		"4:\n"							\
-		"	.section __ex_table,\"a\"\n"			\
-		"	.align	4\n"					\
-		"	.long	1b,10f\n"				\
-		"	.ifnc	\""#s2"\",\"\"\n"			\
-		"	.long	2b,20f\n"				\
-		"	.ifnc	\""#s3"\",\"\"\n"			\
-		"	.long	3b,30f\n"				\
-		"	.endif\n"					\
-		"	.endif\n"					\
-		"	.previous\n"					\
-		"\n"							\
-		"	.section .fixup,\"ax\"\n"			\
-		"	.even\n"					\
-		"10:	addq.l #"#n1",%0\n"				\
-		"	.ifnc	\""#s2"\",\"\"\n"			\
-		"20:	addq.l #"#n2",%0\n"				\
-		"	.ifnc	\""#s3"\",\"\"\n"			\
-		"30:	addq.l #"#n3",%0\n"				\
-		"	.endif\n"					\
-		"	.endif\n"					\
-		"	jra	4b\n"					\
-		"	.previous\n"					\
-		: "+d" (res), "+&a" (to), "+a" (from), "=&d" (tmp)	\
-		: : "memory")
-
-#define ___constant_copy_from_user_asm(res, to, from, tmp, n1, n2, n3, s1, s2, s3)\
-	____constant_copy_from_user_asm(res, to, from, tmp, n1, n2, n3, s1, s2, s3)
-#define __constant_copy_from_user_asm(res, to, from, tmp, n1, n2, n3)	\
-	___constant_copy_from_user_asm(res, to, from, tmp, n1, n2, n3,  \
-					__suffix##n1, __suffix##n2, __suffix##n3)
-
-static __always_inline unsigned long
-__constant_copy_from_user(void *to, const void __user *from, unsigned long n)
-{
-	unsigned long res = 0, tmp;
-
-	switch (n) {
-	case 1:
-		__constant_copy_from_user_asm(res, to, from, tmp, 1, 0, 0);
-		break;
-	case 2:
-		__constant_copy_from_user_asm(res, to, from, tmp, 2, 0, 0);
-		break;
-	case 3:
-		__constant_copy_from_user_asm(res, to, from, tmp, 2, 1, 0);
-		break;
-	case 4:
-		__constant_copy_from_user_asm(res, to, from, tmp, 4, 0, 0);
-		break;
-	case 5:
-		__constant_copy_from_user_asm(res, to, from, tmp, 4, 1, 0);
-		break;
-	case 6:
-		__constant_copy_from_user_asm(res, to, from, tmp, 4, 2, 0);
-		break;
-	case 7:
-		__constant_copy_from_user_asm(res, to, from, tmp, 4, 2, 1);
-		break;
-	case 8:
-		__constant_copy_from_user_asm(res, to, from, tmp, 4, 4, 0);
-		break;
-	case 9:
-		__constant_copy_from_user_asm(res, to, from, tmp, 4, 4, 1);
-		break;
-	case 10:
-		__constant_copy_from_user_asm(res, to, from, tmp, 4, 4, 2);
-		break;
-	case 12:
-		__constant_copy_from_user_asm(res, to, from, tmp, 4, 4, 4);
-		break;
-	default:
-		/* we limit the inlined version to 3 moves */
-		return __generic_copy_from_user(to, from, n);
-	}
-
-	return res;
-}
-
-#define __constant_copy_to_user_asm(res, to, from, tmp, n, s1, s2, s3)	\
-	asm volatile ("\n"						\
-		"	move."#s1"	(%2)+,%3\n"			\
-		"11:	"MOVES"."#s1"	%3,(%1)+\n"			\
-		"12:	move."#s2"	(%2)+,%3\n"			\
-		"21:	"MOVES"."#s2"	%3,(%1)+\n"			\
-		"22:\n"							\
-		"	.ifnc	\""#s3"\",\"\"\n"			\
-		"	move."#s3"	(%2)+,%3\n"			\
-		"31:	"MOVES"."#s3"	%3,(%1)+\n"			\
-		"32:\n"							\
-		"	.endif\n"					\
-		"4:\n"							\
-		"\n"							\
-		"	.section __ex_table,\"a\"\n"			\
-		"	.align	4\n"					\
-		"	.long	11b,5f\n"				\
-		"	.long	12b,5f\n"				\
-		"	.long	21b,5f\n"				\
-		"	.long	22b,5f\n"				\
-		"	.ifnc	\""#s3"\",\"\"\n"			\
-		"	.long	31b,5f\n"				\
-		"	.long	32b,5f\n"				\
-		"	.endif\n"					\
-		"	.previous\n"					\
-		"\n"							\
-		"	.section .fixup,\"ax\"\n"			\
-		"	.even\n"					\
-		"5:	moveq.l	#"#n",%0\n"				\
-		"	jra	4b\n"					\
-		"	.previous\n"					\
-		: "+d" (res), "+a" (to), "+a" (from), "=&d" (tmp)	\
-		: : "memory")
-
-static __always_inline unsigned long
-__constant_copy_to_user(void __user *to, const void *from, unsigned long n)
-{
-	unsigned long res = 0, tmp;
-
-	switch (n) {
-	case 1:
-		__put_user_asm(res, *(u8 *)from, (u8 __user *)to, b, d, 1);
-		break;
-	case 2:
-		__put_user_asm(res, *(u16 *)from, (u16 __user *)to, w, r, 2);
-		break;
-	case 3:
-		__constant_copy_to_user_asm(res, to, from, tmp, 3, w, b,);
-		break;
-	case 4:
-		__put_user_asm(res, *(u32 *)from, (u32 __user *)to, l, r, 4);
-		break;
-	case 5:
-		__constant_copy_to_user_asm(res, to, from, tmp, 5, l, b,);
-		break;
-	case 6:
-		__constant_copy_to_user_asm(res, to, from, tmp, 6, l, w,);
-		break;
-	case 7:
-		__constant_copy_to_user_asm(res, to, from, tmp, 7, l, w, b);
-		break;
-	case 8:
-		__constant_copy_to_user_asm(res, to, from, tmp, 8, l, l,);
-		break;
-	case 9:
-		__constant_copy_to_user_asm(res, to, from, tmp, 9, l, l, b);
-		break;
-	case 10:
-		__constant_copy_to_user_asm(res, to, from, tmp, 10, l, l, w);
-		break;
-	case 12:
-		__constant_copy_to_user_asm(res, to, from, tmp, 12, l, l, l);
-		break;
-	default:
-		/* limit the inlined version to 3 moves */
-		return __generic_copy_to_user(to, from, n);
-	}
-
-	return res;
-}
-
-static inline unsigned long
-raw_copy_from_user(void *to, const void __user *from, unsigned long n)
-{
-	if (__builtin_constant_p(n))
-		return __constant_copy_from_user(to, from, n);
-	return __generic_copy_from_user(to, from, n);
-}
-
-static inline unsigned long
-raw_copy_to_user(void __user *to, const void *from, unsigned long n)
-{
-	if (__builtin_constant_p(n))
-		return __constant_copy_to_user(to, from, n);
-	return __generic_copy_to_user(to, from, n);
-}
-#define INLINE_COPY_FROM_USER
-#define INLINE_COPY_TO_USER
+unsigned long raw_copy_from_user(void *to, const void __user *from,
+		unsigned long n);
+unsigned long raw_copy_to_user(void __user *to, const void *from,
+		unsigned long n);
 
 #define user_addr_max() \
 	(uaccess_kernel() ? ~0UL : TASK_SIZE)
diff --git a/arch/m68k/lib/uaccess.c b/arch/m68k/lib/uaccess.c
index 7646e461aa62..56022d06b780 100644
--- a/arch/m68k/lib/uaccess.c
+++ b/arch/m68k/lib/uaccess.c
@@ -7,8 +7,8 @@
 #include <linux/module.h>
 #include <linux/uaccess.h>
 
-unsigned long __generic_copy_from_user(void *to, const void __user *from,
-				       unsigned long n)
+unsigned long raw_copy_from_user(void *to, const void __user *from,
+				 unsigned long n)
 {
 	unsigned long tmp, res;
 
@@ -51,10 +51,10 @@ unsigned long __generic_copy_from_user(void *to, const void __user *from,
 
 	return res;
 }
-EXPORT_SYMBOL(__generic_copy_from_user);
+EXPORT_SYMBOL(raw_copy_from_user);
 
-unsigned long __generic_copy_to_user(void __user *to, const void *from,
-				     unsigned long n)
+unsigned long raw_copy_to_user(void __user *to, const void *from,
+			       unsigned long n)
 {
 	unsigned long tmp, res;
 
@@ -95,7 +95,7 @@ unsigned long __generic_copy_to_user(void __user *to, const void *from,
 
 	return res;
 }
-EXPORT_SYMBOL(__generic_copy_to_user);
+EXPORT_SYMBOL(raw_copy_to_user);
 
 /*
  * Zero Userspace
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 4/7] m68k: remove the err argument to the get_user/put_user assembly helpers
  2021-07-09  7:01 RFC: remove set_fs for m68k Christoph Hellwig
                   ` (2 preceding siblings ...)
  2021-07-09  7:01 ` [PATCH 3/7] m68k: remove the inline copy_{from,to}_user variants Christoph Hellwig
@ 2021-07-09  7:01 ` Christoph Hellwig
  2021-07-09  7:01 ` [PATCH 5/7] m68k: factor the 8-byte lowlevel {get,put}_user code into helpers Christoph Hellwig
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 73+ messages in thread
From: Christoph Hellwig @ 2021-07-09  7:01 UTC (permalink / raw)
  To: Geert Uytterhoeven, Greg Ungerer; +Cc: Michael Schmitz, linux-m68k

These are always hardwired to -EFAULT now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/m68k/include/asm/uaccess.h | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/m68k/include/asm/uaccess.h b/arch/m68k/include/asm/uaccess.h
index 96aae4d4656b..fa466bf6c4ca 100644
--- a/arch/m68k/include/asm/uaccess.h
+++ b/arch/m68k/include/asm/uaccess.h
@@ -39,7 +39,7 @@ static inline int access_ok(const void __user *addr,
 #define	MOVES	"move"
 #endif
 
-#define __put_user_asm(res, x, ptr, bwl, reg, err)	\
+#define __put_user_asm(res, x, ptr, bwl, reg)		\
 asm volatile ("\n"					\
 	"1:	"MOVES"."#bwl"	%2,%1\n"		\
 	"2:\n"						\
@@ -55,7 +55,7 @@ asm volatile ("\n"					\
 	"	.long	2b,10b\n"			\
 	"	.previous"				\
 	: "+d" (res), "=m" (*(ptr))			\
-	: #reg (x), "i" (err))
+	: #reg (x), "i" (-EFAULT))
 
 /*
  * These are the main single-value transfer routines.  They automatically
@@ -69,13 +69,13 @@ asm volatile ("\n"					\
 	__chk_user_ptr(ptr);						\
 	switch (sizeof (*(ptr))) {					\
 	case 1:								\
-		__put_user_asm(__pu_err, __pu_val, ptr, b, d, -EFAULT);	\
+		__put_user_asm(__pu_err, __pu_val, ptr, b, d);		\
 		break;							\
 	case 2:								\
-		__put_user_asm(__pu_err, __pu_val, ptr, w, r, -EFAULT);	\
+		__put_user_asm(__pu_err, __pu_val, ptr, w, r);		\
 		break;							\
 	case 4:								\
-		__put_user_asm(__pu_err, __pu_val, ptr, l, r, -EFAULT);	\
+		__put_user_asm(__pu_err, __pu_val, ptr, l, r);		\
 		break;							\
 	case 8:								\
  	    {								\
@@ -109,7 +109,7 @@ asm volatile ("\n"					\
 #define put_user(x, ptr)	__put_user(x, ptr)
 
 
-#define __get_user_asm(res, x, ptr, type, bwl, reg, err) ({		\
+#define __get_user_asm(res, x, ptr, type, bwl, reg) ({			\
 	type __gu_val;							\
 	asm volatile ("\n"						\
 		"1:	"MOVES"."#bwl"	%2,%1\n"			\
@@ -126,7 +126,7 @@ asm volatile ("\n"					\
 		"	.long	1b,10b\n"				\
 		"	.previous"					\
 		: "+d" (res), "=&" #reg (__gu_val)			\
-		: "m" (*(ptr)), "i" (err));				\
+		: "m" (*(ptr)), "i" (-EFAULT));				\
 	(x) = (__force typeof(*(ptr)))(__force unsigned long)__gu_val;	\
 })
 
@@ -136,13 +136,13 @@ asm volatile ("\n"					\
 	__chk_user_ptr(ptr);						\
 	switch (sizeof(*(ptr))) {					\
 	case 1:								\
-		__get_user_asm(__gu_err, x, ptr, u8, b, d, -EFAULT);	\
+		__get_user_asm(__gu_err, x, ptr, u8, b, d);		\
 		break;							\
 	case 2:								\
-		__get_user_asm(__gu_err, x, ptr, u16, w, r, -EFAULT);	\
+		__get_user_asm(__gu_err, x, ptr, u16, w, r);		\
 		break;							\
 	case 4:								\
-		__get_user_asm(__gu_err, x, ptr, u32, l, r, -EFAULT);	\
+		__get_user_asm(__gu_err, x, ptr, u32, l, r);		\
 		break;							\
 	case 8: {							\
 		const void __user *__gu_ptr = (ptr);			\
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 5/7] m68k: factor the 8-byte lowlevel {get,put}_user code into helpers
  2021-07-09  7:01 RFC: remove set_fs for m68k Christoph Hellwig
                   ` (3 preceding siblings ...)
  2021-07-09  7:01 ` [PATCH 4/7] m68k: remove the err argument to the get_user/put_user assembly helpers Christoph Hellwig
@ 2021-07-09  7:01 ` Christoph Hellwig
  2021-07-09  7:01 ` [PATCH 6/7] m68k: provide __{get,put}_kernel_nofault Christoph Hellwig
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 73+ messages in thread
From: Christoph Hellwig @ 2021-07-09  7:01 UTC (permalink / raw)
  To: Geert Uytterhoeven, Greg Ungerer; +Cc: Michael Schmitz, linux-m68k

Add new helpers for doing the grunt work of the 8-byte {get,put}_user
routines to allow for better reuse.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/m68k/include/asm/uaccess.h | 111 +++++++++++++++++---------------
 1 file changed, 60 insertions(+), 51 deletions(-)

diff --git a/arch/m68k/include/asm/uaccess.h b/arch/m68k/include/asm/uaccess.h
index fa466bf6c4ca..384b2a6b135c 100644
--- a/arch/m68k/include/asm/uaccess.h
+++ b/arch/m68k/include/asm/uaccess.h
@@ -57,6 +57,31 @@ asm volatile ("\n"					\
 	: "+d" (res), "=m" (*(ptr))			\
 	: #reg (x), "i" (-EFAULT))
 
+#define __put_user_asm8(res, x, ptr)				\
+do {								\
+	const void *__pu_ptr = (const void __force *)(ptr);	\
+								\
+	asm volatile ("\n"					\
+		"1:	"MOVES".l %2,(%1)+\n"			\
+		"2:	"MOVES".l %R2,(%1)\n"			\
+		"3:\n"						\
+		"	.section .fixup,\"ax\"\n"		\
+		"	.even\n"				\
+		"10:	movel %3,%0\n"				\
+		"	jra 3b\n"				\
+		"	.previous\n"				\
+		"\n"						\
+		"	.section __ex_table,\"a\"\n"		\
+		"	.align 4\n"				\
+		"	.long 1b,10b\n"				\
+		"	.long 2b,10b\n"				\
+		"	.long 3b,10b\n"				\
+		"	.previous"				\
+		: "+d" (res), "+a" (__pu_ptr)			\
+		: "r" (x), "i" (-EFAULT)			\
+		: "memory");					\
+} while (0)
+
 /*
  * These are the main single-value transfer routines.  They automatically
  * use the right size if we just have the right pointer type.
@@ -78,29 +103,8 @@ asm volatile ("\n"					\
 		__put_user_asm(__pu_err, __pu_val, ptr, l, r);		\
 		break;							\
 	case 8:								\
- 	    {								\
- 		const void __user *__pu_ptr = (ptr);			\
-		asm volatile ("\n"					\
-			"1:	"MOVES".l	%2,(%1)+\n"		\
-			"2:	"MOVES".l	%R2,(%1)\n"		\
-			"3:\n"						\
-			"	.section .fixup,\"ax\"\n"		\
-			"	.even\n"				\
-			"10:	movel %3,%0\n"				\
-			"	jra 3b\n"				\
-			"	.previous\n"				\
-			"\n"						\
-			"	.section __ex_table,\"a\"\n"		\
-			"	.align 4\n"				\
-			"	.long 1b,10b\n"				\
-			"	.long 2b,10b\n"				\
-			"	.long 3b,10b\n"				\
-			"	.previous"				\
-			: "+d" (__pu_err), "+a" (__pu_ptr)		\
-			: "r" (__pu_val), "i" (-EFAULT)			\
-			: "memory");					\
+		__put_user_asm8(__pu_err, __pu_val, ptr);		\
 		break;							\
-	    }								\
 	default:							\
 		BUILD_BUG();						\
 	}								\
@@ -130,6 +134,38 @@ asm volatile ("\n"					\
 	(x) = (__force typeof(*(ptr)))(__force unsigned long)__gu_val;	\
 })
 
+#define __get_user_asm8(res, x, ptr)					\
+do {									\
+	const void *__gu_ptr = (const void __force *)(ptr);		\
+	union {								\
+		u64 l;							\
+		__typeof__(*(ptr)) t;					\
+	} __gu_val;							\
+									\
+	asm volatile ("\n"						\
+		"1:	"MOVES".l	(%2)+,%1\n"			\
+		"2:	"MOVES".l	(%2),%R1\n"			\
+		"3:\n"							\
+		"	.section .fixup,\"ax\"\n"			\
+		"	.even\n"					\
+		"10:	move.l	%3,%0\n"				\
+		"	sub.l	%1,%1\n"				\
+		"	sub.l	%R1,%R1\n"				\
+		"	jra	3b\n"					\
+		"	.previous\n"					\
+		"\n"							\
+		"	.section __ex_table,\"a\"\n"			\
+		"	.align	4\n"					\
+		"	.long	1b,10b\n"				\
+		"	.long	2b,10b\n"				\
+		"	.previous"					\
+		: "+d" (res), "=&r" (__gu_val.l),			\
+		  "+a" (__gu_ptr)					\
+		: "i" (-EFAULT)						\
+		: "memory");						\
+	(x) = __gu_val.t;						\
+} while (0)
+
 #define __get_user(x, ptr)						\
 ({									\
 	int __gu_err = 0;						\
@@ -144,36 +180,9 @@ asm volatile ("\n"					\
 	case 4:								\
 		__get_user_asm(__gu_err, x, ptr, u32, l, r);		\
 		break;							\
-	case 8: {							\
-		const void __user *__gu_ptr = (ptr);			\
-		union {							\
-			u64 l;						\
-			__typeof__(*(ptr)) t;				\
-		} __gu_val;						\
-		asm volatile ("\n"					\
-			"1:	"MOVES".l	(%2)+,%1\n"		\
-			"2:	"MOVES".l	(%2),%R1\n"		\
-			"3:\n"						\
-			"	.section .fixup,\"ax\"\n"		\
-			"	.even\n"				\
-			"10:	move.l	%3,%0\n"			\
-			"	sub.l	%1,%1\n"			\
-			"	sub.l	%R1,%R1\n"			\
-			"	jra	3b\n"				\
-			"	.previous\n"				\
-			"\n"						\
-			"	.section __ex_table,\"a\"\n"		\
-			"	.align	4\n"				\
-			"	.long	1b,10b\n"			\
-			"	.long	2b,10b\n"			\
-			"	.previous"				\
-			: "+d" (__gu_err), "=&r" (__gu_val.l),		\
-			  "+a" (__gu_ptr)				\
-			: "i" (-EFAULT)					\
-			: "memory");					\
-		(x) = __gu_val.t;					\
+	case 8:								\
+		__get_user_asm8(__gu_err, x, ptr);			\
 		break;							\
-	}								\
 	default:							\
 		BUILD_BUG();						\
 	}								\
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 6/7] m68k: provide __{get,put}_kernel_nofault
  2021-07-09  7:01 RFC: remove set_fs for m68k Christoph Hellwig
                   ` (4 preceding siblings ...)
  2021-07-09  7:01 ` [PATCH 5/7] m68k: factor the 8-byte lowlevel {get,put}_user code into helpers Christoph Hellwig
@ 2021-07-09  7:01 ` Christoph Hellwig
  2021-07-09  7:01 ` [PATCH 7/7] m68k: remove set_fs() Christoph Hellwig
  2021-07-11  7:20 ` RFC: remove set_fs for m68k Michael Schmitz
  7 siblings, 0 replies; 73+ messages in thread
From: Christoph Hellwig @ 2021-07-09  7:01 UTC (permalink / raw)
  To: Geert Uytterhoeven, Greg Ungerer; +Cc: Michael Schmitz, linux-m68k

Allow non-faulting access to kernel addresses without overriding the
address space.  Implemented by passing the instruction name to the
low-level assembly macros as an argument, and force the use of the
normal move instructions for kernel access.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/m68k/include/asm/uaccess.h | 90 ++++++++++++++++++++++++++-------
 1 file changed, 72 insertions(+), 18 deletions(-)

diff --git a/arch/m68k/include/asm/uaccess.h b/arch/m68k/include/asm/uaccess.h
index 384b2a6b135c..2bedb1b8c4bf 100644
--- a/arch/m68k/include/asm/uaccess.h
+++ b/arch/m68k/include/asm/uaccess.h
@@ -39,9 +39,9 @@ static inline int access_ok(const void __user *addr,
 #define	MOVES	"move"
 #endif
 
-#define __put_user_asm(res, x, ptr, bwl, reg)		\
+#define __put_user_asm(inst, res, x, ptr, bwl, reg)	\
 asm volatile ("\n"					\
-	"1:	"MOVES"."#bwl"	%2,%1\n"		\
+	"1:	"inst"."#bwl"	%2,%1\n"		\
 	"2:\n"						\
 	"	.section .fixup,\"ax\"\n"		\
 	"	.even\n"				\
@@ -57,13 +57,13 @@ asm volatile ("\n"					\
 	: "+d" (res), "=m" (*(ptr))			\
 	: #reg (x), "i" (-EFAULT))
 
-#define __put_user_asm8(res, x, ptr)				\
+#define __put_user_asm8(inst, res, x, ptr)			\
 do {								\
 	const void *__pu_ptr = (const void __force *)(ptr);	\
 								\
 	asm volatile ("\n"					\
-		"1:	"MOVES".l %2,(%1)+\n"			\
-		"2:	"MOVES".l %R2,(%1)\n"			\
+		"1:	"inst".l %2,(%1)+\n"			\
+		"2:	"inst".l %R2,(%1)\n"			\
 		"3:\n"						\
 		"	.section .fixup,\"ax\"\n"		\
 		"	.even\n"				\
@@ -94,16 +94,16 @@ do {								\
 	__chk_user_ptr(ptr);						\
 	switch (sizeof (*(ptr))) {					\
 	case 1:								\
-		__put_user_asm(__pu_err, __pu_val, ptr, b, d);		\
+		__put_user_asm(MOVES, __pu_err, __pu_val, ptr, b, d);	\
 		break;							\
 	case 2:								\
-		__put_user_asm(__pu_err, __pu_val, ptr, w, r);		\
+		__put_user_asm(MOVES, __pu_err, __pu_val, ptr, w, r);	\
 		break;							\
 	case 4:								\
-		__put_user_asm(__pu_err, __pu_val, ptr, l, r);		\
+		__put_user_asm(MOVES, __pu_err, __pu_val, ptr, l, r);	\
 		break;							\
 	case 8:								\
-		__put_user_asm8(__pu_err, __pu_val, ptr);		\
+		__put_user_asm8(MOVES, __pu_err, __pu_val, ptr);	\
 		break;							\
 	default:							\
 		BUILD_BUG();						\
@@ -113,10 +113,10 @@ do {								\
 #define put_user(x, ptr)	__put_user(x, ptr)
 
 
-#define __get_user_asm(res, x, ptr, type, bwl, reg) ({			\
+#define __get_user_asm(inst, res, x, ptr, type, bwl, reg) ({		\
 	type __gu_val;							\
 	asm volatile ("\n"						\
-		"1:	"MOVES"."#bwl"	%2,%1\n"			\
+		"1:	"inst"."#bwl"	%2,%1\n"			\
 		"2:\n"							\
 		"	.section .fixup,\"ax\"\n"			\
 		"	.even\n"					\
@@ -134,7 +134,7 @@ do {								\
 	(x) = (__force typeof(*(ptr)))(__force unsigned long)__gu_val;	\
 })
 
-#define __get_user_asm8(res, x, ptr)					\
+#define __get_user_asm8(inst, res, x, ptr) 				\
 do {									\
 	const void *__gu_ptr = (const void __force *)(ptr);		\
 	union {								\
@@ -143,8 +143,8 @@ do {									\
 	} __gu_val;							\
 									\
 	asm volatile ("\n"						\
-		"1:	"MOVES".l	(%2)+,%1\n"			\
-		"2:	"MOVES".l	(%2),%R1\n"			\
+		"1:	"inst".l (%2)+,%1\n"				\
+		"2:	"inst".l (%2),%R1\n"				\
 		"3:\n"							\
 		"	.section .fixup,\"ax\"\n"			\
 		"	.even\n"					\
@@ -172,16 +172,16 @@ do {									\
 	__chk_user_ptr(ptr);						\
 	switch (sizeof(*(ptr))) {					\
 	case 1:								\
-		__get_user_asm(__gu_err, x, ptr, u8, b, d);		\
+		__get_user_asm(MOVES, __gu_err, x, ptr, u8, b, d);	\
 		break;							\
 	case 2:								\
-		__get_user_asm(__gu_err, x, ptr, u16, w, r);		\
+		__get_user_asm(MOVES, __gu_err, x, ptr, u16, w, r);	\
 		break;							\
 	case 4:								\
-		__get_user_asm(__gu_err, x, ptr, u32, l, r);		\
+		__get_user_asm(MOVES, __gu_err, x, ptr, u32, l, r);	\
 		break;							\
 	case 8:								\
-		__get_user_asm8(__gu_err, x, ptr);			\
+		__get_user_asm8(MOVES, __gu_err, x, ptr);		\
 		break;							\
 	default:							\
 		BUILD_BUG();						\
@@ -190,6 +190,60 @@ do {									\
 })
 #define get_user(x, ptr) __get_user(x, ptr)
 
+#define HAVE_GET_KERNEL_NOFAULT
+
+#define __get_kernel_nofault(dst, src, type, err_label)			\
+do {									\
+	type __gk_dst = *(type *)(dst);					\
+	type *__gk_src = (type *)(src);					\
+	int __gk_err = 0;						\
+									\
+	switch (sizeof(type)) {						\
+	case 1:								\
+		__get_user_asm("move", __gk_err, __gk_dst, __gk_src, u8, b, d); \
+		break;							\
+	case 2:								\
+		__get_user_asm("move", __gk_err, __gk_dst, __gk_src, u16, w, r); \
+		break;							\
+	case 4:								\
+		__get_user_asm("move", __gk_err, __gk_dst, __gk_src, u32, l, r); \
+		break;							\
+	case 8:								\
+		__get_user_asm8("move", __gk_err, __gk_dst, __gk_src);	\
+		break;							\
+	default:							\
+		BUILD_BUG();						\
+	}								\
+	if (unlikely(__gk_err))						\
+		goto err_label;						\
+} while (0)
+
+#define __put_kernel_nofault(dst, src, type, err_label)			\
+do {									\
+	type __pk_src = *(type *)(src);					\
+	type *__pk_dst = (type *)(dst);					\
+	int __pk_err = 0;						\
+									\
+	switch (sizeof(type)) {						\
+	case 1:								\
+		__put_user_asm("move", __pk_err, __pk_src, __pk_dst, b, d); \
+		break;							\
+	case 2:								\
+		__put_user_asm("move", __pk_err, __pk_src, __pk_dst, w, r); \
+		break;							\
+	case 4:								\
+		__put_user_asm("move", __pk_err, __pk_src, __pk_dst, l, r); \
+		break;							\
+	case 8:								\
+		__put_user_asm8("move", __pk_err, __pk_src, __pk_dst);	\
+		break;							\
+	default:							\
+		BUILD_BUG();						\
+	}								\
+	if (unlikely(__pk_err))						\
+		goto err_label;						\
+} while (0)
+
 unsigned long raw_copy_from_user(void *to, const void __user *from,
 		unsigned long n);
 unsigned long raw_copy_to_user(void __user *to, const void *from,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH 7/7] m68k: remove set_fs()
  2021-07-09  7:01 RFC: remove set_fs for m68k Christoph Hellwig
                   ` (5 preceding siblings ...)
  2021-07-09  7:01 ` [PATCH 6/7] m68k: provide __{get,put}_kernel_nofault Christoph Hellwig
@ 2021-07-09  7:01 ` Christoph Hellwig
  2021-07-11  7:20 ` RFC: remove set_fs for m68k Michael Schmitz
  7 siblings, 0 replies; 73+ messages in thread
From: Christoph Hellwig @ 2021-07-09  7:01 UTC (permalink / raw)
  To: Geert Uytterhoeven, Greg Ungerer; +Cc: Michael Schmitz, linux-m68k

Add a m68k-only set_fc helper to set the SFC and DFC registers for the
few places that need to override it for special MM operations, but
disconnect that from the deprecated kernel-wide set_fs() API.

Note that the SFC/DFC registers are context switched, so there is no need
to disable preemption.

Partially based on an earlier patch from
Linus Torvalds <torvalds@linux-foundation.org>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/m68k/68000/entry.S             |  1 -
 arch/m68k/Kconfig                   |  1 -
 arch/m68k/coldfire/entry.S          |  1 -
 arch/m68k/include/asm/processor.h   | 31 +++++++++++++--
 arch/m68k/include/asm/segment.h     | 59 -----------------------------
 arch/m68k/include/asm/thread_info.h |  3 --
 arch/m68k/include/asm/tlbflush.h    | 11 ++----
 arch/m68k/include/asm/uaccess.h     |  4 --
 arch/m68k/kernel/asm-offsets.c      |  2 +-
 arch/m68k/kernel/entry.S            |  5 +--
 arch/m68k/kernel/process.c          |  4 +-
 arch/m68k/kernel/traps.c            | 13 ++-----
 arch/m68k/mac/misc.c                |  1 -
 arch/m68k/mm/cache.c                | 23 ++++++-----
 arch/m68k/mm/init.c                 |  2 +-
 arch/m68k/mm/kmap.c                 |  1 -
 arch/m68k/mm/memory.c               |  1 -
 arch/m68k/mm/motorola.c             |  2 +-
 arch/m68k/sun3/config.c             |  3 +-
 arch/m68k/sun3/mmu_emu.c            |  6 +--
 arch/m68k/sun3/sun3ints.c           |  1 -
 arch/m68k/sun3x/prom.c              |  1 -
 22 files changed, 59 insertions(+), 117 deletions(-)
 delete mode 100644 arch/m68k/include/asm/segment.h

diff --git a/arch/m68k/68000/entry.S b/arch/m68k/68000/entry.S
index 259b3661b614..6aefcc20c28a 100644
--- a/arch/m68k/68000/entry.S
+++ b/arch/m68k/68000/entry.S
@@ -15,7 +15,6 @@
 #include <asm/unistd.h>
 #include <asm/errno.h>
 #include <asm/setup.h>
-#include <asm/segment.h>
 #include <asm/traps.h>
 #include <asm/asm-offsets.h>
 #include <asm/entry.h>
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 96989ad46f66..3a83013f0a96 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -32,7 +32,6 @@ config M68K
 	select NO_DMA if !MMU && !COLDFIRE
 	select OLD_SIGACTION
 	select OLD_SIGSUSPEND3
-	select SET_FS
 	select UACCESS_MEMCPY if !MMU
 	select VIRT_TO_BUS
 	select ZONE_DMA
diff --git a/arch/m68k/coldfire/entry.S b/arch/m68k/coldfire/entry.S
index d43a02795a4a..412bb7326099 100644
--- a/arch/m68k/coldfire/entry.S
+++ b/arch/m68k/coldfire/entry.S
@@ -31,7 +31,6 @@
 #include <asm/thread_info.h>
 #include <asm/errno.h>
 #include <asm/setup.h>
-#include <asm/segment.h>
 #include <asm/asm-offsets.h>
 #include <asm/entry.h>
 
diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
index 3750819ac5a1..debb4537eab5 100644
--- a/arch/m68k/include/asm/processor.h
+++ b/arch/m68k/include/asm/processor.h
@@ -9,7 +9,6 @@
 #define __ASM_M68K_PROCESSOR_H
 
 #include <linux/thread_info.h>
-#include <asm/segment.h>
 #include <asm/fpu.h>
 #include <asm/ptrace.h>
 
@@ -75,11 +74,37 @@ static inline void wrusp(unsigned long usp)
 #define TASK_UNMAPPED_BASE	0
 #endif
 
+/* Address spaces (or Function Codes in Motorola lingo) */
+#define USER_DATA     (1)
+#define USER_PROGRAM  (2)
+#define SUPER_DATA    (5)
+#define SUPER_PROGRAM (6)
+#define CPU_SPACE     (7)
+
+#ifdef CONFIG_CPU_HAS_ADDRESS_SPACES
+/*
+ * Set the SFC/DFC registers for special MM operations.  For most normal
+ * operation these remain set to USER_DATA for the uaccess routines.
+ */
+static inline void set_fc(unsigned long val)
+{
+	WARN_ON_ONCE(in_interrupt());
+
+	__asm__ __volatile__ ("movec %0,%/sfc\n\t"
+			      "movec %0,%/dfc\n\t"
+			      : /* no outputs */ : "r" (val) : "memory");
+}
+#else
+static inline void set_fc(unsigned long val)
+{
+}
+#endif /* CONFIG_CPU_HAS_ADDRESS_SPACES */
+
 struct thread_struct {
 	unsigned long  ksp;		/* kernel stack pointer */
 	unsigned long  usp;		/* user stack pointer */
 	unsigned short sr;		/* saved status register */
-	unsigned short fs;		/* saved fs (sfc, dfc) */
+	unsigned short fc;		/* saved fc (sfc, dfc) */
 	unsigned long  crp[2];		/* cpu root pointer */
 	unsigned long  esp0;		/* points to SR of stack frame */
 	unsigned long  faddr;		/* info about last fault */
@@ -92,7 +117,7 @@ struct thread_struct {
 #define INIT_THREAD  {							\
 	.ksp	= sizeof(init_stack) + (unsigned long) init_stack,	\
 	.sr	= PS_S,							\
-	.fs	= __KERNEL_DS,						\
+	.fc	= USER_DATA,						\
 }
 
 /*
diff --git a/arch/m68k/include/asm/segment.h b/arch/m68k/include/asm/segment.h
deleted file mode 100644
index 2b5e68a71ef7..000000000000
--- a/arch/m68k/include/asm/segment.h
+++ /dev/null
@@ -1,59 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _M68K_SEGMENT_H
-#define _M68K_SEGMENT_H
-
-/* define constants */
-/* Address spaces (FC0-FC2) */
-#define USER_DATA     (1)
-#ifndef __USER_DS
-#define __USER_DS     (USER_DATA)
-#endif
-#define USER_PROGRAM  (2)
-#define SUPER_DATA    (5)
-#ifndef __KERNEL_DS
-#define __KERNEL_DS   (SUPER_DATA)
-#endif
-#define SUPER_PROGRAM (6)
-#define CPU_SPACE     (7)
-
-#ifndef __ASSEMBLY__
-
-typedef struct {
-	unsigned long seg;
-} mm_segment_t;
-
-#define MAKE_MM_SEG(s)	((mm_segment_t) { (s) })
-
-#ifdef CONFIG_CPU_HAS_ADDRESS_SPACES
-/*
- * Get/set the SFC/DFC registers for MOVES instructions
- */
-#define USER_DS		MAKE_MM_SEG(__USER_DS)
-#define KERNEL_DS	MAKE_MM_SEG(__KERNEL_DS)
-
-static inline mm_segment_t get_fs(void)
-{
-	mm_segment_t _v;
-	__asm__ ("movec %/dfc,%0":"=r" (_v.seg):);
-	return _v;
-}
-
-static inline void set_fs(mm_segment_t val)
-{
-	__asm__ __volatile__ ("movec %0,%/sfc\n\t"
-			      "movec %0,%/dfc\n\t"
-			      : /* no outputs */ : "r" (val.seg) : "memory");
-}
-
-#else
-#define USER_DS		MAKE_MM_SEG(TASK_SIZE)
-#define KERNEL_DS	MAKE_MM_SEG(0xFFFFFFFF)
-#define get_fs()	(current_thread_info()->addr_limit)
-#define set_fs(x)	(current_thread_info()->addr_limit = (x))
-#endif
-
-#define uaccess_kernel()	(get_fs().seg == KERNEL_DS.seg)
-
-#endif /* __ASSEMBLY__ */
-
-#endif /* _M68K_SEGMENT_H */
diff --git a/arch/m68k/include/asm/thread_info.h b/arch/m68k/include/asm/thread_info.h
index 15a757073fa5..c952658ba792 100644
--- a/arch/m68k/include/asm/thread_info.h
+++ b/arch/m68k/include/asm/thread_info.h
@@ -4,7 +4,6 @@
 
 #include <asm/types.h>
 #include <asm/page.h>
-#include <asm/segment.h>
 
 /*
  * On machines with 4k pages we default to an 8k thread size, though we
@@ -27,7 +26,6 @@
 struct thread_info {
 	struct task_struct	*task;		/* main task structure */
 	unsigned long		flags;
-	mm_segment_t		addr_limit;	/* thread address space */
 	int			preempt_count;	/* 0 => preemptable, <0 => BUG */
 	__u32			cpu;		/* should always be 0 on m68k */
 	unsigned long		tp_value;	/* thread pointer */
@@ -37,7 +35,6 @@ struct thread_info {
 #define INIT_THREAD_INFO(tsk)			\
 {						\
 	.task		= &tsk,			\
-	.addr_limit	= KERNEL_DS,		\
 	.preempt_count	= INIT_PREEMPT_COUNT,	\
 }
 
diff --git a/arch/m68k/include/asm/tlbflush.h b/arch/m68k/include/asm/tlbflush.h
index a6318ccd308f..b882e2f4f551 100644
--- a/arch/m68k/include/asm/tlbflush.h
+++ b/arch/m68k/include/asm/tlbflush.h
@@ -13,13 +13,12 @@ static inline void flush_tlb_kernel_page(void *addr)
 	if (CPU_IS_COLDFIRE) {
 		mmu_write(MMUOR, MMUOR_CNL);
 	} else if (CPU_IS_040_OR_060) {
-		mm_segment_t old_fs = get_fs();
-		set_fs(KERNEL_DS);
+		set_fc(SUPER_DATA);
 		__asm__ __volatile__(".chip 68040\n\t"
 				     "pflush (%0)\n\t"
 				     ".chip 68k"
 				     : : "a" (addr));
-		set_fs(old_fs);
+		set_fc(USER_DATA);
 	} else if (CPU_IS_020_OR_030)
 		__asm__ __volatile__("pflush #4,#4,(%0)" : : "a" (addr));
 }
@@ -84,12 +83,8 @@ static inline void flush_tlb_mm(struct mm_struct *mm)
 
 static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
 {
-	if (vma->vm_mm == current->active_mm) {
-		mm_segment_t old_fs = force_uaccess_begin();
-
+	if (vma->vm_mm == current->active_mm)
 		__flush_tlb_one(addr);
-		force_uaccess_end(old_fs);
-	}
 }
 
 static inline void flush_tlb_range(struct vm_area_struct *vma,
diff --git a/arch/m68k/include/asm/uaccess.h b/arch/m68k/include/asm/uaccess.h
index 2bedb1b8c4bf..a308f0b41154 100644
--- a/arch/m68k/include/asm/uaccess.h
+++ b/arch/m68k/include/asm/uaccess.h
@@ -9,7 +9,6 @@
  */
 #include <linux/compiler.h>
 #include <linux/types.h>
-#include <asm/segment.h>
 #include <asm/extable.h>
 
 /* We let the MMU do all checking */
@@ -249,9 +248,6 @@ unsigned long raw_copy_from_user(void *to, const void __user *from,
 unsigned long raw_copy_to_user(void __user *to, const void *from,
 		unsigned long n);
 
-#define user_addr_max() \
-	(uaccess_kernel() ? ~0UL : TASK_SIZE)
-
 extern long strncpy_from_user(char *dst, const char __user *src, long count);
 extern __must_check long strnlen_user(const char __user *str, long n);
 
diff --git a/arch/m68k/kernel/asm-offsets.c b/arch/m68k/kernel/asm-offsets.c
index ccea355052ef..906d73230537 100644
--- a/arch/m68k/kernel/asm-offsets.c
+++ b/arch/m68k/kernel/asm-offsets.c
@@ -31,7 +31,7 @@ int main(void)
 	DEFINE(THREAD_KSP, offsetof(struct thread_struct, ksp));
 	DEFINE(THREAD_USP, offsetof(struct thread_struct, usp));
 	DEFINE(THREAD_SR, offsetof(struct thread_struct, sr));
-	DEFINE(THREAD_FS, offsetof(struct thread_struct, fs));
+	DEFINE(THREAD_FC, offsetof(struct thread_struct, fc));
 	DEFINE(THREAD_CRP, offsetof(struct thread_struct, crp));
 	DEFINE(THREAD_ESP0, offsetof(struct thread_struct, esp0));
 	DEFINE(THREAD_FPREG, offsetof(struct thread_struct, fp));
diff --git a/arch/m68k/kernel/entry.S b/arch/m68k/kernel/entry.S
index 9dd76fbb7c6b..63434f331561 100644
--- a/arch/m68k/kernel/entry.S
+++ b/arch/m68k/kernel/entry.S
@@ -36,7 +36,6 @@
 #include <linux/linkage.h>
 #include <asm/errno.h>
 #include <asm/setup.h>
-#include <asm/segment.h>
 #include <asm/traps.h>
 #include <asm/unistd.h>
 #include <asm/asm-offsets.h>
@@ -338,7 +337,7 @@ resume:
 
 	/* save fs (sfc,%dfc) (may be pointing to kernel memory) */
 	movec	%sfc,%d0
-	movew	%d0,%a0@(TASK_THREAD+THREAD_FS)
+	movew	%d0,%a0@(TASK_THREAD+THREAD_FC)
 
 	/* save usp */
 	/* it is better to use a movel here instead of a movew 8*) */
@@ -424,7 +423,7 @@ resume:
 	movel	%a0,%usp
 
 	/* restore fs (sfc,%dfc) */
-	movew	%a1@(TASK_THREAD+THREAD_FS),%a0
+	movew	%a1@(TASK_THREAD+THREAD_FC),%a0
 	movec	%a0,%sfc
 	movec	%a0,%dfc
 
diff --git a/arch/m68k/kernel/process.c b/arch/m68k/kernel/process.c
index db49f9091711..1ab692b952cd 100644
--- a/arch/m68k/kernel/process.c
+++ b/arch/m68k/kernel/process.c
@@ -92,7 +92,7 @@ void show_regs(struct pt_regs * regs)
 
 void flush_thread(void)
 {
-	current->thread.fs = __USER_DS;
+	current->thread.fc = USER_DATA;
 #ifdef CONFIG_FPU
 	if (!FPU_IS_EMU) {
 		unsigned long zero = 0;
@@ -155,7 +155,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
 	 * Must save the current SFC/DFC value, NOT the value when
 	 * the parent was last descheduled - RGH  10-08-96
 	 */
-	p->thread.fs = get_fs().seg;
+	p->thread.fc = USER_DATA;
 
 	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
 		/* kernel thread */
diff --git a/arch/m68k/kernel/traps.c b/arch/m68k/kernel/traps.c
index 9e1261462bcc..c5e0a4f93bd5 100644
--- a/arch/m68k/kernel/traps.c
+++ b/arch/m68k/kernel/traps.c
@@ -181,9 +181,8 @@ static inline void access_error060 (struct frame *fp)
 static inline unsigned long probe040(int iswrite, unsigned long addr, int wbs)
 {
 	unsigned long mmusr;
-	mm_segment_t old_fs = get_fs();
 
-	set_fs(MAKE_MM_SEG(wbs));
+	set_fc(wbs);
 
 	if (iswrite)
 		asm volatile (".chip 68040; ptestw (%0); .chip 68k" : : "a" (addr));
@@ -192,7 +191,7 @@ static inline unsigned long probe040(int iswrite, unsigned long addr, int wbs)
 
 	asm volatile (".chip 68040; movec %%mmusr,%0; .chip 68k" : "=r" (mmusr));
 
-	set_fs(old_fs);
+	set_fc(USER_DATA);
 
 	return mmusr;
 }
@@ -201,10 +200,8 @@ static inline int do_040writeback1(unsigned short wbs, unsigned long wba,
 				   unsigned long wbd)
 {
 	int res = 0;
-	mm_segment_t old_fs = get_fs();
 
-	/* set_fs can not be moved, otherwise put_user() may oops */
-	set_fs(MAKE_MM_SEG(wbs));
+	set_fc(wbs);
 
 	switch (wbs & WBSIZ_040) {
 	case BA_SIZE_BYTE:
@@ -218,9 +215,7 @@ static inline int do_040writeback1(unsigned short wbs, unsigned long wba,
 		break;
 	}
 
-	/* set_fs can not be moved, otherwise put_user() may oops */
-	set_fs(old_fs);
-
+	set_fc(USER_DATA);
 
 	pr_debug("do_040writeback1, res=%d\n", res);
 
diff --git a/arch/m68k/mac/misc.c b/arch/m68k/mac/misc.c
index 90f4e9ca1276..4fab34791758 100644
--- a/arch/m68k/mac/misc.c
+++ b/arch/m68k/mac/misc.c
@@ -18,7 +18,6 @@
 
 #include <linux/uaccess.h>
 #include <asm/io.h>
-#include <asm/segment.h>
 #include <asm/setup.h>
 #include <asm/macintosh.h>
 #include <asm/mac_via.h>
diff --git a/arch/m68k/mm/cache.c b/arch/m68k/mm/cache.c
index b486c0889eec..d26eb1a591d3 100644
--- a/arch/m68k/mm/cache.c
+++ b/arch/m68k/mm/cache.c
@@ -12,7 +12,8 @@
 #include <asm/traps.h>
 
 
-static unsigned long virt_to_phys_slow(unsigned long vaddr)
+static unsigned long virt_to_phys_slow(unsigned long vaddr,
+		unsigned long seg)
 {
 	if (CPU_IS_060) {
 		unsigned long paddr;
@@ -55,7 +56,7 @@ static unsigned long virt_to_phys_slow(unsigned long vaddr)
 		asm volatile ("ptestr %3,%2@,#7,%0\n\t"
 			      "pmove %%psr,%1"
 			      : "=a&" (descaddr), "=m" (mmusr)
-			      : "a" (vaddr), "d" (get_fs().seg));
+			      : "a" (vaddr), "d" (seg));
 		if (mmusr & (MMU_I|MMU_B|MMU_L))
 			return 0;
 		descaddr = phys_to_virt((unsigned long)descaddr);
@@ -73,7 +74,8 @@ static unsigned long virt_to_phys_slow(unsigned long vaddr)
 
 /* Push n pages at kernel virtual address and clear the icache */
 /* RZ: use cpush %bc instead of cpush %dc, cinv %ic */
-void flush_icache_user_range(unsigned long address, unsigned long endaddr)
+static void __flush_icache_range(unsigned long address, unsigned long endaddr,
+		unsigned long seg)
 {
 	if (CPU_IS_COLDFIRE) {
 		unsigned long start, end;
@@ -92,7 +94,7 @@ void flush_icache_user_range(unsigned long address, unsigned long endaddr)
 				      ".chip 68040\n\t"
 				      "cpushp %%bc,(%0)\n\t"
 				      ".chip 68k"
-				      : : "a" (virt_to_phys_slow(address)));
+				      : : "a" (virt_to_phys_slow(address, seg)));
 			address += PAGE_SIZE;
 		} while (address < endaddr);
 	} else {
@@ -105,13 +107,16 @@ void flush_icache_user_range(unsigned long address, unsigned long endaddr)
 	}
 }
 
-void flush_icache_range(unsigned long address, unsigned long endaddr)
+void flush_icache_user_range(unsigned long address, unsigned long endaddr)
 {
-	mm_segment_t old_fs = get_fs();
+	__flush_icache_range(address, endaddr, USER_DATA);
+}
 
-	set_fs(KERNEL_DS);
-	flush_icache_user_range(address, endaddr);
-	set_fs(old_fs);
+void flush_icache_range(unsigned long address, unsigned long endaddr)
+{
+	set_fc(SUPER_DATA);
+	__flush_icache_range(address, endaddr, SUPER_DATA);
+	set_fc(USER_DATA);
 }
 EXPORT_SYMBOL(flush_icache_range);
 
diff --git a/arch/m68k/mm/init.c b/arch/m68k/mm/init.c
index 5d749e188246..133cd8802a78 100644
--- a/arch/m68k/mm/init.c
+++ b/arch/m68k/mm/init.c
@@ -76,7 +76,7 @@ void __init paging_init(void)
 	/*
 	 * Set up SFC/DFC registers (user data space).
 	 */
-	set_fs (USER_DS);
+	set_fc(USER_DATA);
 
 	max_zone_pfn[ZONE_DMA] = end_mem >> PAGE_SHIFT;
 	free_area_init(max_zone_pfn);
diff --git a/arch/m68k/mm/kmap.c b/arch/m68k/mm/kmap.c
index 1269d513b221..20ddf71b43d0 100644
--- a/arch/m68k/mm/kmap.c
+++ b/arch/m68k/mm/kmap.c
@@ -17,7 +17,6 @@
 #include <linux/vmalloc.h>
 
 #include <asm/setup.h>
-#include <asm/segment.h>
 #include <asm/page.h>
 #include <asm/io.h>
 #include <asm/tlbflush.h>
diff --git a/arch/m68k/mm/memory.c b/arch/m68k/mm/memory.c
index fe75aecfb238..c2c03b0a1567 100644
--- a/arch/m68k/mm/memory.c
+++ b/arch/m68k/mm/memory.c
@@ -15,7 +15,6 @@
 #include <linux/gfp.h>
 
 #include <asm/setup.h>
-#include <asm/segment.h>
 #include <asm/page.h>
 #include <asm/traps.h>
 #include <asm/machdep.h>
diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
index 3a653f0a4188..9f3f77785aa7 100644
--- a/arch/m68k/mm/motorola.c
+++ b/arch/m68k/mm/motorola.c
@@ -467,7 +467,7 @@ void __init paging_init(void)
 	/*
 	 * Set up SFC/DFC registers
 	 */
-	set_fs(KERNEL_DS);
+	set_fc(USER_DATA);
 
 #ifdef DEBUG
 	printk ("before free_area_init\n");
diff --git a/arch/m68k/sun3/config.c b/arch/m68k/sun3/config.c
index f7dd47232b6c..203f428a0344 100644
--- a/arch/m68k/sun3/config.c
+++ b/arch/m68k/sun3/config.c
@@ -31,7 +31,6 @@
 #include <asm/intersil.h>
 #include <asm/irq.h>
 #include <asm/sections.h>
-#include <asm/segment.h>
 #include <asm/sun3ints.h>
 
 char sun3_reserved_pmeg[SUN3_PMEGS_NUM];
@@ -89,7 +88,7 @@ void __init sun3_init(void)
 	sun3_reserved_pmeg[249] = 1;
 	sun3_reserved_pmeg[252] = 1;
 	sun3_reserved_pmeg[253] = 1;
-	set_fs(KERNEL_DS);
+	set_fc(USER_DATA);
 }
 
 /* Without this, Bad Things happen when something calls arch_reset. */
diff --git a/arch/m68k/sun3/mmu_emu.c b/arch/m68k/sun3/mmu_emu.c
index 7aa879b7c7ff..7ec20817c0c9 100644
--- a/arch/m68k/sun3/mmu_emu.c
+++ b/arch/m68k/sun3/mmu_emu.c
@@ -23,7 +23,6 @@
 #include <linux/uaccess.h>
 #include <asm/page.h>
 #include <asm/sun3mmu.h>
-#include <asm/segment.h>
 #include <asm/oplib.h>
 #include <asm/mmu_context.h>
 #include <asm/dvma.h>
@@ -191,14 +190,13 @@ void __init mmu_emu_init(unsigned long bootmem_end)
 	for(seg = 0; seg < PAGE_OFFSET; seg += SUN3_PMEG_SIZE)
 		sun3_put_segmap(seg, SUN3_INVALID_PMEG);
 
-	set_fs(MAKE_MM_SEG(3));
+	set_fc(3);
 	for(seg = 0; seg < 0x10000000; seg += SUN3_PMEG_SIZE) {
 		i = sun3_get_segmap(seg);
 		for(j = 1; j < CONTEXTS_NUM; j++)
 			(*(romvec->pv_setctxt))(j, (void *)seg, i);
 	}
-	set_fs(KERNEL_DS);
-
+	set_fc(USER_DATA);
 }
 
 /* erase the mappings for a dead context.  Uses the pg_dir for hints
diff --git a/arch/m68k/sun3/sun3ints.c b/arch/m68k/sun3/sun3ints.c
index 41ae422119d3..36cc280a4505 100644
--- a/arch/m68k/sun3/sun3ints.c
+++ b/arch/m68k/sun3/sun3ints.c
@@ -11,7 +11,6 @@
 #include <linux/sched.h>
 #include <linux/kernel_stat.h>
 #include <linux/interrupt.h>
-#include <asm/segment.h>
 #include <asm/intersil.h>
 #include <asm/oplib.h>
 #include <asm/sun3ints.h>
diff --git a/arch/m68k/sun3x/prom.c b/arch/m68k/sun3x/prom.c
index 74d2fe57524b..64c23bfaa90c 100644
--- a/arch/m68k/sun3x/prom.c
+++ b/arch/m68k/sun3x/prom.c
@@ -14,7 +14,6 @@
 #include <asm/traps.h>
 #include <asm/sun3xprom.h>
 #include <asm/idprom.h>
-#include <asm/segment.h>
 #include <asm/sun3ints.h>
 #include <asm/openprom.h>
 #include <asm/machines.h>
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-09  7:01 RFC: remove set_fs for m68k Christoph Hellwig
                   ` (6 preceding siblings ...)
  2021-07-09  7:01 ` [PATCH 7/7] m68k: remove set_fs() Christoph Hellwig
@ 2021-07-11  7:20 ` Michael Schmitz
  2021-07-12  9:50   ` Christoph Hellwig
                     ` (2 more replies)
  7 siblings, 3 replies; 73+ messages in thread
From: Michael Schmitz @ 2021-07-11  7:20 UTC (permalink / raw)
  To: Christoph Hellwig, Geert Uytterhoeven, Greg Ungerer; +Cc: linux-m68k

Hi Christoph,

Am 09.07.2021 um 19:01 schrieb Christoph Hellwig:
> Hi all,
>
> this series converts m68k to the new world where set_fs is gone to make
> the uaccess routines work on kernel pointers.  Note that it is compile
> tested only and thus needs to be handled with care.

Testing your series, I've got this in the console log:

[16534.130000] *** FORMAT ERROR ***   FORMAT=0
[16534.150000] Current process id is 1347
[16534.160000] BAD KERNEL TRAP: 00000000
[16534.180000] Modules linked in: atari_scsi ne 8390p
[16534.210000] PC: [<00002a8c>] resume_userspace+0x14/0x16
[16534.230000] SR: 2200  SP: d1b6e0e8  a2: 00000000
[16534.240000] d0: 0000001e    d1: 00000003    d2: 00000578    d3: 00000000
[16534.270000] d4: ffffffff    d5: 00000001    a0: 00000578    a1: 00000080
[16534.300000] Process savelog (pid: 1347, task=e3955528)
[16534.310000] Frame format=0
[16534.330000] Stack from 005e9fa4:
[16534.330000]         02108005 0d06b008 1eeeb649 007e0001 00aea040 
003ec318 005e9e68 000279a2
[16534.330000]         000b0000 00000000 00000000 030dfffb 0044fffa 
0e000000 fffa1a00 fffa1c00
[16534.330000]         fffa1e00 fffb0e40 fffb0e80 00049e02 00000040 
0085a800 00000001
[16534.430000] Call Trace: [<000279a2>] warn_slowpath_fmt+0x0/0x62
[16534.460000]  [<000b0000>] vm_map_ram+0x144/0x5ae
[16534.490000]  [<00049e02>] __handle_irq_event_percpu+0x38/0xce
[16534.520000]
[16534.540000] Code: 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 <9f38> 9f38 
9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38

Never seen the like of it before...

Cheers,

	Michael




^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-11  7:20 ` RFC: remove set_fs for m68k Michael Schmitz
@ 2021-07-12  9:50   ` Christoph Hellwig
  2021-07-12 10:20   ` Andreas Schwab
  2021-07-12 19:04   ` Michael Schmitz
  2 siblings, 0 replies; 73+ messages in thread
From: Christoph Hellwig @ 2021-07-12  9:50 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Christoph Hellwig, Geert Uytterhoeven, Greg Ungerer, linux-m68k

On Sun, Jul 11, 2021 at 07:20:26PM +1200, Michael Schmitz wrote:
> Hi Christoph,
>
> Am 09.07.2021 um 19:01 schrieb Christoph Hellwig:
>> Hi all,
>>
>> this series converts m68k to the new world where set_fs is gone to make
>> the uaccess routines work on kernel pointers.  Note that it is compile
>> tested only and thus needs to be handled with care.
>
> Testing your series, I've got this in the console log:

This comes from bad_super_trap, but at this point my knowledge of
the m68k low-level details is over and I can't help much.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-11  7:20 ` RFC: remove set_fs for m68k Michael Schmitz
  2021-07-12  9:50   ` Christoph Hellwig
@ 2021-07-12 10:20   ` Andreas Schwab
  2021-07-12 19:12     ` Michael Schmitz
  2021-07-12 19:04   ` Michael Schmitz
  2 siblings, 1 reply; 73+ messages in thread
From: Andreas Schwab @ 2021-07-12 10:20 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Christoph Hellwig, Geert Uytterhoeven, Greg Ungerer, linux-m68k

On Jul 11 2021, Michael Schmitz wrote:

> Testing your series, I've got this in the console log:
>
> [16534.130000] *** FORMAT ERROR ***   FORMAT=0
> [16534.150000] Current process id is 1347
> [16534.160000] BAD KERNEL TRAP: 00000000
> [16534.180000] Modules linked in: atari_scsi ne 8390p
> [16534.210000] PC: [<00002a8c>] resume_userspace+0x14/0x16
> [16534.230000] SR: 2200  SP: d1b6e0e8  a2: 00000000
> [16534.240000] d0: 0000001e    d1: 00000003    d2: 00000578    d3: 00000000
> [16534.270000] d4: ffffffff    d5: 00000001    a0: 00000578    a1: 00000080
> [16534.300000] Process savelog (pid: 1347, task=e3955528)
> [16534.310000] Frame format=0
> [16534.330000] Stack from 005e9fa4:
> [16534.330000]         02108005 0d06b008 1eeeb649 007e0001 00aea040
> 003ec318 005e9e68 000279a2
> [16534.330000]         000b0000 00000000 00000000 030dfffb 0044fffa
> 0e000000 fffa1a00 fffa1c00
> [16534.330000]         fffa1e00 fffb0e40 fffb0e80 00049e02 00000040
> 0085a800 00000001
> [16534.430000] Call Trace: [<000279a2>] warn_slowpath_fmt+0x0/0x62
> [16534.460000]  [<000b0000>] vm_map_ram+0x144/0x5ae
> [16534.490000]  [<00049e02>] __handle_irq_event_percpu+0x38/0xce
> [16534.520000]
> [16534.540000] Code: 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 <9f38> 9f38
> 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38

That looks like something overwrote the kernel code. 0x9f38 isn't an
insn that you would ever see in the kernel.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-11  7:20 ` RFC: remove set_fs for m68k Michael Schmitz
  2021-07-12  9:50   ` Christoph Hellwig
  2021-07-12 10:20   ` Andreas Schwab
@ 2021-07-12 19:04   ` Michael Schmitz
  2 siblings, 0 replies; 73+ messages in thread
From: Michael Schmitz @ 2021-07-12 19:04 UTC (permalink / raw)
  To: Christoph Hellwig, Geert Uytterhoeven, Greg Ungerer; +Cc: linux-m68k

On 11/07/21 7:20 pm, Michael Schmitz wrote:

> Hi Christoph,
>
> Am 09.07.2021 um 19:01 schrieb Christoph Hellwig:
>> Hi all,
>>
>> this series converts m68k to the new world where set_fs is gone to make
>> the uaccess routines work on kernel pointers.  Note that it is compile
>> tested only and thus needs to be handled with care.
>
> Testing your series, I've got this in the console log:
>
> [16534.130000] *** FORMAT ERROR ***   FORMAT=0
> [16534.150000] Current process id is 1347
> [16534.160000] BAD KERNEL TRAP: 00000000
> [16534.180000] Modules linked in: atari_scsi ne 8390p
> [16534.210000] PC: [<00002a8c>] resume_userspace+0x14/0x16
> [16534.230000] SR: 2200  SP: d1b6e0e8  a2: 00000000
> [16534.240000] d0: 0000001e    d1: 00000003    d2: 00000578    d3: 
> 00000000
> [16534.270000] d4: ffffffff    d5: 00000001    a0: 00000578    a1: 
> 00000080
> [16534.300000] Process savelog (pid: 1347, task=e3955528)
> [16534.310000] Frame format=0
> [16534.330000] Stack from 005e9fa4:
> [16534.330000]         02108005 0d06b008 1eeeb649 007e0001 00aea040 
> 003ec318 005e9e68 000279a2
> [16534.330000]         000b0000 00000000 00000000 030dfffb 0044fffa 
> 0e000000 fffa1a00 fffa1c00
> [16534.330000]         fffa1e00 fffb0e40 fffb0e80 00049e02 00000040 
> 0085a800 00000001
> [16534.430000] Call Trace: [<000279a2>] warn_slowpath_fmt+0x0/0x62
> [16534.460000]  [<000b0000>] vm_map_ram+0x144/0x5ae
> [16534.490000]  [<00049e02>] __handle_irq_event_percpu+0x38/0xce
> [16534.520000]
> [16534.540000] Code: 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 <9f38> 
> 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 
> 9f38
>
> Never seen the like of it before...
>
Now the good news is, the IO test I ran at that time continued OK and 
completed normally (aside from the OOM killer hitting another process). 
I'm using IO to three 24 MB files each on five partitions concurrently 
with 14 MB of RAM (9,7 MB total RAM remaining once the kernel has 
loaded) so memory may be a little tight.

Any reason to assume the warn_slowpath_fmt() here is related to the 
warning you stuck in set_segment()?

Cheers,

     Michael



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-12 10:20   ` Andreas Schwab
@ 2021-07-12 19:12     ` Michael Schmitz
  2021-07-13  5:41       ` Christoph Hellwig
  0 siblings, 1 reply; 73+ messages in thread
From: Michael Schmitz @ 2021-07-12 19:12 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Christoph Hellwig, Geert Uytterhoeven, Greg Ungerer, linux-m68k

Hi Andreas,

On 12/07/21 10:20 pm, Andreas Schwab wrote:
> On Jul 11 2021, Michael Schmitz wrote:
>
>> Testing your series, I've got this in the console log:
>>
>> [16534.130000] *** FORMAT ERROR ***   FORMAT=0
>> [16534.150000] Current process id is 1347
>> [16534.160000] BAD KERNEL TRAP: 00000000
>> [16534.180000] Modules linked in: atari_scsi ne 8390p
>> [16534.210000] PC: [<00002a8c>] resume_userspace+0x14/0x16
>> [16534.230000] SR: 2200  SP: d1b6e0e8  a2: 00000000
>> [16534.240000] d0: 0000001e    d1: 00000003    d2: 00000578    d3: 00000000
>> [16534.270000] d4: ffffffff    d5: 00000001    a0: 00000578    a1: 00000080
>> [16534.300000] Process savelog (pid: 1347, task=e3955528)
>> [16534.310000] Frame format=0
>> [16534.330000] Stack from 005e9fa4:
>> [16534.330000]         02108005 0d06b008 1eeeb649 007e0001 00aea040
>> 003ec318 005e9e68 000279a2
>> [16534.330000]         000b0000 00000000 00000000 030dfffb 0044fffa
>> 0e000000 fffa1a00 fffa1c00
>> [16534.330000]         fffa1e00 fffb0e40 fffb0e80 00049e02 00000040
>> 0085a800 00000001
>> [16534.430000] Call Trace: [<000279a2>] warn_slowpath_fmt+0x0/0x62
>> [16534.460000]  [<000b0000>] vm_map_ram+0x144/0x5ae
>> [16534.490000]  [<00049e02>] __handle_irq_event_percpu+0x38/0xce
>> [16534.520000]
>> [16534.540000] Code: 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 <9f38> 9f38
>> 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38 9f38
> That looks like something overwrote the kernel code. 0x9f38 isn't an
> insn that you would ever see in the kernel.

That confirms my guess - can we rely on the call trace in that case? And 
how does overwriting kernel code at that address tee up with the kernel 
still happily chugging along?

Cheers,

     Michael


>
> Andreas.
>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-12 19:12     ` Michael Schmitz
@ 2021-07-13  5:41       ` Christoph Hellwig
  2021-07-13  8:16         ` Michael Schmitz
  0 siblings, 1 reply; 73+ messages in thread
From: Christoph Hellwig @ 2021-07-13  5:41 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Andreas Schwab, Christoph Hellwig, Geert Uytterhoeven,
	Greg Ungerer, linux-m68k

On Tue, Jul 13, 2021 at 07:12:45AM +1200, Michael Schmitz wrote:
> That confirms my guess - can we rely on the call trace in that case? And 
> how does overwriting kernel code at that address tee up with the kernel 
> still happily chugging along?

So if you remove the WARN_ON everything keeps running just fine?  I
wonder if in_interrupt somehow is not exact in m68k.  Or we actually
call it from an interrupt.  Let me try to unwind the various call
chains.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-13  5:41       ` Christoph Hellwig
@ 2021-07-13  8:16         ` Michael Schmitz
  2021-07-13  8:54           ` Christoph Hellwig
  0 siblings, 1 reply; 73+ messages in thread
From: Michael Schmitz @ 2021-07-13  8:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Schwab, Geert Uytterhoeven, Greg Ungerer, linux-m68k

Hi Christoph,

Am 13.07.2021 um 17:41 schrieb Christoph Hellwig:
> On Tue, Jul 13, 2021 at 07:12:45AM +1200, Michael Schmitz wrote:
>> That confirms my guess - can we rely on the call trace in that case? And
>> how does overwriting kernel code at that address tee up with the kernel
>> still happily chugging along?
>
> So if you remove the WARN_ON everything keeps running just fine?  I

I'll try that next. It certainly ran fine when I tried your earlier 
version which still had the __constant_copy_from_user() (but modified to 
use __get_user_asm() for the 1, 2 and 4 byte cases). That doesn't prove 
much though - I may not have hit the exact same memory pressure (not 
sure how often savelogs runs on that system, and I haven't checked how 
much the kernel size differs between the two versions).

Anyway, I'll try my previous version again, and I'll try with the 
WARN_ON removed, but that'll take a few days (did I mention this 030 is 
clocked at 16 MHz only?).

> wonder if in_interrupt somehow is not exact in m68k.  Or we actually
> call it from an interrupt.  Let me try to unwind the various call
> chains.

I suspect it may get called from an interrupt - not sure what interrupt 
handler would call vm_map_ram(), or if that even is allowed though. 
Might happen during a softirq, which is covered by in_interrupt as well ...

Cheers,

	Michael






^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-13  8:16         ` Michael Schmitz
@ 2021-07-13  8:54           ` Christoph Hellwig
  2021-07-14 19:26             ` Michael Schmitz
  0 siblings, 1 reply; 73+ messages in thread
From: Christoph Hellwig @ 2021-07-13  8:54 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Christoph Hellwig, Andreas Schwab, Geert Uytterhoeven,
	Greg Ungerer, linux-m68k

On Tue, Jul 13, 2021 at 08:16:25PM +1200, Michael Schmitz wrote:
> I'll try that next. It certainly ran fine when I tried your earlier version 
> which still had the __constant_copy_from_user() (but modified to use 
> __get_user_asm() for the 1, 2 and 4 byte cases). That doesn't prove much 
> though - I may not have hit the exact same memory pressure (not sure how 
> often savelogs runs on that system, and I haven't checked how much the 
> kernel size differs between the two versions).
>
> Anyway, I'll try my previous version again, and I'll try with the WARN_ON 
> removed, but that'll take a few days (did I mention this 030 is clocked at 
> 16 MHz only?).

No problem.

> I suspect it may get called from an interrupt - not sure what interrupt 
> handler would call vm_map_ram(), or if that even is allowed though. Might 
> happen during a softirq, which is covered by in_interrupt as well ...

vm_map_ram isn't allowed to be called from interrupts, just like all the
vmalloc/vmap code.  But that doesn't mean it might not have crept in
somewhere.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-13  8:54           ` Christoph Hellwig
@ 2021-07-14 19:26             ` Michael Schmitz
  2021-07-14 20:03               ` Andreas Schwab
  2021-07-16  2:03               ` Michael Schmitz
  0 siblings, 2 replies; 73+ messages in thread
From: Michael Schmitz @ 2021-07-14 19:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Schwab, Geert Uytterhoeven, Greg Ungerer, linux-m68k,
	Linus Torvalds

Am 13.07.2021 um 20:54 schrieb Christoph Hellwig:
> On Tue, Jul 13, 2021 at 08:16:25PM +1200, Michael Schmitz wrote:
>> I'll try that next. It certainly ran fine when I tried your earlier version
>> which still had the __constant_copy_from_user() (but modified to use
>> __get_user_asm() for the 1, 2 and 4 byte cases). That doesn't prove much
>> though - I may not have hit the exact same memory pressure (not sure how
>> often savelogs runs on that system, and I haven't checked how much the
>> kernel size differs between the two versions).
>>
>> Anyway, I'll try my previous version again, and I'll try with the WARN_ON

That one crashed even harder - again with the same odd repetitive 
pattern around the trap PC, but resulting in a kernel panic this time:

[25187.890000] Data read fault at 0x00000000 in CPU (pc=0x80050d06)
[25187.900000] *** LINE 1010 ***   FORMAT=0
[25187.900000] Current process id is 1409
[25187.900000] BAD KERNEL TRAP: 00000000
[25187.900000] Modules linked in: atari_scsi ne 8390p
[25187.900000] PC: [<0000f85c>] PITBL+0xc8/0x410
[25187.900000] SR: 2608  SP: e17e02dc  a2: 004a28ae
[25187.900000] d0: 00000000    d1: 00000800    d2: 00000000    d3: 00000040
[25187.900000] d4: 00000000    d5: 00049ef6    a0: 0000f858    a1: 003ee318
[25187.900000] Process savelog (pid: 1409, task=59e59dd6)
[25187.900000] Frame format=0
[25187.900000] Stack from 00593d24:
[25187.900000]         00049f6e 00000040 004b0800 00000001 00000040 0000000
0 00049ef6 002c7bb2
[25187.900000]         003ee318 003ec508 00593de8 00593d68 0004a01e 003ee31
8 00593d68 003ee318
[25187.900000]         000279fe 00000000 0004a06e 003ee318 003ee318 0004c47
8 003ee318 003ee318
[25187.900000]         000499b8 000499de 003ee318 00007a82 00000040 0000000
0 0000000c 00803100
[25187.900000]         00049f6e 0000000c 003b489c 00000000 00046a12 0000000
0 00000000 00000001
[25187.900000]         003ec508 00000000 00049802 00593de8 0004a01e 003ec50
8 00593de8 003ec508
[25187.900000] Call Trace: [<00049f6e>] __handle_irq_event_percpu+0x38/0xce
[25187.900000]  [<00049ef6>] __irq_wake_thread+0x0/0x40
[25187.900000]  [<002c7bb2>] printk+0x0/0x18
[25187.900000]  [<0004a01e>] handle_irq_event_percpu+0x1a/0x4c
[25187.900000]  [<000279fe>] warn_slowpath_fmt+0x0/0x62
[25187.900000]  [<0004a06e>] handle_irq_event+0x1e/0x30
[25187.900000]  [<0004c478>] handle_simple_irq+0x48/0x4e
[25187.900000]  [<000499b8>] generic_handle_irq+0x0/0x30
[25187.900000]  [<000499de>] generic_handle_irq+0x26/0x30
[25187.900000]  [<00007a82>] mfp_timer_d_handler+0x24/0x34
[25187.900000]  [<00049f6e>] __handle_irq_event_percpu+0x38/0xce
[25187.900000]  [<00046a12>] msg_print_ext_body+0x0/0x6a
[25187.900000]  [<00049802>] prb_read_valid+0x0/0x1e
[25187.900000]  [<0004a01e>] handle_irq_event_percpu+0x1a/0x4c
[25187.900000]  [<00046640>] info_print_ext_header.constprop.35+0x0/0x8a
[25187.900000]  [<0004a06e>] handle_irq_event+0x1e/0x30
[25187.900000]  [<0004c478>] handle_simple_irq+0x48/0x4e
[25187.900000]  [<000499de>] generic_handle_irq+0x26/0x30
[25187.900000]  [<00002c28>] do_IRQ+0x20/0x32
[25187.900000]  [<00002204>] do_one_initcall+0x90/0x150
[25187.900000]  [<00002b38>] user_irqvec_fixup+0xc/0x14
[25187.900000]  [<00002204>] do_one_initcall+0x90/0x150
[25187.900000]  [<00046a12>] msg_print_ext_body+0x0/0x6a
[25187.900000]  [<00048912>] __printk_safe_exit+0x0/0x10
[25187.900000]  [<000078de>] atari_scc_console_write+0x0/0x4a
[25187.900000]  [<00048912>] __printk_safe_exit+0x0/0x10
[25187.900000]  [<00048902>] __printk_safe_enter+0x0/0x10
[25187.900000]  [<000016e8>] kernel_pg_dir+0x6e8/0x1000
[25187.900000]  [<000484be>] vprintk_emit+0xe8/0xfa
[25187.900000]  [<0000376f>] show_cpuinfo+0x14f/0x19a
[25187.900000]  [<000484e6>] vprintk_default+0x16/0x1c
[25187.900000]  [<002c7bc4>] printk+0x12/0x18
[25187.900000]  [<00005786>] buserr_c+0x26a/0x4a8
[25187.900000]  [<00002934>] buserr+0x20/0x28
[25187.900000]  [<0000f85a>] PITBL+0xc6/0x410
[25187.900000]  [<000279fe>] warn_slowpath_fmt+0x0/0x62
[25187.900000]  [<0000f85a>] PITBL+0xc6/0x410
[25187.900000]  [<0000f31a>] WORK+0x88/0xca
[25187.900000]  [<00049f6e>] __handle_irq_event_percpu+0x38/0xce
[25187.900000]
[25187.900000] Code: 3cb8 3cb8 3cb8 3cb8 3cb8 3cb8 3cb8 3cb8 <3cb8> 3cb8 
3cb8 3cb8 3cb8 3cb8 3cb8 3cb8 3cb8 3cb8 3cb8 3cb8 3cb8 3cb8 3cb8 3cb8


I've got a vague recollection that I've seen weird crashes in the past 
related to temperature extremes (we've had a few unusually cold days in 
our parts just now), so I've gone back to a kernel from the switch stack 
/ refactoring exit tests (which ran the stress tests fine earlier) to 
rule that one out. Looking good so far, so I begin to wonder whether we 
need to introduce get_fc() and use that to restore the original sfc/dfc 
instead of assuming USER_DATA is always correct?

>> removed, but that'll take a few days (did I mention this 030 is clocked at
>> 16 MHz only?).
>
> No problem.
>
>> I suspect it may get called from an interrupt - not sure what interrupt
>> handler would call vm_map_ram(), or if that even is allowed though. Might
>> happen during a softirq, which is covered by in_interrupt as well ...
>
> vm_map_ram isn't allowed to be called from interrupts, just like all the
> vmalloc/vmap code.  But that doesn't mean it might not have crept in
> somewhere.

That's what I recalled, thanks.

Cheers,

	Michael


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-14 19:26             ` Michael Schmitz
@ 2021-07-14 20:03               ` Andreas Schwab
  2021-07-15  5:44                 ` Michael Schmitz
  2021-07-16  2:03               ` Michael Schmitz
  1 sibling, 1 reply; 73+ messages in thread
From: Andreas Schwab @ 2021-07-14 20:03 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Christoph Hellwig, Geert Uytterhoeven, Greg Ungerer, linux-m68k,
	Linus Torvalds

diff --git a/arch/m68k/include/asm/uaccess.h b/arch/m68k/include/asm/uaccess.h
index a308f0b41154..578e643ec83d 100644
--- a/arch/m68k/include/asm/uaccess.h
+++ b/arch/m68k/include/asm/uaccess.h
@@ -193,22 +193,22 @@ do {									\
 
 #define __get_kernel_nofault(dst, src, type, err_label)			\
 do {									\
-	type __gk_dst = *(type *)(dst);					\
+	type *__gk_dst = (type *)(dst);					\
 	type *__gk_src = (type *)(src);					\
 	int __gk_err = 0;						\
 									\
 	switch (sizeof(type)) {						\
 	case 1:								\
-		__get_user_asm("move", __gk_err, __gk_dst, __gk_src, u8, b, d); \
+		__get_user_asm("move", __gk_err, *__gk_dst, __gk_src, u8, b, d); \
 		break;							\
 	case 2:								\
-		__get_user_asm("move", __gk_err, __gk_dst, __gk_src, u16, w, r); \
+		__get_user_asm("move", __gk_err, *__gk_dst, __gk_src, u16, w, r); \
 		break;							\
 	case 4:								\
-		__get_user_asm("move", __gk_err, __gk_dst, __gk_src, u32, l, r); \
+		__get_user_asm("move", __gk_err, *__gk_dst, __gk_src, u32, l, r); \
 		break;							\
 	case 8:								\
-		__get_user_asm8("move", __gk_err, __gk_dst, __gk_src);	\
+		__get_user_asm8("move", __gk_err, *__gk_dst, __gk_src);	\
 		break;							\
 	default:							\
 		BUILD_BUG();						\

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-14 20:03               ` Andreas Schwab
@ 2021-07-15  5:44                 ` Michael Schmitz
  0 siblings, 0 replies; 73+ messages in thread
From: Michael Schmitz @ 2021-07-15  5:44 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Christoph Hellwig, Geert Uytterhoeven, Greg Ungerer, linux-m68k,
	Linus Torvalds

Hi Andreas,

well spotted. I'll give that a try.

Cheers,

	Michael

Am 15.07.2021 um 08:03 schrieb Andreas Schwab:
> diff --git a/arch/m68k/include/asm/uaccess.h b/arch/m68k/include/asm/uaccess.h
> index a308f0b41154..578e643ec83d 100644
> --- a/arch/m68k/include/asm/uaccess.h
> +++ b/arch/m68k/include/asm/uaccess.h
> @@ -193,22 +193,22 @@ do {									\
>
>  #define __get_kernel_nofault(dst, src, type, err_label)			\
>  do {									\
> -	type __gk_dst = *(type *)(dst);					\
> +	type *__gk_dst = (type *)(dst);					\
>  	type *__gk_src = (type *)(src);					\
>  	int __gk_err = 0;						\
>  									\
>  	switch (sizeof(type)) {						\
>  	case 1:								\
> -		__get_user_asm("move", __gk_err, __gk_dst, __gk_src, u8, b, d); \
> +		__get_user_asm("move", __gk_err, *__gk_dst, __gk_src, u8, b, d); \
>  		break;							\
>  	case 2:								\
> -		__get_user_asm("move", __gk_err, __gk_dst, __gk_src, u16, w, r); \
> +		__get_user_asm("move", __gk_err, *__gk_dst, __gk_src, u16, w, r); \
>  		break;							\
>  	case 4:								\
> -		__get_user_asm("move", __gk_err, __gk_dst, __gk_src, u32, l, r); \
> +		__get_user_asm("move", __gk_err, *__gk_dst, __gk_src, u32, l, r); \
>  		break;							\
>  	case 8:								\
> -		__get_user_asm8("move", __gk_err, __gk_dst, __gk_src);	\
> +		__get_user_asm8("move", __gk_err, *__gk_dst, __gk_src);	\
>  		break;							\
>  	default:							\
>  		BUILD_BUG();						\
>
> Andreas.
>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-14 19:26             ` Michael Schmitz
  2021-07-14 20:03               ` Andreas Schwab
@ 2021-07-16  2:03               ` Michael Schmitz
  2021-07-17  5:41                 ` Michael Schmitz
  1 sibling, 1 reply; 73+ messages in thread
From: Michael Schmitz @ 2021-07-16  2:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Schwab, Geert Uytterhoeven, Greg Ungerer, linux-m68k,
	Linus Torvalds

Hi Christoph,

On 15/07/21 7:26 am, Michael Schmitz wrote:
> I've got a vague recollection that I've seen weird crashes in the past 
> related to temperature extremes (we've had a few unusually cold days 
> in our parts just now), so I've gone back to a kernel from the switch 
> stack / refactoring exit tests (which ran the stress tests fine 
> earlier) to rule that one out. Looking good so far, so I begin to 
> wonder whether we need to introduce get_fc() and use that to restore 
> the original sfc/dfc instead of assuming USER_DATA is always correct?

No crashes with the known good kernel after over a day of stress testing 
- I'll try Andreas' patch once the current test run has completed.

One thing I noticed with either your final or your v2 patch series - as 
far as the tests ran at all, run times were 30% increased. That's a lot 
more than I would have expected. Large byte size user copies were 
already handled by the code in arch/m68k/lib/uaccess.c which wasn't 
explicitly inlined though the calling raw_copy_to/from_user() was. Could 
the change to no longer inline raw_copy_to/from_user() have such a large 
impact?

Cheers,

     Michael



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-16  2:03               ` Michael Schmitz
@ 2021-07-17  5:41                 ` Michael Schmitz
  2021-07-18  1:14                   ` Michael Schmitz
  0 siblings, 1 reply; 73+ messages in thread
From: Michael Schmitz @ 2021-07-17  5:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Schwab, Geert Uytterhoeven, Greg Ungerer, linux-m68k,
	Linus Torvalds

Am 16.07.2021 um 14:03 schrieb Michael Schmitz:
> Hi Christoph,
>
> On 15/07/21 7:26 am, Michael Schmitz wrote:
>> I've got a vague recollection that I've seen weird crashes in the past
>> related to temperature extremes (we've had a few unusually cold days
>> in our parts just now), so I've gone back to a kernel from the switch
>> stack / refactoring exit tests (which ran the stress tests fine
>> earlier) to rule that one out. Looking good so far, so I begin to
>> wonder whether we need to introduce get_fc() and use that to restore
>> the original sfc/dfc instead of assuming USER_DATA is always correct?
>
> No crashes with the known good kernel after over a day of stress testing
> - I'll try Andreas' patch once the current test run has completed.
>
> One thing I noticed with either your final or your v2 patch series - as
> far as the tests ran at all, run times were 30% increased. That's a lot

With Andreas' patch applied, the run time increase is now less severe 
(11-13%). I'll repeat that a few more times but it's looking a lot 
better so far. No instruction format errors seen anymore.

Cheers,

	Michael




^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-17  5:41                 ` Michael Schmitz
@ 2021-07-18  1:14                   ` Michael Schmitz
  2021-07-21 17:05                     ` Christoph Hellwig
  0 siblings, 1 reply; 73+ messages in thread
From: Michael Schmitz @ 2021-07-18  1:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Schwab, Geert Uytterhoeven, Greg Ungerer, linux-m68k,
	Linus Torvalds



Am 17.07.2021 um 17:41 schrieb Michael Schmitz:
> Am 16.07.2021 um 14:03 schrieb Michael Schmitz:
>> Hi Christoph,
>>
>> On 15/07/21 7:26 am, Michael Schmitz wrote:
>>> I've got a vague recollection that I've seen weird crashes in the past
>>> related to temperature extremes (we've had a few unusually cold days
>>> in our parts just now), so I've gone back to a kernel from the switch
>>> stack / refactoring exit tests (which ran the stress tests fine
>>> earlier) to rule that one out. Looking good so far, so I begin to
>>> wonder whether we need to introduce get_fc() and use that to restore
>>> the original sfc/dfc instead of assuming USER_DATA is always correct?
>>
>> No crashes with the known good kernel after over a day of stress testing
>> - I'll try Andreas' patch once the current test run has completed.
>>
>> One thing I noticed with either your final or your v2 patch series - as
>> far as the tests ran at all, run times were 30% increased. That's a lot
>
> With Andreas' patch applied, the run time increase is now less severe
> (11-13%). I'll repeat that a few more times but it's looking a lot
> better so far. No instruction format errors seen anymore.

Alas - got another one:


[124760.720000] *** FORMAT ERROR ***   FORMAT=0
[124760.740000] Current process id is 1108
[124760.750000] BAD KERNEL TRAP: 00000000
[124760.770000] Modules linked in: atari_scsi ne 8390p
[124760.800000] PC: [<00002a8c>] resume_userspace+0x14/0x16
[124760.820000] SR: 2200  SP: 96ae8faf  a2: efc67932
[124760.850000] d0: 00000047    d1: 0000000a    d2: 00000001    d3: 80007c40
[124760.880000] d4: 00000000    d5: 80004326    a0: 80009d78    a1: efc678f4
[124760.890000] Process syslogd (pid: 1108, task=742727e0)
[124760.920000] Frame format=0
[124760.930000] Stack from 00929fa4:
[124760.930000]         02048000 252cb008 0eee0749 660000c2 00929e18 
00000000 00000001 0003469a
[124760.930000]         00033fd8 0002f7a6 00000006 00000000 00000001 
0003469a 00513d00 00ad320c
[124760.930000]         00929e2c 0003e514 006616e0 00000003 00000000 
002c6f0a 00035946
[124761.030000] Call Trace: [<0003469a>] get_work_pool+0x0/0x38
[124761.060000]  [<00033fd8>] find_worker_executing_work+0x0/0x40
[124761.090000]  [<0002f7a6>] sys_rt_sigprocmask+0x5a/0x9a
[124761.100000]  [<0003469a>] get_work_pool+0x0/0x38
[124761.120000]  [<0003e514>] wake_up_process+0x12/0x16
[124761.150000]  [<002c6f0a>] printk+0x0/0x18
[124761.170000]  [<00035946>] __queue_work+0x1a8/0x1be
[124761.190000]
[124761.200000] Code: 1029 0007 660c 4cdf 073e 201f 588f dfdf <4e73> 
254f 040c e308 660a 487a ffe0 60ff 002d 26ae 598f 48e7 031e 486f 001c 61ff

The faulting instruction is the 'rte' at the end of resume_userspace 
from our entry.S.

Any idea what's gone wrong this time, Andreas?

(All processes except probably syslogd kept running and my tests 
completed OK - rerunning that again now to see what else I get...

Cheers,

	Michael

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-18  1:14                   ` Michael Schmitz
@ 2021-07-21 17:05                     ` Christoph Hellwig
  2021-07-21 19:20                       ` Michael Schmitz
  2021-07-23  4:00                       ` Michael Schmitz
  0 siblings, 2 replies; 73+ messages in thread
From: Christoph Hellwig @ 2021-07-21 17:05 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Christoph Hellwig, Andreas Schwab, Geert Uytterhoeven,
	Greg Ungerer, linux-m68k, Linus Torvalds

FYI, I've pushed out a new version - this contains the fix from Andreas
and drops all these uninlining tweaks as they very obviously cause more
harm than good for now.

And sorry for the delay, life has been a little busy lately.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-21 17:05                     ` Christoph Hellwig
@ 2021-07-21 19:20                       ` Michael Schmitz
  2021-07-23  4:00                       ` Michael Schmitz
  1 sibling, 0 replies; 73+ messages in thread
From: Michael Schmitz @ 2021-07-21 19:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Schwab, Geert Uytterhoeven, Greg Ungerer, linux-m68k,
	Linus Torvalds

Hi Christoph,

On 22/07/21 5:05 am, Christoph Hellwig wrote:
> FYI, I've pushed out a new version - this contains the fix from Andreas
> and drops all these uninlining tweaks as they very obviously cause more
> harm than good for now.
>
> And sorry for the delay, life has been a little busy lately.

Thanks - after Andreas' patch, I didn't see any more kernel panic. Will 
pull your new version and give that a good workout.

Not to worry about the delay - I could use a little break from Linux 
testing. Hope you're not too badly affected by theb extreme weather 
events in your part of the country.

Cheers,

     Michael



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-21 17:05                     ` Christoph Hellwig
  2021-07-21 19:20                       ` Michael Schmitz
@ 2021-07-23  4:00                       ` Michael Schmitz
  2021-07-23  5:11                         ` Christoph Hellwig
  1 sibling, 1 reply; 73+ messages in thread
From: Michael Schmitz @ 2021-07-23  4:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Schwab, Geert Uytterhoeven, Greg Ungerer, linux-m68k,
	Linus Torvalds

Hi Christoph,

Am 22.07.2021 um 05:05 schrieb Christoph Hellwig:
> FYI, I've pushed out a new version - this contains the fix from Andreas
> and drops all these uninlining tweaks as they very obviously cause more
> harm than good for now.
>
> And sorry for the delay, life has been a little busy lately.

The new version passed two rounds of tests OK - looking good so far.

Cheers,

	Michael



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-23  4:00                       ` Michael Schmitz
@ 2021-07-23  5:11                         ` Christoph Hellwig
  2021-07-25  7:36                           ` Michael Schmitz
  0 siblings, 1 reply; 73+ messages in thread
From: Christoph Hellwig @ 2021-07-23  5:11 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Christoph Hellwig, Andreas Schwab, Geert Uytterhoeven,
	Greg Ungerer, linux-m68k, Linus Torvalds

On Fri, Jul 23, 2021 at 04:00:26PM +1200, Michael Schmitz wrote:
>> FYI, I've pushed out a new version - this contains the fix from Andreas
>> and drops all these uninlining tweaks as they very obviously cause more
>> harm than good for now.
>>
>> And sorry for the delay, life has been a little busy lately.
>
> The new version passed two rounds of tests OK - looking good so far.

Thanks a lot!  I'll wait a bit and will send it out as a formal series.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-23  5:11                         ` Christoph Hellwig
@ 2021-07-25  7:36                           ` Michael Schmitz
  2021-07-31 19:31                             ` Michael Schmitz
  0 siblings, 1 reply; 73+ messages in thread
From: Michael Schmitz @ 2021-07-25  7:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Schwab, Geert Uytterhoeven, Greg Ungerer, linux-m68k,
	Linus Torvalds

Hi Christoph,

Am 23.07.2021 um 17:11 schrieb Christoph Hellwig:
> On Fri, Jul 23, 2021 at 04:00:26PM +1200, Michael Schmitz wrote:
>>> FYI, I've pushed out a new version - this contains the fix from Andreas
>>> and drops all these uninlining tweaks as they very obviously cause more
>>> harm than good for now.
>>>
>>> And sorry for the delay, life has been a little busy lately.
>>
>> The new version passed two rounds of tests OK - looking good so far.
>
> Thanks a lot!  I'll wait a bit and will send it out as a formal series.

Just saw this:

[256262.110000] *** FORMAT ERROR ***   FORMAT=0
[256262.140000] Current process id is 2969
[256262.170000] BAD KERNEL TRAP: 00000000
[256262.190000] Modules linked in: atari_scsi ne 8390p
[256262.220000] PC: [<00002a8c>] resume_userspace+0x14/0x16
[256262.250000] SR: 2200  SP: 4b4338bf  a2: 00000000
[256262.290000] d0: 00000008    d1: 00000003    d2: 00000ba1    d3: 00000000
[256262.300000] d4: ffffffff    d5: 00000000    a0: 00000ba1    a1: 00000080
[256262.320000] Process savelog (pid: 2969, task=d81f6132)
[256262.340000] Frame format=0
[256262.360000] Stack from 00c1bfa4:
[256262.360000]         02108005 0d06b008 00000009 00000000 00000001 
000347ba 0080ab00 00778b6c
[256262.360000]         00c1be0c 0003e68e 00d11970 00000003 00000000 
002c7bca 00035a66 00d11970
[256262.360000]         00778b6c 00809014 00000000 00002700 0000006d 
00002600 00049ef6
[256262.470000] Call Trace: [<000347ba>] get_work_pool+0x0/0x38
[256262.480000]  [<0003e68e>] wake_up_process+0x12/0x16
[256262.500000]  [<002c7bca>] printk+0x0/0x18
[256262.520000]  [<00035a66>] __queue_work+0x1a8/0x1be
[256262.540000]  [<00002700>] wait_for_initramfs+0x50/0x58
[256262.570000]  [<00002600>] name_to_dev_t+0x2b0/0x360
[256262.580000]  [<00049ef6>] __irq_wake_thread+0x0/0x40
[256262.600000]
[256262.610000] Code: 1029 0007 660c 4cdf 073e 201f 588f dfdf <4e73> 
254f 040c e308 660a 487a ffe0 60ff 002d 336e 598f 48e7 031e 486f 001c 61ff

Same PC as the one I reported on the 18th. Kernel locked up hard and 
needing a hardware reset this time.

I'll back out the top commit of your series (the one removing set_fs()) 
and retest. Again, this may take a few days to show any results.

Cheers,

	Michael



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-25  7:36                           ` Michael Schmitz
@ 2021-07-31 19:31                             ` Michael Schmitz
  2021-08-06  3:10                               ` Michael Schmitz
  0 siblings, 1 reply; 73+ messages in thread
From: Michael Schmitz @ 2021-07-31 19:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Schwab, Geert Uytterhoeven, Greg Ungerer, linux-m68k,
	Linus Torvalds

Hi Christoph,

Am 25.07.2021 um 19:36 schrieb Michael Schmitz:
> Hi Christoph,
>
> Am 23.07.2021 um 17:11 schrieb Christoph Hellwig:
>> On Fri, Jul 23, 2021 at 04:00:26PM +1200, Michael Schmitz wrote:
>>>> FYI, I've pushed out a new version - this contains the fix from Andreas
>>>> and drops all these uninlining tweaks as they very obviously cause more
>>>> harm than good for now.
>>>>
>>>> And sorry for the delay, life has been a little busy lately.
>>>
>>> The new version passed two rounds of tests OK - looking good so far.
>>
>> Thanks a lot!  I'll wait a bit and will send it out as a formal series.
>
> Just saw this:
>
> [256262.110000] *** FORMAT ERROR ***   FORMAT=0
> [256262.140000] Current process id is 2969
> [256262.170000] BAD KERNEL TRAP: 00000000
> [256262.190000] Modules linked in: atari_scsi ne 8390p
> [256262.220000] PC: [<00002a8c>] resume_userspace+0x14/0x16
> [256262.250000] SR: 2200  SP: 4b4338bf  a2: 00000000
> [256262.290000] d0: 00000008    d1: 00000003    d2: 00000ba1    d3:
> 00000000
> [256262.300000] d4: ffffffff    d5: 00000000    a0: 00000ba1    a1:
> 00000080
> [256262.320000] Process savelog (pid: 2969, task=d81f6132)
> [256262.340000] Frame format=0
> [256262.360000] Stack from 00c1bfa4:
> [256262.360000]         02108005 0d06b008 00000009 00000000 00000001
> 000347ba 0080ab00 00778b6c
> [256262.360000]         00c1be0c 0003e68e 00d11970 00000003 00000000
> 002c7bca 00035a66 00d11970
> [256262.360000]         00778b6c 00809014 00000000 00002700 0000006d
> 00002600 00049ef6
> [256262.470000] Call Trace: [<000347ba>] get_work_pool+0x0/0x38
> [256262.480000]  [<0003e68e>] wake_up_process+0x12/0x16
> [256262.500000]  [<002c7bca>] printk+0x0/0x18
> [256262.520000]  [<00035a66>] __queue_work+0x1a8/0x1be
> [256262.540000]  [<00002700>] wait_for_initramfs+0x50/0x58
> [256262.570000]  [<00002600>] name_to_dev_t+0x2b0/0x360
> [256262.580000]  [<00049ef6>] __irq_wake_thread+0x0/0x40
> [256262.600000]
> [256262.610000] Code: 1029 0007 660c 4cdf 073e 201f 588f dfdf <4e73>
> 254f 040c e308 660a 487a ffe0 60ff 002d 336e 598f 48e7 031e 486f 001c 61ff
>
> Same PC as the one I reported on the 18th. Kernel locked up hard and
> needing a hardware reset this time.
>
> I'll back out the top commit of your series (the one removing set_fs())
> and retest. Again, this may take a few days to show any results.

No further format errors, kernel panics or lockups seen after almost a 
week of tests.

That suggests something is wrong with your 'm68k: remove set_fs()' commit.

I'll try adding a get_fc() and warn whenever FC does not match what your 
patch expects, maybe that can help to get a clearer picture.

Cheers,

	Michael


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-07-31 19:31                             ` Michael Schmitz
@ 2021-08-06  3:10                               ` Michael Schmitz
  2021-08-11  9:12                                 ` Christoph Hellwig
  2021-08-15  7:42                                 ` Christoph Hellwig
  0 siblings, 2 replies; 73+ messages in thread
From: Michael Schmitz @ 2021-08-06  3:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Schwab, Geert Uytterhoeven, Greg Ungerer, linux-m68k,
	Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 1173 bytes --]

Hi Christoph,

Am 01.08.2021 um 07:31 schrieb Michael Schmitz:
> Hi Christoph,
>> I'll back out the top commit of your series (the one removing set_fs())
>> and retest. Again, this may take a few days to show any results.
>
> No further format errors, kernel panics or lockups seen after almost a
> week of tests.
>
> That suggests something is wrong with your 'm68k: remove set_fs()' commit.
>
> I'll try adding a get_fc() and warn whenever FC does not match what your
> patch expects, maybe that can help to get a clearer picture.

See attached patches, to be applied on top of your RFC series - ran this 
for five days now, with no further errors. Will run this a while longer 
yet, but in the ordinary course of testing, I'd have seen errors by now.

Haven't seen the WARN_ON trigger once yet though, which is more than a 
little odd. Heisenbug?

I am aware that this patch defeats the purpose of the 'lets lose set_fs' 
patch series, and Linus was quite convincing in saying preemption 
couldn't be an issue so saving DFC ought not to be necessary. If there 
is anything else I can try to get to the bottom of these format errors, 
please say so.

Cheers,

	Michael



[-- Attachment #2: 0002-m68k-warn-in-get_fc-if-DFC-is-not-USER_DATA.patch --]
[-- Type: text/x-diff, Size: 905 bytes --]

From d5f7c3d4b4e4b437dfb54f9eb1ed6876dfb63744 Mon Sep 17 00:00:00 2001
From: Michael Schmitz <schmitzmic@gmail.com>
Date: Sun, 1 Aug 2021 13:06:21 +1200
Subject: [PATCH 2/2] m68k: warn in get_fc() if DFC is not USER_DATA

Trace uacess mode restore operations that mess up the SFC/DFC
registers - read out current DFC before changing modes and warn
when the current mode wasn't USER_DATA!

Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
---
 arch/m68k/include/asm/processor.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
index 29dfa54..a8c533d 100644
--- a/arch/m68k/include/asm/processor.h
+++ b/arch/m68k/include/asm/processor.h
@@ -98,6 +98,7 @@ static inline unsigned long get_fc(void)
 {
 	unsigned long val;
 	__asm__ ("movec %/dfc,%0":"=r" (val):);
+	WARN_ON_ONCE(val != USER_DATA);
 	return val;
 }
 /*
-- 
2.7.4


[-- Attachment #3: 0001-m68k-add-get_fc-to-read-uaccess-mode.patch --]
[-- Type: text/x-diff, Size: 5350 bytes --]

From fcb3f2543125ca3fe269b50646f8c4782d31ba80 Mon Sep 17 00:00:00 2001
From: Michael Schmitz <schmitzmic@gmail.com>
Date: Sun, 1 Aug 2021 12:11:59 +1200
Subject: [PATCH 1/2] m68k: add get_fc() to read uaccess mode

Add back get_fc(), force_user_fc_begin() and force_user_fc_end(),
and use these to restore original DFC after uaccess operations,
instead of unconditionally setting USER_DATA.

Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
---
 arch/m68k/include/asm/processor.h | 35 +++++++++++++++++++++++++++++++++++
 arch/m68k/include/asm/tlbflush.h  |  8 ++++++--
 arch/m68k/kernel/process.c        |  2 +-
 arch/m68k/kernel/traps.c          |  6 ++++--
 arch/m68k/mm/cache.c              |  4 +++-
 5 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
index debb453..29dfa54 100644
--- a/arch/m68k/include/asm/processor.h
+++ b/arch/m68k/include/asm/processor.h
@@ -94,10 +94,45 @@ static inline void set_fc(unsigned long val)
 			      "movec %0,%/dfc\n\t"
 			      : /* no outputs */ : "r" (val) : "memory");
 }
+static inline unsigned long get_fc(void)
+{
+	unsigned long val;
+	__asm__ ("movec %/dfc,%0":"=r" (val):);
+	return val;
+}
+/*
+ * Force the uaccess routines to be wired up for actual userspace access,
+ * overriding any possible set_fs(KERNEL_DS) still lingering around.  Undone
+ * using force_uaccess_end below.
+ */
+static inline unsigned long force_user_fc_begin(void)
+{
+        unsigned long fc = get_fc();
+
+        set_fc(USER_DATA);
+        return fc;
+}
+
+static inline void force_user_fc_end(unsigned long oldfc)
+{
+        set_fc(oldfc);
+}
 #else
 static inline void set_fc(unsigned long val)
 {
 }
+static inline unsigned long get_fs(void)
+{
+	return USER_DATA;
+}
+static inline unsigned long force_user_fc_begin(void)
+{
+        return USER_DATA;
+}
+
+static inline void force_user_fc_end(unsigned long oldfc)
+{
+}
 #endif /* CONFIG_CPU_HAS_ADDRESS_SPACES */
 
 struct thread_struct {
diff --git a/arch/m68k/include/asm/tlbflush.h b/arch/m68k/include/asm/tlbflush.h
index 524b13d..7c34d14 100644
--- a/arch/m68k/include/asm/tlbflush.h
+++ b/arch/m68k/include/asm/tlbflush.h
@@ -13,12 +13,13 @@ static inline void flush_tlb_kernel_page(void *addr)
 	if (CPU_IS_COLDFIRE) {
 		mmu_write(MMUOR, MMUOR_CNL);
 	} else if (CPU_IS_040_OR_060) {
+		unsigned long old_fc = get_fc();
 		set_fc(SUPER_DATA);
 		__asm__ __volatile__(".chip 68040\n\t"
 				     "pflush (%0)\n\t"
 				     ".chip 68k"
 				     : : "a" (addr));
-		set_fc(USER_DATA);
+		set_fc(old_fc);
 	} else if (CPU_IS_020_OR_030)
 		__asm__ __volatile__("pflush #4,#4,(%0)" : : "a" (addr));
 }
@@ -83,8 +84,11 @@ static inline void flush_tlb_mm(struct mm_struct *mm)
 
 static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
 {
-	if (vma->vm_mm == current->active_mm)
+	if (vma->vm_mm == current->active_mm) {
+		unsigned long old_fc = force_user_fc_begin();
 		__flush_tlb_one(addr);
+		force_user_fc_end(old_fc);
+	}
 }
 
 static inline void flush_tlb_range(struct vm_area_struct *vma,
diff --git a/arch/m68k/kernel/process.c b/arch/m68k/kernel/process.c
index fb257b7..cab012d 100644
--- a/arch/m68k/kernel/process.c
+++ b/arch/m68k/kernel/process.c
@@ -155,7 +155,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
 	 * Must save the current SFC/DFC value, NOT the value when
 	 * the parent was last descheduled - RGH  10-08-96
 	 */
-	p->thread.fc = USER_DATA;
+	p->thread.fc = get_fc();
 
 	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
 		/* kernel thread */
diff --git a/arch/m68k/kernel/traps.c b/arch/m68k/kernel/traps.c
index c5e0a4f..1b07329 100644
--- a/arch/m68k/kernel/traps.c
+++ b/arch/m68k/kernel/traps.c
@@ -181,6 +181,7 @@ static inline void access_error060 (struct frame *fp)
 static inline unsigned long probe040(int iswrite, unsigned long addr, int wbs)
 {
 	unsigned long mmusr;
+	unsigned long old_fc = get_fc();
 
 	set_fc(wbs);
 
@@ -191,7 +192,7 @@ static inline unsigned long probe040(int iswrite, unsigned long addr, int wbs)
 
 	asm volatile (".chip 68040; movec %%mmusr,%0; .chip 68k" : "=r" (mmusr));
 
-	set_fc(USER_DATA);
+	set_fc(old_fc);
 
 	return mmusr;
 }
@@ -200,6 +201,7 @@ static inline int do_040writeback1(unsigned short wbs, unsigned long wba,
 				   unsigned long wbd)
 {
 	int res = 0;
+	unsigned long old_fc = get_fc();
 
 	set_fc(wbs);
 
@@ -215,7 +217,7 @@ static inline int do_040writeback1(unsigned short wbs, unsigned long wba,
 		break;
 	}
 
-	set_fc(USER_DATA);
+	set_fc(old_fc);
 
 	pr_debug("do_040writeback1, res=%d\n", res);
 
diff --git a/arch/m68k/mm/cache.c b/arch/m68k/mm/cache.c
index d26eb1a..70fd62b 100644
--- a/arch/m68k/mm/cache.c
+++ b/arch/m68k/mm/cache.c
@@ -109,14 +109,16 @@ static void __flush_icache_range(unsigned long address, unsigned long endaddr,
 
 void flush_icache_user_range(unsigned long address, unsigned long endaddr)
 {
+	unsigned long old_fc = get_fc();
 	__flush_icache_range(address, endaddr, USER_DATA);
 }
 
 void flush_icache_range(unsigned long address, unsigned long endaddr)
 {
+	unsigned long old_fc = get_fc();
 	set_fc(SUPER_DATA);
 	__flush_icache_range(address, endaddr, SUPER_DATA);
-	set_fc(USER_DATA);
+	set_fc(old_fc);
 }
 EXPORT_SYMBOL(flush_icache_range);
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-06  3:10                               ` Michael Schmitz
@ 2021-08-11  9:12                                 ` Christoph Hellwig
  2021-08-12  3:37                                   ` Michael Schmitz
  2021-08-15  7:42                                 ` Christoph Hellwig
  1 sibling, 1 reply; 73+ messages in thread
From: Christoph Hellwig @ 2021-08-11  9:12 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Christoph Hellwig, Andreas Schwab, Geert Uytterhoeven,
	Greg Ungerer, linux-m68k, Linus Torvalds

Hi Michael,

from looking at your patches I suspect the 040 traps might be able
to be called witha different fc.  Can you check if the below patch,
which is a cut down version of your changes makes all the issues
go away?

diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
index debb4537eab5..a554e529fd26 100644
--- a/arch/m68k/include/asm/processor.h
+++ b/arch/m68k/include/asm/processor.h
@@ -94,6 +94,13 @@ static inline void set_fc(unsigned long val)
 			      "movec %0,%/dfc\n\t"
 			      : /* no outputs */ : "r" (val) : "memory");
 }
+static inline unsigned long get_fc(void)
+{
+	unsigned long val;
+	__asm__ ("movec %/dfc,%0":"=r" (val):);
+	WARN_ON_ONCE(val != USER_DATA);
+	return val;
+}
 #else
 static inline void set_fc(unsigned long val)
 {
diff --git a/arch/m68k/kernel/traps.c b/arch/m68k/kernel/traps.c
index c5e0a4f93bd5..1b073299fa55 100644
--- a/arch/m68k/kernel/traps.c
+++ b/arch/m68k/kernel/traps.c
@@ -181,6 +181,7 @@ static inline void access_error060 (struct frame *fp)
 static inline unsigned long probe040(int iswrite, unsigned long addr, int wbs)
 {
 	unsigned long mmusr;
+	unsigned long old_fc = get_fc();
 
 	set_fc(wbs);
 
@@ -191,7 +192,7 @@ static inline unsigned long probe040(int iswrite, unsigned long addr, int wbs)
 
 	asm volatile (".chip 68040; movec %%mmusr,%0; .chip 68k" : "=r" (mmusr));
 
-	set_fc(USER_DATA);
+	set_fc(old_fc);
 
 	return mmusr;
 }
@@ -200,6 +201,7 @@ static inline int do_040writeback1(unsigned short wbs, unsigned long wba,
 				   unsigned long wbd)
 {
 	int res = 0;
+	unsigned long old_fc = get_fc();
 
 	set_fc(wbs);
 
@@ -215,7 +217,7 @@ static inline int do_040writeback1(unsigned short wbs, unsigned long wba,
 		break;
 	}
 
-	set_fc(USER_DATA);
+	set_fc(old_fc);
 
 	pr_debug("do_040writeback1, res=%d\n", res);
 

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-11  9:12                                 ` Christoph Hellwig
@ 2021-08-12  3:37                                   ` Michael Schmitz
  0 siblings, 0 replies; 73+ messages in thread
From: Michael Schmitz @ 2021-08-12  3:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Schwab, Geert Uytterhoeven, Greg Ungerer, linux-m68k,
	Linus Torvalds

Hi Christoph,

I run my tests on a 030, so that patch is unlikely to have any effect 
there. I'll give it a spin though (once my current tests with restoring 
USER_DATA everywhere but logging if get_fc() reads something else) are 
done.

I wonder whether copy_thread() ought to set SUPER_DATA when creating 
kernel threads though (see the comment there). With init_thread starting 
out with USER_DATA now, we'd have to explicitly set SUPER_DATA instead 
of just copying the current value there...

Leaves flush_icache_range() and flush_tlb_page() to worry about.

Cheers,

	Michael

Am 11.08.2021 um 21:12 schrieb Christoph Hellwig:
> Hi Michael,
>
> from looking at your patches I suspect the 040 traps might be able
> to be called witha different fc.  Can you check if the below patch,
> which is a cut down version of your changes makes all the issues
> go away?
>
> diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
> index debb4537eab5..a554e529fd26 100644
> --- a/arch/m68k/include/asm/processor.h
> +++ b/arch/m68k/include/asm/processor.h
> @@ -94,6 +94,13 @@ static inline void set_fc(unsigned long val)
>  			      "movec %0,%/dfc\n\t"
>  			      : /* no outputs */ : "r" (val) : "memory");
>  }
> +static inline unsigned long get_fc(void)
> +{
> +	unsigned long val;
> +	__asm__ ("movec %/dfc,%0":"=r" (val):);
> +	WARN_ON_ONCE(val != USER_DATA);
> +	return val;
> +}
>  #else
>  static inline void set_fc(unsigned long val)
>  {
> diff --git a/arch/m68k/kernel/traps.c b/arch/m68k/kernel/traps.c
> index c5e0a4f93bd5..1b073299fa55 100644
> --- a/arch/m68k/kernel/traps.c
> +++ b/arch/m68k/kernel/traps.c
> @@ -181,6 +181,7 @@ static inline void access_error060 (struct frame *fp)
>  static inline unsigned long probe040(int iswrite, unsigned long addr, int wbs)
>  {
>  	unsigned long mmusr;
> +	unsigned long old_fc = get_fc();
>
>  	set_fc(wbs);
>
> @@ -191,7 +192,7 @@ static inline unsigned long probe040(int iswrite, unsigned long addr, int wbs)
>
>  	asm volatile (".chip 68040; movec %%mmusr,%0; .chip 68k" : "=r" (mmusr));
>
> -	set_fc(USER_DATA);
> +	set_fc(old_fc);
>
>  	return mmusr;
>  }
> @@ -200,6 +201,7 @@ static inline int do_040writeback1(unsigned short wbs, unsigned long wba,
>  				   unsigned long wbd)
>  {
>  	int res = 0;
> +	unsigned long old_fc = get_fc();
>
>  	set_fc(wbs);
>
> @@ -215,7 +217,7 @@ static inline int do_040writeback1(unsigned short wbs, unsigned long wba,
>  		break;
>  	}
>
> -	set_fc(USER_DATA);
> +	set_fc(old_fc);
>
>  	pr_debug("do_040writeback1, res=%d\n", res);
>
>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-06  3:10                               ` Michael Schmitz
  2021-08-11  9:12                                 ` Christoph Hellwig
@ 2021-08-15  7:42                                 ` Christoph Hellwig
  2021-08-15 19:17                                   ` Michael Schmitz
  1 sibling, 1 reply; 73+ messages in thread
From: Christoph Hellwig @ 2021-08-15  7:42 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Christoph Hellwig, Andreas Schwab, Geert Uytterhoeven,
	Greg Ungerer, linux-m68k, Linus Torvalds

Hi Michael,

after your last mail I went back and looked very closely at the
set_fs removal and your debug patches below.  And one hunk stands out:

>  static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
>  {
> -	if (vma->vm_mm == current->active_mm)
> +	if (vma->vm_mm == current->active_mm) {
> +		unsigned long old_fc = force_user_fc_begin();
>  		__flush_tlb_one(addr);
> +		force_user_fc_end(old_fc);
> +	}

This is the only old user of force_uaccess_begin, and the only one
where this patch adds back a DFC/SFC access where there was none at
at all with the set_fs removal.  So I'd be curious if you just add this
hunk (plus the supporting infrastructure) on top of my tree for now to
see if there were some side effects of the instructions that were
important, be that seralization, timing or anything else.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-15  7:42                                 ` Christoph Hellwig
@ 2021-08-15 19:17                                   ` Michael Schmitz
  2021-08-16  6:58                                     ` Christoph Hellwig
  0 siblings, 1 reply; 73+ messages in thread
From: Michael Schmitz @ 2021-08-15 19:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Schwab, Geert Uytterhoeven, Greg Ungerer, linux-m68k,
	Linus Torvalds

Hi Christoph,

On 15/08/21 7:42 pm, Christoph Hellwig wrote:
> Hi Michael,
>
> after your last mail I went back and looked very closely at the
> set_fs removal and your debug patches below.  And one hunk stands out:
>
>>   static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
>>   {
>> -	if (vma->vm_mm == current->active_mm)
>> +	if (vma->vm_mm == current->active_mm) {
>> +		unsigned long old_fc = force_user_fc_begin();
>>   		__flush_tlb_one(addr);
>> +		force_user_fc_end(old_fc);
>> +	}
> This is the only old user of force_uaccess_begin, and the only one
> where this patch adds back a DFC/SFC access where there was none at
> at all with the set_fs removal.  So I'd be curious if you just add this
> hunk (plus the supporting infrastructure) on top of my tree for now to
> see if there were some side effects of the instructions that were
> important, be that seralization, timing or anything else.

Would it be acceptable to add this on top of the 
probe040/do_040writeback1 patch that I'm currently testing?

Cheers,

     Michael



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-15 19:17                                   ` Michael Schmitz
@ 2021-08-16  6:58                                     ` Christoph Hellwig
       [not found]                                       ` <23f745f2-9086-81fb-3d9e-40ea08a1923@linux-m68k.org>
  2021-09-21 21:14                                       ` Michael Schcmitz
  0 siblings, 2 replies; 73+ messages in thread
From: Christoph Hellwig @ 2021-08-16  6:58 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Christoph Hellwig, Andreas Schwab, Geert Uytterhoeven,
	Greg Ungerer, linux-m68k, Linus Torvalds

On Mon, Aug 16, 2021 at 07:17:15AM +1200, Michael Schmitz wrote:
> Would it be acceptable to add this on top of the probe040/do_040writeback1 
> patch that I'm currently testing?

Sure.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-09-21 21:14                                       ` Michael Schcmitz
@ 2021-08-22 19:33                                         ` Michael Schmitz
  2021-08-23  4:04                                           ` Michael Schmitz
  2021-08-23 17:59                                           ` Linus Torvalds
  0 siblings, 2 replies; 73+ messages in thread
From: Michael Schmitz @ 2021-08-22 19:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Schwab, Geert Uytterhoeven, Greg Ungerer, linux-m68k,
	Linus Torvalds

Hi Christoph,

Am 22.09.2021 um 09:14 schrieb Michael Schmitz:
> Hi Christoph,
>
> no errors seen after five days of testing. Adding in a VM stressor now ...


Got this overnight:

> [536154.200000] *** FORMAT ERROR ***   FORMAT=0
> [536154.210000] Current process id is 4656
> [536154.230000] BAD KERNEL TRAP: 00000000
> [536154.240000] Modules linked in: atari_scsi ne 8390p [last unloaded: atari_scsi]
> [536154.260000] PC: [<00002a8c>] resume_userspace+0x14/0x16
> [536154.270000] SR: 2208  SP: 977bd1be  a2: 8009b5e8
> [536154.290000] d0: 8009b5e8    d1: cfcfcfcf    d2: 00000000    d3: ffffffff
> [536154.300000] d4: 00000000    d5: 00000000    a0: 8008a108    a1: 8009b7df
> [536154.320000] Process savelog (pid: 4656, task=e49aa246)
> [536154.330000] Frame format=0
> [536154.340000] Stack from 00cc5fa4:
> [536154.340000]         02088004 3666b008 1c0eb209 007eb5e8 8006a2d0 efaec378 8004366c 61ff61ff
> [536154.340000]         8006a2d4 8006a2d2 00000000 030dfffb 0044fffa 0e000000 fffa1a00 fffa1c00
> [536154.340000]         fffa1e00 fffb0e40 fffb0e80 00049b66 00000040 005f5800 00000001
> [536154.460000] Call Trace: [<00049b66>] __handle_irq_event_percpu+0x38/0xce
> [536154.530000]
> [536154.540000] Code: 1029 0007 660c 4cdf 073e 201f 588f dfdf <4e73> 254f 0400 e308 660a 487a ffe0 60ff 002d 13c6 598f 48e7 031e 486f 001c 61ff
> [536154.630000] Disabling lock debugging due to kernel taint

I'll try Finn's script next. Any advice what part of your patch to 
revert next? Maybe flush_icache_range() ??

Cheers,

	Michael


>
> Cheers,
>
>     Michael
>
>
> On 16/08/21 18:58, Christoph Hellwig wrote:
>> On Mon, Aug 16, 2021 at 07:17:15AM +1200, Michael Schmitz wrote:
>>> Would it be acceptable to add this on top of the
>>> probe040/do_040writeback1
>>> patch that I'm currently testing?
>> Sure.
>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-22 19:33                                         ` Michael Schmitz
@ 2021-08-23  4:04                                           ` Michael Schmitz
  2021-08-23 17:59                                           ` Linus Torvalds
  1 sibling, 0 replies; 73+ messages in thread
From: Michael Schmitz @ 2021-08-23  4:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Schwab, Geert Uytterhoeven, Greg Ungerer, linux-m68k,
	Linus Torvalds



On 23/08/21 07:33, Michael Schmitz wrote:
> Got this overnight:
>
>> [536154.200000] *** FORMAT ERROR *** FORMAT=0
>> [536154.210000] Current process id is 4656
>> [536154.230000] BAD KERNEL TRAP: 00000000
>> [536154.240000] Modules linked in: atari_scsi ne 8390p [last 
>> unloaded: atari_scsi]
>> [536154.260000] PC: [<00002a8c>] resume_userspace+0x14/0x16
>> [536154.270000] SR: 2208  SP: 977bd1be  a2: 8009b5e8
>> [536154.290000] d0: 8009b5e8    d1: cfcfcfcf    d2: 00000000 d3: 
>> ffffffff
>> [536154.300000] d4: 00000000    d5: 00000000    a0: 8008a108 a1: 
>> 8009b7df
>> [536154.320000] Process savelog (pid: 4656, task=e49aa246)
>> [536154.330000] Frame format=0
>> [536154.340000] Stack from 00cc5fa4:
>> [536154.340000]         02088004 3666b008 1c0eb209 007eb5e8 8006a2d0 
>> efaec378 8004366c 61ff61ff
>> [536154.340000]         8006a2d4 8006a2d2 00000000 030dfffb 0044fffa 
>> 0e000000 fffa1a00 fffa1c00
>> [536154.340000]         fffa1e00 fffb0e40 fffb0e80 00049b66 00000040 
>> 005f5800 00000001
>> [536154.460000] Call Trace: [<00049b66>] 
>> __handle_irq_event_percpu+0x38/0xce
>> [536154.530000]
>> [536154.540000] Code: 1029 0007 660c 4cdf 073e 201f 588f dfdf <4e73> 
>> 254f 0400 e308 660a 487a ffe0 60ff 002d 13c6 598f 48e7 031e 486f 001c 
>> 61ff
>> [536154.630000] Disabling lock debugging due to kernel taint
>
FWIW - that's the 'rte' at the end of resume_userspace / 
ret_from_exception, which is on our interrupt return path.
Stack smash??

Cheers,

     Michael



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-22 19:33                                         ` Michael Schmitz
  2021-08-23  4:04                                           ` Michael Schmitz
@ 2021-08-23 17:59                                           ` Linus Torvalds
  2021-08-23 21:31                                             ` Michael Schmitz
  2021-09-14  2:43                                             ` Michael Schmitz
  1 sibling, 2 replies; 73+ messages in thread
From: Linus Torvalds @ 2021-08-23 17:59 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Christoph Hellwig, Andreas Schwab, Geert Uytterhoeven,
	Greg Ungerer, linux-m68k

On Sun, Aug 22, 2021 at 12:34 PM Michael Schmitz <schmitzmic@gmail.com> wrote:
>
> Got this overnight:
>
> > [536154.200000] *** FORMAT ERROR ***   FORMAT=0
> > [536154.210000] Current process id is 4656
> > [536154.230000] BAD KERNEL TRAP: 00000000
> > [536154.240000] Modules linked in: atari_scsi ne 8390p [last unloaded: atari_scsi]
> > [536154.260000] PC: [<00002a8c>] resume_userspace+0x14/0x16
> > [536154.270000] SR: 2208  SP: 977bd1be  a2: 8009b5e8
> > [536154.290000] d0: 8009b5e8    d1: cfcfcfcf    d2: 00000000    d3: ffffffff
> > [536154.300000] d4: 00000000    d5: 00000000    a0: 8008a108    a1: 8009b7df
> > [536154.320000] Process savelog (pid: 4656, task=e49aa246)
> > [536154.330000] Frame format=0
> > [536154.340000] Stack from 00cc5fa4:
> > [536154.340000]         02088004 3666b008 1c0eb209 007eb5e8 8006a2d0 efaec378 8004366c 61ff61ff
> > [536154.340000]         8006a2d4 8006a2d2 00000000 030dfffb 0044fffa 0e000000 fffa1a00 fffa1c00
> > [536154.340000]         fffa1e00 fffb0e40 fffb0e80 00049b66 00000040 005f5800 00000001

Strange. If I read that stack frame correctly, that seems to be an
exception frame of type 0xb ("Long Bus Cycle").

Plus the frame content is then apparently corrupted enough that the
rte causes an exception on trying to restore it.

None of which makes sense or seems to have much at all to do with any
of these patches. Yes, we mess with the exception frame, but only for
fork(), and while "copy_process()" doesn't set any frame type, I see
only two cases:

 - the kernel thread one does a "memset()" to clear it, so you should
end up with frame type 0

 - the user thread case copies the original frame format (which I
think is just the system call frame from the TRAP instruction).

Are you 100% sure your hardware is stable?

                  Linus

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-23 17:59                                           ` Linus Torvalds
@ 2021-08-23 21:31                                             ` Michael Schmitz
  2021-08-23 21:49                                               ` Linus Torvalds
  2021-09-14  2:43                                             ` Michael Schmitz
  1 sibling, 1 reply; 73+ messages in thread
From: Michael Schmitz @ 2021-08-23 21:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Andreas Schwab, Geert Uytterhoeven,
	Greg Ungerer, linux-m68k

Hi Linus,


On 24/08/21 05:59, Linus Torvalds wrote:
> On Sun, Aug 22, 2021 at 12:34 PM Michael Schmitz <schmitzmic@gmail.com> wrote:
>> Got this overnight:
>>
>>> [536154.200000] *** FORMAT ERROR ***   FORMAT=0
>>> [536154.210000] Current process id is 4656
>>> [536154.230000] BAD KERNEL TRAP: 00000000
>>> [536154.240000] Modules linked in: atari_scsi ne 8390p [last unloaded: atari_scsi]
>>> [536154.260000] PC: [<00002a8c>] resume_userspace+0x14/0x16
>>> [536154.270000] SR: 2208  SP: 977bd1be  a2: 8009b5e8
>>> [536154.290000] d0: 8009b5e8    d1: cfcfcfcf    d2: 00000000    d3: ffffffff
>>> [536154.300000] d4: 00000000    d5: 00000000    a0: 8008a108    a1: 8009b7df
>>> [536154.320000] Process savelog (pid: 4656, task=e49aa246)
>>> [536154.330000] Frame format=0
>>> [536154.340000] Stack from 00cc5fa4:
>>> [536154.340000]         02088004 3666b008 1c0eb209 007eb5e8 8006a2d0 efaec378 8004366c 61ff61ff
>>> [536154.340000]         8006a2d4 8006a2d2 00000000 030dfffb 0044fffa 0e000000 fffa1a00 fffa1c00
>>> [536154.340000]         fffa1e00 fffb0e40 fffb0e80 00049b66 00000040 005f5800 00000001
> Strange. If I read that stack frame correctly, that seems to be an
> exception frame of type 0xb ("Long Bus Cycle").
Not sure where you get the 0xb from - the frame format I see is 0. 0xb 
would print additional information before the stack dump. Format 0 
doesn't appear in the stack frame struct definition in asm/traps.h.

I have no 68k processor manual at hand, so no idea whether frame format 
0 even exists.
>
> Plus the frame content is then apparently corrupted enough that the
> rte causes an exception on trying to restore it.
>
> None of which makes sense or seems to have much at all to do with any
> of these patches. Yes, we mess with the exception frame, but only for
> fork(), and while "copy_process()" doesn't set any frame type, I see
> only two cases:
>
>   - the kernel thread one does a "memset()" to clear it, so you should
> end up with frame type 0
I need to reread your description of how the kernel thread creation 
works - ret_from_kernel_thread() uses the same return path that 
resume_userspace() appears on, so we might end up here through that 
path. I've an idea how to test this, but that might be a little acacemic...
>
>   - the user thread case copies the original frame format (which I
> think is just the system call frame from the TRAP instruction).
>
> Are you 100% sure your hardware is stable?
I've always thought so. But going back through quite a few years of 
console log dumps, I now see that this format error has happened as 
early as 2017 (kernel 4.10.0-rc2). So it appears this is not a 
regression caused by Christoph's patches after all.

Sorry for causing all this confusion...

Cheers,

     Michael






^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-23 21:31                                             ` Michael Schmitz
@ 2021-08-23 21:49                                               ` Linus Torvalds
  2021-08-24  8:08                                                 ` Andreas Schwab
  2021-08-24  8:44                                                 ` Michael Schmitz
  0 siblings, 2 replies; 73+ messages in thread
From: Linus Torvalds @ 2021-08-23 21:49 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Christoph Hellwig, Andreas Schwab, Geert Uytterhoeven,
	Greg Ungerer, linux-m68k

On Mon, Aug 23, 2021 at 2:31 PM Michael Schmitz <schmitzmic@gmail.com> wrote:
>
> Hi Linus,
>
>
> On 24/08/21 05:59, Linus Torvalds wrote:
> > On Sun, Aug 22, 2021 at 12:34 PM Michael Schmitz <schmitzmic@gmail.com> wrote:
> >> Got this overnight:
> >>
> >>> [536154.200000] *** FORMAT ERROR ***   FORMAT=0
> >>> [536154.210000] Current process id is 4656
> >>> [536154.230000] BAD KERNEL TRAP: 00000000
> >>> [536154.240000] Modules linked in: atari_scsi ne 8390p [last unloaded: atari_scsi]
> >>> [536154.260000] PC: [<00002a8c>] resume_userspace+0x14/0x16
> >>> [536154.270000] SR: 2208  SP: 977bd1be  a2: 8009b5e8
> >>> [536154.290000] d0: 8009b5e8    d1: cfcfcfcf    d2: 00000000    d3: ffffffff
> >>> [536154.300000] d4: 00000000    d5: 00000000    a0: 8008a108    a1: 8009b7df
> >>> [536154.320000] Process savelog (pid: 4656, task=e49aa246)
> >>> [536154.330000] Frame format=0
> >>> [536154.340000] Stack from 00cc5fa4:
> >>> [536154.340000]         02088004 3666b008 1c0eb209 007eb5e8 8006a2d0 efaec378 8004366c 61ff61ff
> >>> [536154.340000]         8006a2d4 8006a2d2 00000000 030dfffb 0044fffa 0e000000 fffa1a00 fffa1c00
> >>> [536154.340000]         fffa1e00 fffb0e40 fffb0e80 00049b66 00000040 005f5800 00000001
> > Strange. If I read that stack frame correctly, that seems to be an
> > exception frame of type 0xb ("Long Bus Cycle").
>
> Not sure where you get the 0xb from - the frame format I see is 0.

The "Frame format=0" is, afaik, the format of when the error happens.

But I'm looking at what the stack contents are at the time of the
exception, to see what actually *triggered* the error.

And if I read it right, the RTE that took the format error fault had this stack:

        02088004 3666b008 1c0eb209 ..

and that means that it tried to return with status register 0208
(which looks correct: supervisor bit clear), to PC 80043666 (which
might be right - it's a similar address to the ones in A0/A1).

And then the format/vector value is b008.

That is, if I read it right, format 0xb, which is that "Long Bus Cycle" thing.

NOTE! I may be misreading this entirely. My m68k knowledge is so
out-of-date that it's not even funny, and it actually predates all
this fancy format stuff. So I'm just pattern-matching with the manuals
I found online.

               Linus

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-23 21:49                                               ` Linus Torvalds
@ 2021-08-24  8:08                                                 ` Andreas Schwab
  2021-08-24  8:44                                                 ` Michael Schmitz
  1 sibling, 0 replies; 73+ messages in thread
From: Andreas Schwab @ 2021-08-24  8:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michael Schmitz, Christoph Hellwig, Geert Uytterhoeven,
	Greg Ungerer, linux-m68k

On Aug 23 2021, Linus Torvalds wrote:

> On Mon, Aug 23, 2021 at 2:31 PM Michael Schmitz <schmitzmic@gmail.com> wrote:
>>
>> Hi Linus,
>>
>>
>> On 24/08/21 05:59, Linus Torvalds wrote:
>> > On Sun, Aug 22, 2021 at 12:34 PM Michael Schmitz <schmitzmic@gmail.com> wrote:
>> >> Got this overnight:
>> >>
>> >>> [536154.200000] *** FORMAT ERROR ***   FORMAT=0
>> >>> [536154.210000] Current process id is 4656
>> >>> [536154.230000] BAD KERNEL TRAP: 00000000
>> >>> [536154.240000] Modules linked in: atari_scsi ne 8390p [last unloaded: atari_scsi]
>> >>> [536154.260000] PC: [<00002a8c>] resume_userspace+0x14/0x16
>> >>> [536154.270000] SR: 2208  SP: 977bd1be  a2: 8009b5e8
>> >>> [536154.290000] d0: 8009b5e8    d1: cfcfcfcf    d2: 00000000    d3: ffffffff
>> >>> [536154.300000] d4: 00000000    d5: 00000000    a0: 8008a108    a1: 8009b7df
>> >>> [536154.320000] Process savelog (pid: 4656, task=e49aa246)
>> >>> [536154.330000] Frame format=0
>> >>> [536154.340000] Stack from 00cc5fa4:
>> >>> [536154.340000]         02088004 3666b008 1c0eb209 007eb5e8 8006a2d0 efaec378 8004366c 61ff61ff
>> >>> [536154.340000]         8006a2d4 8006a2d2 00000000 030dfffb 0044fffa 0e000000 fffa1a00 fffa1c00
>> >>> [536154.340000]         fffa1e00 fffb0e40 fffb0e80 00049b66 00000040 005f5800 00000001
>> > Strange. If I read that stack frame correctly, that seems to be an
>> > exception frame of type 0xb ("Long Bus Cycle").
>>
>> Not sure where you get the 0xb from - the frame format I see is 0.
>
> The "Frame format=0" is, afaik, the format of when the error happens.
>
> But I'm looking at what the stack contents are at the time of the
> exception, to see what actually *triggered* the error.
>
> And if I read it right, the RTE that took the format error fault had this stack:
>
>         02088004 3666b008 1c0eb209 ..
>
> and that means that it tried to return with status register 0208
> (which looks correct: supervisor bit clear), to PC 80043666 (which
> might be right - it's a similar address to the ones in A0/A1).
>
> And then the format/vector value is b008.
>
> That is, if I read it right, format 0xb, which is that "Long Bus Cycle" thing.

Right.  It was a fault in stage C of the instruction pipeline (SSW =
0xb209) at 0x8006a2d2-2 (BADDR-2).

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-23 21:49                                               ` Linus Torvalds
  2021-08-24  8:08                                                 ` Andreas Schwab
@ 2021-08-24  8:44                                                 ` Michael Schmitz
  2021-08-24  8:59                                                   ` Andreas Schwab
  1 sibling, 1 reply; 73+ messages in thread
From: Michael Schmitz @ 2021-08-24  8:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Andreas Schwab, Geert Uytterhoeven,
	Greg Ungerer, linux-m68k

Thanks Linus (and Andreas),


On 24/08/21 09:49, Linus Torvalds wrote:
> On Mon, Aug 23, 2021 at 2:31 PM Michael Schmitz <schmitzmic@gmail.com> wrote:
>> Hi Linus,
>>
>>
>> On 24/08/21 05:59, Linus Torvalds wrote:
>>> On Sun, Aug 22, 2021 at 12:34 PM Michael Schmitz <schmitzmic@gmail.com> wrote:
>>>> Got this overnight:
>>>>
>>>>> [536154.200000] *** FORMAT ERROR ***   FORMAT=0
>>>>> [536154.210000] Current process id is 4656
>>>>> [536154.230000] BAD KERNEL TRAP: 00000000
>>>>> [536154.240000] Modules linked in: atari_scsi ne 8390p [last unloaded: atari_scsi]
>>>>> [536154.260000] PC: [<00002a8c>] resume_userspace+0x14/0x16
>>>>> [536154.270000] SR: 2208  SP: 977bd1be  a2: 8009b5e8
>>>>> [536154.290000] d0: 8009b5e8    d1: cfcfcfcf    d2: 00000000    d3: ffffffff
>>>>> [536154.300000] d4: 00000000    d5: 00000000    a0: 8008a108    a1: 8009b7df
>>>>> [536154.320000] Process savelog (pid: 4656, task=e49aa246)
>>>>> [536154.330000] Frame format=0
>>>>> [536154.340000] Stack from 00cc5fa4:
>>>>> [536154.340000]         02088004 3666b008 1c0eb209 007eb5e8 8006a2d0 efaec378 8004366c 61ff61ff
>>>>> [536154.340000]         8006a2d4 8006a2d2 00000000 030dfffb 0044fffa 0e000000 fffa1a00 fffa1c00
>>>>> [536154.340000]         fffa1e00 fffb0e40 fffb0e80 00049b66 00000040 005f5800 00000001
>>> Strange. If I read that stack frame correctly, that seems to be an
>>> exception frame of type 0xb ("Long Bus Cycle").
>> Not sure where you get the 0xb from - the frame format I see is 0.
> The "Frame format=0" is, afaik, the format of when the error happens.
>
> But I'm looking at what the stack contents are at the time of the
> exception, to see what actually *triggered* the error.
>
> And if I read it right, the RTE that took the format error fault had this stack:
>
>          02088004 3666b008 1c0eb209 ..
>
> and that means that it tried to return with status register 0208
> (which looks correct: supervisor bit clear), to PC 80043666 (which
> might be right - it's a similar address to the ones in A0/A1).
I see - found some information on the 030 exception stack frame format 
online now.

>
> And then the format/vector value is b008.
>
> That is, if I read it right, format 0xb, which is that "Long Bus Cycle" thing.
>
> NOTE! I may be misreading this entirely. My m68k knowledge is so
> out-of-date that it's not even funny, and it actually predates all
> this fancy format stuff. So I'm just pattern-matching with the manuals
> I found online.
I'll take your word for it (especially after Andreas confirmed it :-). 
The information in our show_registers ought to be enough for me to 
decode the various faults of this type I've seen, and spot any patterns.

Cheers,

     Michael


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-24  8:44                                                 ` Michael Schmitz
@ 2021-08-24  8:59                                                   ` Andreas Schwab
  2021-08-25  7:51                                                     ` Michael Schmitz
  0 siblings, 1 reply; 73+ messages in thread
From: Andreas Schwab @ 2021-08-24  8:59 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Linus Torvalds, Christoph Hellwig, Geert Uytterhoeven,
	Greg Ungerer, linux-m68k

FWIW, you should be able to download the full 68030 user's manual (as
well as all other 680x0 manuals) from the NXP website.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-24  8:59                                                   ` Andreas Schwab
@ 2021-08-25  7:51                                                     ` Michael Schmitz
  2021-08-25  8:44                                                       ` Andreas Schwab
  0 siblings, 1 reply; 73+ messages in thread
From: Michael Schmitz @ 2021-08-25  7:51 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Linus Torvalds, Christoph Hellwig, Geert Uytterhoeven,
	Greg Ungerer, linux-m68k

Thanks Andreas,

the manuals on the NXP web site don't have the exception frame format 
tables, but I found a copy of the PRM at U Ohio that does.

Casual inspection of the old format errors at that same rte instruction 
shows it's always the same format code, and SR always has the supervisor 
bit clear. No pattern that I can see on the other contents of the frame. 
I've seen SSW and ISC all clear in one case, that can't be right?

Cheers,

     Michael



On 24/08/21 20:59, Andreas Schwab wrote:
> FWIW, you should be able to download the full 68030 user's manual (as
> well as all other 680x0 manuals) from the NXP website.
>
> Andreas.
>


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-25  7:51                                                     ` Michael Schmitz
@ 2021-08-25  8:44                                                       ` Andreas Schwab
  2021-08-25 22:59                                                         ` Michael Schmitz
  0 siblings, 1 reply; 73+ messages in thread
From: Andreas Schwab @ 2021-08-25  8:44 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Linus Torvalds, Christoph Hellwig, Geert Uytterhoeven,
	Greg Ungerer, linux-m68k

On Aug 25 2021, Michael Schmitz wrote:

> the manuals on the NXP web site don't have the exception frame format
> tables,

See 8.4 EXCEPTION STACK FRAME FORMATS.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-25  8:44                                                       ` Andreas Schwab
@ 2021-08-25 22:59                                                         ` Michael Schmitz
  2021-08-25 23:30                                                           ` Brad Boyer
  2021-08-26  7:45                                                           ` Andreas Schwab
  0 siblings, 2 replies; 73+ messages in thread
From: Michael Schmitz @ 2021-08-25 22:59 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Linus Torvalds, Christoph Hellwig, Geert Uytterhoeven,
	Greg Ungerer, linux-m68k

On Wed, Aug 25, 2021 at 8:44 PM Andreas Schwab <schwab@linux-m68k.org> wrote:
>
> On Aug 25 2021, Michael Schmitz wrote:
>
> > the manuals on the NXP web site don't have the exception frame format
> > tables,
>
> See 8.4 EXCEPTION STACK FRAME FORMATS.

Not in my copy of part 2 of the 030 manual. The section is there, the
pages that are marked 'Exception Stack Frame format' (pp 8-32, 8-33)
are there, but the pages are blank (except for header and footer).

Tried with acroread, xpdf and evince.

The original Programmer's Reference Manual has the frame formats in
section B.2 (without any explanation of the fields' meanings,
however).

Cheers,

    Michael

>
> Andreas.
>
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-25 22:59                                                         ` Michael Schmitz
@ 2021-08-25 23:30                                                           ` Brad Boyer
  2021-08-26  7:46                                                             ` Michael Schmitz
  2021-08-26  7:45                                                           ` Andreas Schwab
  1 sibling, 1 reply; 73+ messages in thread
From: Brad Boyer @ 2021-08-25 23:30 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Andreas Schwab, Linus Torvalds, Christoph Hellwig,
	Geert Uytterhoeven, Greg Ungerer, linux-m68k

On Thu, Aug 26, 2021 at 10:59:30AM +1200, Michael Schmitz wrote:
> On Wed, Aug 25, 2021 at 8:44 PM Andreas Schwab <schwab@linux-m68k.org> wrote:
> >
> > On Aug 25 2021, Michael Schmitz wrote:
> >
> > > the manuals on the NXP web site don't have the exception frame format
> > > tables,
> >
> > See 8.4 EXCEPTION STACK FRAME FORMATS.
> 
> Not in my copy of part 2 of the 030 manual. The section is there, the
> pages that are marked 'Exception Stack Frame format' (pp 8-32, 8-33)
> are there, but the pages are blank (except for header and footer).
> 
> Tried with acroread, xpdf and evince.
> 
> The original Programmer's Reference Manual has the frame formats in
> section B.2 (without any explanation of the fields' meanings,
> however).

It shows up fine for me in the embedded PDF reader in Chrome. There's
what is clearly a picture of the formats, as the image quality is
really bad. I can read it, but the lines are kind of jagged and the
text is pretty messy. Specifically, I'm looking at page 300 (8-33) in
the PDF found at this URL:

https://www.nxp.com/docs/en/reference-manual/MC68030UM.pdf

Perhaps those other programs aren't handling however that raw image
is encoded inside the file.

	Brad Boyer
	flar@allandria.com


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-25 22:59                                                         ` Michael Schmitz
  2021-08-25 23:30                                                           ` Brad Boyer
@ 2021-08-26  7:45                                                           ` Andreas Schwab
  1 sibling, 0 replies; 73+ messages in thread
From: Andreas Schwab @ 2021-08-26  7:45 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Linus Torvalds, Christoph Hellwig, Geert Uytterhoeven,
	Greg Ungerer, linux-m68k

On Aug 26 2021, Michael Schmitz wrote:

> On Wed, Aug 25, 2021 at 8:44 PM Andreas Schwab <schwab@linux-m68k.org> wrote:
>>
>> On Aug 25 2021, Michael Schmitz wrote:
>>
>> > the manuals on the NXP web site don't have the exception frame format
>> > tables,
>>
>> See 8.4 EXCEPTION STACK FRAME FORMATS.
>
> Not in my copy of part 2 of the 030 manual.

That looks like a bug in whatever produced the separate parts.  Use the
full manual instead.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-25 23:30                                                           ` Brad Boyer
@ 2021-08-26  7:46                                                             ` Michael Schmitz
  0 siblings, 0 replies; 73+ messages in thread
From: Michael Schmitz @ 2021-08-26  7:46 UTC (permalink / raw)
  To: Brad Boyer
  Cc: Andreas Schwab, Linus Torvalds, Christoph Hellwig,
	Geert Uytterhoeven, Greg Ungerer, linux-m68k

Hi Brad,

thanks, I can see the tables in that one. My searches only showed the 
newer version which comes in two parts ...

The 'vector offset' 8 in the format/vector field indicates a bus error. 
Not sure that helps pin down the cause without debugging the 030 bus 
error handler. Format b bus faults might be quite frequent ...

Cheers,

     Michael



On 26/08/21 11:30, Brad Boyer wrote:
> On Thu, Aug 26, 2021 at 10:59:30AM +1200, Michael Schmitz wrote:
>> On Wed, Aug 25, 2021 at 8:44 PM Andreas Schwab <schwab@linux-m68k.org> wrote:
>>> On Aug 25 2021, Michael Schmitz wrote:
>>>
>>>> the manuals on the NXP web site don't have the exception frame format
>>>> tables,
>>> See 8.4 EXCEPTION STACK FRAME FORMATS.
>> Not in my copy of part 2 of the 030 manual. The section is there, the
>> pages that are marked 'Exception Stack Frame format' (pp 8-32, 8-33)
>> are there, but the pages are blank (except for header and footer).
>>
>> Tried with acroread, xpdf and evince.
>>
>> The original Programmer's Reference Manual has the frame formats in
>> section B.2 (without any explanation of the fields' meanings,
>> however).
> It shows up fine for me in the embedded PDF reader in Chrome. There's
> what is clearly a picture of the formats, as the image quality is
> really bad. I can read it, but the lines are kind of jagged and the
> text is pretty messy. Specifically, I'm looking at page 300 (8-33) in
> the PDF found at this URL:
>
> https://www.nxp.com/docs/en/reference-manual/MC68030UM.pdf
>
> Perhaps those other programs aren't handling however that raw image
> is encoded inside the file.
>
> 	Brad Boyer
> 	flar@allandria.com
>


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
       [not found]                                                             ` <31f27da7-be60-8eb-9834-748b653c2246@linux-m68k.org>
@ 2021-09-07  3:28                                                               ` Finn Thain
  2021-09-07  5:53                                                                 ` Michael Schmitz
  0 siblings, 1 reply; 73+ messages in thread
From: Finn Thain @ 2021-09-07  3:28 UTC (permalink / raw)
  To: linux-m68k

Hi All,

While running stress-ng on my Quadra 630, I observed a reproducible panic 
in do_exit():

        if (unlikely(in_interrupt()))
                panic("Aiee, killing interrupt handler!");

The console log says,

running --mmap -1 --mmap-odirect --mmap-bytes 100% -t 60 --timestamp --no-rand-seed --times
stress-ng: 22:22:17.70 info:  [19803] setting to a 60 second run per stressor
stress-ng: 22:22:17.70 info:  [19803] dispatching hogs: 1 mmap
[12634.030000] Kernel panic - not syncing: Aiee, killing interrupt handler!
[12634.030000] CPU: 0 PID: 19805 Comm: stress-ng Not tainted 5.14.0-mac #2
[12634.030000] Stack from 00b27de4:
[12634.030000]         00b27de4 0045b8ea 0045b8ea 00020000 00b27e00 003cb320 0045b8ea 00b27e20
[12634.030000]         003ca326 00020000 418004fc 00b26000 060324c0 00b26000 00ae9680 00b27e5c
[12634.030000]         0001dcc2 00453506 00000009 418004fc 00b26000 00000000 0741b000 00000009
[12634.030000]         00000008 00b27f38 00ae9680 00000006 00000000 00000001 00b27e6c 0001de78
[12634.030000]         00000009 06034510 00b27eb8 000271bc 00000009 0000000f 0000000e c043c000
[12634.030000]         00000000 0741b000 00000003 00b27f98 efe91934 efe90898 00025fc6 00b26000
[12634.030000] Call Trace: [<00020000>] resource_list_free+0x36/0x52
[12634.030000]  [<003cb320>] dump_stack+0x10/0x16
[12634.030000]  [<003ca326>] panic+0xba/0x2bc
[12634.030000]  [<00020000>] resource_list_free+0x36/0x52
[12634.030000]  [<0001dcc2>] do_exit+0x87e/0x9d6
[12634.030000]  [<0001de78>] do_group_exit+0x28/0xb6
[12634.030000]  [<000271bc>] get_signal+0x126/0x720
[12634.030000]  [<00025fc6>] send_signal+0xde/0x16e
[12634.030000]  [<00004bfa>] do_notify_resume+0x38/0x60a
[12634.030000]  [<00027076>] force_sig_fault_to_task+0x36/0x3a
[12634.030000]  [<00027092>] force_sig_fault+0x18/0x1c
[12634.030000]  [<00006e68>] send_fault_sig+0x44/0xc6
[12634.030000]  [<00006204>] buserr_c+0x260/0x5ca
[12634.030000]  [<00002cd8>] do_signal_return+0x10/0x1a
[12634.030000]  [<0018800e>] ext4_fill_super+0x1da0/0x3842
[12634.030000]  [<0010800a>] fsnotify_flush_notify+0x3c/0x72
[12634.030000] 
[12634.030000] ---[ end Kernel panic - not syncing: Aiee, killing interrupt handler! ]---


I haven't found the cause yet. While investigating I tried enabling 
various debug options including CONFIG_DEBUG_SPINLOCK. I then got 
backtrace below from the same machine.

# /root/stress-ng --mmap -1 --mmap-odirect --mmap-bytes=100% -t 1800
stress-ng: info:  [49] setting to a 1800 second (30 mins, 0.00 secs) run per stressor
stress-ng: info:  [49] dispatching hogs: 1 mmap
[  248.710000] BUG: spinlock trylock failure on UP on CPU#0, stress-ng/51
[  248.710000]  lock: 0x53d4a0, .magic: dead4ead, .owner: stress-ng/51, .owner_cpu: 0
[  248.710000] CPU: 0 PID: 51 Comm: stress-ng Not tainted 5.14.0-multi-debug #2
[  248.710000] Stack from 00b159ac:
[  248.710000]         00b159ac 004da40f 004da40f 00000000 00b159c8 0043e88e 004da40f 00b159f4
[  248.710000]         0006316c 004d240e 0053d4a0 dead4ead 0095ebba 00000033 00000000 0053d4a0
[  248.710000]         00b15ae4 0055e6ac 00b15a0c 000632b2 0053d4a0 004d2488 ffffebf7 0053d458
[  248.710000]         00b15a18 0044780e 0053d4a0 00b15a3c 002b8694 0053d4a0 00000001 00000040
[  248.710000]         4bfffffe 0000001a 00528200 00528260 00b15a60 00069f6a 0000000e 00014200
[  248.710000]         00528200 00b15a5c 00b12600 00528200 00014200 00b15a7c 00069ff0 00528200
[  248.710000] Call Trace: [<0043e88e>] dump_stack+0x10/0x16
[  248.710000]  [<0006316c>] spin_dump+0x6c/0xc0
[  248.710000]  [<000632b2>] do_raw_spin_trylock+0x32/0x80
[  248.710000]  [<0044780e>] _raw_spin_trylock+0xe/0x40
[  248.710000]  [<002b8694>] add_interrupt_randomness+0x154/0x1c0
[  248.710000]  [<00069f6a>] handle_irq_event_percpu+0x2a/0x80
[  248.710000]  [<00014200>] INV_L2+0x8/0x10
[  248.710000]  [<00014200>] INV_L2+0x8/0x10
[  248.710000]  [<00069ff0>] handle_irq_event+0x30/0x80
[  248.710000]  [<0006d9e4>] handle_simple_irq+0x64/0x80
[  248.710000]  [<000695fe>] generic_handle_irq+0x3e/0x80
[  248.710000]  [<0000b944>] via1_irq+0x44/0xc0
[  248.710000]  [<000695fe>] generic_handle_irq+0x3e/0x80
[  248.710000]  [<00002ea4>] do_IRQ+0x24/0x40
[  248.710000]  [<00002d74>] auto_irqhandler_fixup+0x4/0xc
[  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
[  248.710000]  [<002b86b0>] add_interrupt_randomness+0x170/0x1c0
[  248.710000]  [<00069f6a>] handle_irq_event_percpu+0x2a/0x80
[  248.710000]  [<00069ff0>] handle_irq_event+0x30/0x80
[  248.710000]  [<0006d9e4>] handle_simple_irq+0x64/0x80
[  248.710000]  [<000695c0>] generic_handle_irq+0x0/0x80
[  248.710000]  [<000695fe>] generic_handle_irq+0x3e/0x80
[  248.710000]  [<0000bad4>] via2_irq+0x54/0x80
[  248.710000]  [<000695fe>] generic_handle_irq+0x3e/0x80
[  248.710000]  [<00002ea4>] do_IRQ+0x24/0x40
[  248.710000]  [<00002d74>] auto_irqhandler_fixup+0x4/0xc
[  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
[  248.710000]  [<00404040>] rtm_dump_nexthop+0x80/0x180
[  248.710000]  [<00033f20>] irq_exit+0x60/0x80
[  248.710000]  [<00002eaa>] do_IRQ+0x2a/0x40
[  248.710000]  [<00002d74>] auto_irqhandler_fixup+0x4/0xc
[  248.710000]  [<00025a9b>] _FP_CALL_TOP+0xcc9d/0xd512
[  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  248.710000]  [<00025a9b>] _FP_CALL_TOP+0xcc9d/0xd512
[  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
[  248.710000]  [<0019e3e4>] ext4_get_block+0x24/0x40
[  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
[  248.710000]  [<0013a8f6>] __block_write_begin_int+0x1b6/0x700
[  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
[  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  248.710000]  [<0011b176>] update_time+0x36/0x40
[  248.710000]  [<0013b3ee>] block_page_mkwrite+0xae/0x100
[  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[  248.710000]  [<0019e3c0>] ext4_get_block+0x0/0x40
[  248.710000]  [<00005526>] sys_cacheflush+0x2a6/0x500
[  248.710000]  [<001a5676>] ext4_page_mkwrite+0x2f6/0x6c0
[  248.710000]  [<0019e3c0>] ext4_get_block+0x0/0x40
[  248.710000]  [<000d51c0>] find_vma+0x0/0x80
[  248.710000]  [<00444500>] down_read+0x0/0x200
[  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
[  248.710000]  [<0019e3c0>] ext4_get_block+0x0/0x40
[  248.710000]  [<000ff292>] fput+0x12/0x40
[  248.710000]  [<000cbbc0>] do_page_mkwrite+0x40/0x140
[  248.710000]  [<004478ce>] _raw_spin_unlock+0xe/0x40
[  248.710000]  [<000cd516>] do_wp_page+0x296/0x4c0
[  248.710000]  [<000cfedc>] handle_mm_fault+0x79c/0x900
[  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
[  248.710000]  [<00007402>] do_page_fault+0x102/0x240
[  248.710000]  [<0000653c>] buserr_c+0x17c/0x600
[  248.710000]  [<00002ba8>] buserr+0x20/0x28
[  248.710000]  [<0008800d>] get_next_modinfo+0xcd/0x100
[  248.710000]  [<0009800d>] cgroup1_pidlist_destroy_all+0x8d/0xc0


Apparently add_interrupt_randomness() got re-entered, causing 
spin_trylock() to contest input_pool.lock. But I don't see anything wrong 
there -- false positive?

I was able to reproduce that on both QEMU and Aranym:

[    0.000000] Linux version 5.14.0-multi-debug (fthain@nippy) (m68k-linux-gnu-gcc (btc) 6.4.0, GNU ld (btc) 2.28) #2 Sat Sep 4 15:58:43 AEST 2021
[    0.000000] Saving 202 bytes of bootinfo
[    0.000000] printk: console [debug0] enabled
[    0.000000] printk: debug: ignoring loglevel setting.
Blitter tried to read byte from register ff8a00 at 00816a
[    0.000000] Atari hardware found: VIDEL STDMA-SCSI ST_MFP YM2149 PCM CODEC DSP56K SCC ANALOG_JOY BLITTER IDE TT_CLK FDC_SPEED
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000000000-0x00000081ffffffff]
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x0000000000dfffff]
[    0.000000]   node   0: [mem 0x0000000001000000-0x00000000081fffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x00000000081fffff]
[    0.000000] NatFeats found (ARAnyM, 1.0)
[    0.000000] initrd: 07f3ea00 - 08200000
[    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[    0.000000] pcpu-alloc: [0] 0 
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 32475
[    0.000000] Kernel command line: debug=par console=tty0 ignore_loglevel initcall_blacklist=ide_falcon_driver_init,nfhd_init BOOT_IMAGE=vmlinux
[    0.000000] blacklisting initcall ide_falcon_driver_init
[    0.000000] blacklisting initcall nfhd_init
[    0.000000] Unknown command line parameters: initcall_blacklist=ide_falcon_driver_init BOOT_IMAGE=vmlinux
[    0.000000] Dentry cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
[    0.000000] Inode-cache hash table entries: 8192 (order: 3, 32768 bytes, linear)
[    0.000000] Sorting __ex_table...
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 120072K/131072K available (4385K kernel code, 357K rwdata, 772K rodata, 148K init, 121K bss, 11000K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] NR_IRQS: 200
[    0.000000] clocksource: mfp: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 77769386670404 ns
[    0.010000] Console: colour dummy device 80x25
[    0.010000] printk: console [tty0] enabled
[    0.010000] Calibrating delay loop... 132.50 BogoMIPS (lpj=662528)
[    0.090000] pid_max: default: 32768 minimum: 301
[    0.090000] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.090000] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.110000] devtmpfs: initialized
[    0.120000] random: get_random_u32 called from bucket_table_alloc+0x13e/0x180 with crng_init=0
[    0.120000] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.130000] futex hash table entries: 256 (order: 0, 7168 bytes, linear)
[    0.130000] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.130000] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
[    0.140000] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    0.190000] wait_for_initramfs() called before rootfs_initcalls
[    0.200000] SCSI subsystem initialized
[    0.200000] libata version 3.00 loaded.
[    0.210000] clocksource: Switched to clocksource mfp
[    0.280000] NET: Registered PF_INET protocol family
[    0.280000] IP idents hash table entries: 2048 (order: 2, 16384 bytes, linear)
[    0.280000] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 6144 bytes, linear)
[    0.290000] TCP established hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.290000] TCP bind hash table entries: 1024 (order: 2, 20480 bytes, linear)
[    0.290000] TCP: Hash tables configured (established 1024 bind 1024)
[    0.290000] UDP hash table entries: 256 (order: 1, 12288 bytes, linear)
[    0.290000] UDP-Lite hash table entries: 256 (order: 1, 12288 bytes, linear)
[    0.300000] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    0.300000] RPC: Registered named UNIX socket transport module.
[    0.300000] RPC: Registered udp transport module.
[    0.300000] RPC: Registered tcp transport module.
[    0.310000] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    0.310000] initcall nfhd_init blacklisted
[    0.310000] Trying to unpack rootfs image as initramfs...
[    0.330000] nfeth: API 5
[    0.330000] workingset: timestamp_bits=30 max_order=15 bucket_order=0
[    0.700000] Freeing initrd memory: 2820K
[    0.710000] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[    0.720000] io scheduler mq-deadline registered
[    0.720000] atafb atafb: phys_screen_base 6dc000 screen_len 311296
[    0.730000] atafb atafb: Determined 640x480, depth 4
[    0.730000] atafb atafb:    virtual 640x972
[    0.740000] Console: switching to colour frame buffer device 80x30
[    0.750000] fb0: frame buffer device, using 304K of video memory
[    0.750000] pmac_zilog: 0.6 (Benjamin Herrenschmidt <benh@kernel.crashing.org>)
[    0.750000] Non-volatile memory driver v1.3
[    0.780000] brd: module loaded
[    1.080000] scsi host0: Atari native SCSI, irq 15, io_port 0x0, base 0x0, can_queue 1, cmd_per_lun 2, sg_tablesize 1, this_id 7, flags { }
[    1.960000] random: fast init done
[    2.890000] atari-falcon-ide atari-falcon-ide: Atari Falcon and Q40/Q60 PATA controller
[    2.950000] scsi host1: pata_falcon
[    2.970000] ata1: PATA max PIO4 cmd 0xfff00000 ctl 0xfff00038 no IRQ, using PIO polling
[    3.020000] aoe: AoE v85 initialised.
[    3.050000] mousedev: PS/2 mouse device common for all mice
[    3.330000] input: Atari Keyboard as /devices/virtual/input/input0
[    3.350000] ata1.00: NODEV after polling detection
[    3.370000] ata1.01: ATA-2: slave, , max PIO2
[    3.390000] ata1.01: 322560 sectors, multi 0: LBA 
[    3.400000] ata1.01: configured for PIO
[    3.430000] scsi 1:0:1:0: Direct-Access     ATA      slave            n/a  PQ: 0 ANSI: 5
[    3.490000] scsi 1:0:1:0: Attached scsi generic sg0 type 0
[    3.520000] sd 1:0:1:0: [sda] 322560 512-byte logical blocks: (165 MB/158 MiB)
[    3.570000] sd 1:0:1:0: [sda] Write Protect is off
[    3.590000] sd 1:0:1:0: [sda] Mode Sense: 00 3a 00 00
[    3.620000] sd 1:0:1:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[    3.670000] rtc-generic rtc-generic: registered as rtc0
[    3.690000] NET: Registered PF_PACKET protocol family
[    3.730000] sd 1:0:1:0: [sda] Attached SCSI disk
[    3.760000] Freeing unused kernel image (initmem) memory: 148K
[    3.790000] This architecture does not have kernel memory protection.
[    3.810000] Run /init as init process
[    3.840000]   with arguments:
[    3.870000]     /init
[    3.880000]   with environment:
[    3.910000]     HOME=/
[    3.920000]     TERM=linux
[    3.940000]     initcall_blacklist=ide_falcon_driver_init
[    3.970000]     BOOT_IMAGE=vmlinux
[   85.240000] random: crng init done
[  139.730000] EXT4-fs (sda): mounting ext2 file system using the ext4 subsystem
[  139.770000] EXT4-fs (sda): mounted filesystem without journal. Opts: (null). Quota mode: disabled.
[ 4089.070000] BUG: spinlock trylock failure on UP on CPU#0, stress-ng/1341
[ 4089.070000]  lock: 0x53d4a0, .magic: dead4ead, .owner: stress-ng/1341, .owner_cpu: 0
[ 4089.070000] CPU: 0 PID: 1341 Comm: stress-ng Not tainted 5.14.0-multi-debug #2
[ 4089.070000] Stack from 00a7dcc8:
[ 4089.070000]         00a7dcc8 004da40f 004da40f 00000000 00a7dce4 0043e88e 004da40f 00a7dd10
[ 4089.070000]         0006316c 004d240e 0053d4a0 dead4ead 00956bba 0000053d 00000000 0053d4a0
[ 4089.070000]         00a7ddd8 0055e6ac 00a7dd28 000632b2 0053d4a0 004d2488 0005c81b 0053d458
[ 4089.070000]         00a7dd34 0044780e 0053d4a0 00a7dd58 002b8694 0053d4a0 00000001 0000007f
[ 4089.070000]         00860000 00000011 00528138 00528198 00a7dd7c 00069f6a 0000000d 00014200
[ 4089.070000]         00528138 00a7dd78 00a7def0 00528138 00014200 00a7dd98 00069ff0 00528138
[ 4089.070000] Call Trace: [<0043e88e>] dump_stack+0x10/0x16
[ 4089.070000]  [<0006316c>] spin_dump+0x6c/0xc0
[ 4089.070000]  [<000632b2>] do_raw_spin_trylock+0x32/0x80
[ 4089.070000]  [<0005c81b>] init_dl_bandwidth+0x1b/0x80
[ 4089.070000]  [<0044780e>] _raw_spin_trylock+0xe/0x40
[ 4089.070000]  [<002b8694>] add_interrupt_randomness+0x154/0x1c0
[ 4089.070000]  [<00069f6a>] handle_irq_event_percpu+0x2a/0x80
[ 4089.070000]  [<00014200>] INV_L2+0x8/0x10
[ 4089.070000]  [<00014200>] INV_L2+0x8/0x10
[ 4089.070000]  [<00069ff0>] handle_irq_event+0x30/0x80
[ 4089.070000]  [<0006d9e4>] handle_simple_irq+0x64/0x80
[ 4089.070000]  [<000695fe>] generic_handle_irq+0x3e/0x80
[ 4089.070000]  [<00002ea4>] do_IRQ+0x24/0x40
[ 4089.070000]  [<00002da8>] user_irqvec_fixup+0xc/0x14
[ 4089.070000]  [<0005c81a>] init_dl_bandwidth+0x1a/0x80
[ 4089.070000]  [<000d51c0>] find_vma+0x0/0x80
[ 4089.070000]  [<00444500>] down_read+0x0/0x200
[ 4089.070000]  [<002b86b0>] add_interrupt_randomness+0x170/0x1c0
[ 4089.070000]  [<00069f6a>] handle_irq_event_percpu+0x2a/0x80
[ 4089.070000]  [<00069ff0>] handle_irq_event+0x30/0x80
[ 4089.070000]  [<0006d9e4>] handle_simple_irq+0x64/0x80
[ 4089.070000]  [<000695fe>] generic_handle_irq+0x3e/0x80
[ 4089.070000]  [<00002ea4>] do_IRQ+0x24/0x40
[ 4089.070000]  [<00002d74>] auto_irqhandler_fixup+0x4/0xc
[ 4089.070000]  [<0000653c>] buserr_c+0x17c/0x600
[ 4089.070000]  [<00002ba8>] buserr+0x20/0x28
[ 4089.070000]  [<0008800d>] get_next_modinfo+0xcd/0x100
[ 4089.070000] 

Does anyone know what causes this?

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
  2021-09-07  3:28                                                               ` Mainline kernel crashes, was " Finn Thain
@ 2021-09-07  5:53                                                                 ` Michael Schmitz
  2021-09-07 23:50                                                                   ` Finn Thain
  0 siblings, 1 reply; 73+ messages in thread
From: Michael Schmitz @ 2021-09-07  5:53 UTC (permalink / raw)
  To: Finn Thain, linux-m68k

Hi Finn,

On 07/09/21 15:28, Finn Thain wrote:
> Hi All,
>
> While running stress-ng on my Quadra 630, I observed a reproducible panic
> in do_exit():
>
>         if (unlikely(in_interrupt()))
>                 panic("Aiee, killing interrupt handler!");
>
> The console log says,
>
> running --mmap -1 --mmap-odirect --mmap-bytes 100% -t 60 --timestamp --no-rand-seed --times
> stress-ng: 22:22:17.70 info:  [19803] setting to a 60 second run per stressor
> stress-ng: 22:22:17.70 info:  [19803] dispatching hogs: 1 mmap
> [12634.030000] Kernel panic - not syncing: Aiee, killing interrupt handler!
> [12634.030000] CPU: 0 PID: 19805 Comm: stress-ng Not tainted 5.14.0-mac #2
> [12634.030000] Stack from 00b27de4:
> [12634.030000]         00b27de4 0045b8ea 0045b8ea 00020000 00b27e00 003cb320 0045b8ea 00b27e20
> [12634.030000]         003ca326 00020000 418004fc 00b26000 060324c0 00b26000 00ae9680 00b27e5c
> [12634.030000]         0001dcc2 00453506 00000009 418004fc 00b26000 00000000 0741b000 00000009
> [12634.030000]         00000008 00b27f38 00ae9680 00000006 00000000 00000001 00b27e6c 0001de78
> [12634.030000]         00000009 06034510 00b27eb8 000271bc 00000009 0000000f 0000000e c043c000
> [12634.030000]         00000000 0741b000 00000003 00b27f98 efe91934 efe90898 00025fc6 00b26000
> [12634.030000] Call Trace: [<00020000>] resource_list_free+0x36/0x52
> [12634.030000]  [<003cb320>] dump_stack+0x10/0x16
> [12634.030000]  [<003ca326>] panic+0xba/0x2bc
> [12634.030000]  [<00020000>] resource_list_free+0x36/0x52
> [12634.030000]  [<0001dcc2>] do_exit+0x87e/0x9d6
> [12634.030000]  [<0001de78>] do_group_exit+0x28/0xb6
> [12634.030000]  [<000271bc>] get_signal+0x126/0x720
> [12634.030000]  [<00025fc6>] send_signal+0xde/0x16e
> [12634.030000]  [<00004bfa>] do_notify_resume+0x38/0x60a
> [12634.030000]  [<00027076>] force_sig_fault_to_task+0x36/0x3a
> [12634.030000]  [<00027092>] force_sig_fault+0x18/0x1c
> [12634.030000]  [<00006e68>] send_fault_sig+0x44/0xc6
> [12634.030000]  [<00006204>] buserr_c+0x260/0x5ca
> [12634.030000]  [<00002cd8>] do_signal_return+0x10/0x1a
> [12634.030000]  [<0018800e>] ext4_fill_super+0x1da0/0x3842
> [12634.030000]  [<0010800a>] fsnotify_flush_notify+0x3c/0x72
> [12634.030000]
> [12634.030000] ---[ end Kernel panic - not syncing: Aiee, killing interrupt handler! ]---
>
>
> I haven't found the cause yet. While investigating I tried enabling
> various debug options including CONFIG_DEBUG_SPINLOCK. I then got
> backtrace below from the same machine.
>
> # /root/stress-ng --mmap -1 --mmap-odirect --mmap-bytes=100% -t 1800
> stress-ng: info:  [49] setting to a 1800 second (30 mins, 0.00 secs) run per stressor
> stress-ng: info:  [49] dispatching hogs: 1 mmap
> [  248.710000] BUG: spinlock trylock failure on UP on CPU#0, stress-ng/51
> [  248.710000]  lock: 0x53d4a0, .magic: dead4ead, .owner: stress-ng/51, .owner_cpu: 0
> [  248.710000] CPU: 0 PID: 51 Comm: stress-ng Not tainted 5.14.0-multi-debug #2
> [  248.710000] Stack from 00b159ac:
> [  248.710000]         00b159ac 004da40f 004da40f 00000000 00b159c8 0043e88e 004da40f 00b159f4
> [  248.710000]         0006316c 004d240e 0053d4a0 dead4ead 0095ebba 00000033 00000000 0053d4a0
> [  248.710000]         00b15ae4 0055e6ac 00b15a0c 000632b2 0053d4a0 004d2488 ffffebf7 0053d458
> [  248.710000]         00b15a18 0044780e 0053d4a0 00b15a3c 002b8694 0053d4a0 00000001 00000040
> [  248.710000]         4bfffffe 0000001a 00528200 00528260 00b15a60 00069f6a 0000000e 00014200
> [  248.710000]         00528200 00b15a5c 00b12600 00528200 00014200 00b15a7c 00069ff0 00528200
> [  248.710000] Call Trace: [<0043e88e>] dump_stack+0x10/0x16
> [  248.710000]  [<0006316c>] spin_dump+0x6c/0xc0
> [  248.710000]  [<000632b2>] do_raw_spin_trylock+0x32/0x80
> [  248.710000]  [<0044780e>] _raw_spin_trylock+0xe/0x40
> [  248.710000]  [<002b8694>] add_interrupt_randomness+0x154/0x1c0
> [  248.710000]  [<00069f6a>] handle_irq_event_percpu+0x2a/0x80
> [  248.710000]  [<00014200>] INV_L2+0x8/0x10
> [  248.710000]  [<00014200>] INV_L2+0x8/0x10
> [  248.710000]  [<00069ff0>] handle_irq_event+0x30/0x80
> [  248.710000]  [<0006d9e4>] handle_simple_irq+0x64/0x80
> [  248.710000]  [<000695fe>] generic_handle_irq+0x3e/0x80
> [  248.710000]  [<0000b944>] via1_irq+0x44/0xc0
> [  248.710000]  [<000695fe>] generic_handle_irq+0x3e/0x80
> [  248.710000]  [<00002ea4>] do_IRQ+0x24/0x40
> [  248.710000]  [<00002d74>] auto_irqhandler_fixup+0x4/0xc
> [  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
> [  248.710000]  [<002b86b0>] add_interrupt_randomness+0x170/0x1c0
> [  248.710000]  [<00069f6a>] handle_irq_event_percpu+0x2a/0x80
> [  248.710000]  [<00069ff0>] handle_irq_event+0x30/0x80
> [  248.710000]  [<0006d9e4>] handle_simple_irq+0x64/0x80
> [  248.710000]  [<000695c0>] generic_handle_irq+0x0/0x80
> [  248.710000]  [<000695fe>] generic_handle_irq+0x3e/0x80
> [  248.710000]  [<0000bad4>] via2_irq+0x54/0x80
> [  248.710000]  [<000695fe>] generic_handle_irq+0x3e/0x80
> [  248.710000]  [<00002ea4>] do_IRQ+0x24/0x40
> [  248.710000]  [<00002d74>] auto_irqhandler_fixup+0x4/0xc
> [  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
> [  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
> [  248.710000]  [<00404040>] rtm_dump_nexthop+0x80/0x180
> [  248.710000]  [<00033f20>] irq_exit+0x60/0x80
> [  248.710000]  [<00002eaa>] do_IRQ+0x2a/0x40
> [  248.710000]  [<00002d74>] auto_irqhandler_fixup+0x4/0xc
> [  248.710000]  [<00025a9b>] _FP_CALL_TOP+0xcc9d/0xd512
> [  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
> [  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
> [  248.710000]  [<00025a9b>] _FP_CALL_TOP+0xcc9d/0xd512
> [  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
> [  248.710000]  [<0019e3e4>] ext4_get_block+0x24/0x40
> [  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
> [  248.710000]  [<0013a8f6>] __block_write_begin_int+0x1b6/0x700
> [  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
> [  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
> [  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
> [  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
> [  248.710000]  [<0011b176>] update_time+0x36/0x40
> [  248.710000]  [<0013b3ee>] block_page_mkwrite+0xae/0x100
> [  248.710000]  [<00001000>] kernel_pg_dir+0x0/0x1000
> [  248.710000]  [<0019e3c0>] ext4_get_block+0x0/0x40
> [  248.710000]  [<00005526>] sys_cacheflush+0x2a6/0x500
> [  248.710000]  [<001a5676>] ext4_page_mkwrite+0x2f6/0x6c0
> [  248.710000]  [<0019e3c0>] ext4_get_block+0x0/0x40
> [  248.710000]  [<000d51c0>] find_vma+0x0/0x80
> [  248.710000]  [<00444500>] down_read+0x0/0x200
> [  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
> [  248.710000]  [<0019e3c0>] ext4_get_block+0x0/0x40
> [  248.710000]  [<000ff292>] fput+0x12/0x40
> [  248.710000]  [<000cbbc0>] do_page_mkwrite+0x40/0x140
> [  248.710000]  [<004478ce>] _raw_spin_unlock+0xe/0x40
> [  248.710000]  [<000cd516>] do_wp_page+0x296/0x4c0
> [  248.710000]  [<000cfedc>] handle_mm_fault+0x79c/0x900
> [  248.710000]  [<0000129b>] kernel_pg_dir+0x29b/0x1000
> [  248.710000]  [<00007402>] do_page_fault+0x102/0x240
> [  248.710000]  [<0000653c>] buserr_c+0x17c/0x600
> [  248.710000]  [<00002ba8>] buserr+0x20/0x28
> [  248.710000]  [<0008800d>] get_next_modinfo+0xcd/0x100
> [  248.710000]  [<0009800d>] cgroup1_pidlist_destroy_all+0x8d/0xc0
>
>
> Apparently add_interrupt_randomness() got re-entered, causing
> spin_trylock() to contest input_pool.lock. But I don't see anything wrong
> there -- false positive?

That can happen if interrupts are nested. In that sense, it's a false 
positive from CONFIG_DEBUG_SPINLOCK (failing to take a lock that the 
code knows how to deal with shouldn't cause a warning).

(This very event of reentering add_interrupt_randomness() from a nested 
interrupt caused an arcane bug in get_reg() to write beyond the end of 
memory on my Falcon. get_reg() uses READ_ONCE()/WRITE_ONCE() to guard 
against the effects of reentry now. That's why I said I have a hunch 
that we've been here before...

You could argue that disabling interrupts around that entropy mixing 
step would be an alternative, but Linus didn't like that for get_reg() 
either :-) )

>
> I was able to reproduce that on both QEMU and Aranym:
>
> [    0.000000] Linux version 5.14.0-multi-debug (fthain@nippy) (m68k-linux-gnu-gcc (btc) 6.4.0, GNU ld (btc) 2.28) #2 Sat Sep 4 15:58:43 AEST 2021
> [    0.000000] Saving 202 bytes of bootinfo
> [    0.000000] printk: console [debug0] enabled
> [    0.000000] printk: debug: ignoring loglevel setting.
> Blitter tried to read byte from register ff8a00 at 00816a
> [    0.000000] Atari hardware found: VIDEL STDMA-SCSI ST_MFP YM2149 PCM CODEC DSP56K SCC ANALOG_JOY BLITTER IDE TT_CLK FDC_SPEED
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000000000000-0x00000081ffffffff]
> [    0.000000]   Normal   empty
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000000000-0x0000000000dfffff]
> [    0.000000]   node   0: [mem 0x0000000001000000-0x00000000081fffff]
> [    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x00000000081fffff]
> [    0.000000] NatFeats found (ARAnyM, 1.0)
> [    0.000000] initrd: 07f3ea00 - 08200000
> [    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
> [    0.000000] pcpu-alloc: [0] 0
> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 32475
> [    0.000000] Kernel command line: debug=par console=tty0 ignore_loglevel initcall_blacklist=ide_falcon_driver_init,nfhd_init BOOT_IMAGE=vmlinux
> [    0.000000] blacklisting initcall ide_falcon_driver_init
> [    0.000000] blacklisting initcall nfhd_init
> [    0.000000] Unknown command line parameters: initcall_blacklist=ide_falcon_driver_init BOOT_IMAGE=vmlinux
> [    0.000000] Dentry cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
> [    0.000000] Inode-cache hash table entries: 8192 (order: 3, 32768 bytes, linear)
> [    0.000000] Sorting __ex_table...
> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> [    0.000000] Memory: 120072K/131072K available (4385K kernel code, 357K rwdata, 772K rodata, 148K init, 121K bss, 11000K reserved, 0K cma-reserved)
> [    0.000000] SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
> [    0.000000] NR_IRQS: 200
> [    0.000000] clocksource: mfp: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 77769386670404 ns
> [    0.010000] Console: colour dummy device 80x25
> [    0.010000] printk: console [tty0] enabled
> [    0.010000] Calibrating delay loop... 132.50 BogoMIPS (lpj=662528)
> [    0.090000] pid_max: default: 32768 minimum: 301
> [    0.090000] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
> [    0.090000] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
> [    0.110000] devtmpfs: initialized
> [    0.120000] random: get_random_u32 called from bucket_table_alloc+0x13e/0x180 with crng_init=0
> [    0.120000] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
> [    0.130000] futex hash table entries: 256 (order: 0, 7168 bytes, linear)
> [    0.130000] NET: Registered PF_NETLINK/PF_ROUTE protocol family
> [    0.130000] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
> [    0.140000] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
> [    0.190000] wait_for_initramfs() called before rootfs_initcalls
> [    0.200000] SCSI subsystem initialized
> [    0.200000] libata version 3.00 loaded.
> [    0.210000] clocksource: Switched to clocksource mfp
> [    0.280000] NET: Registered PF_INET protocol family
> [    0.280000] IP idents hash table entries: 2048 (order: 2, 16384 bytes, linear)
> [    0.280000] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 6144 bytes, linear)
> [    0.290000] TCP established hash table entries: 1024 (order: 0, 4096 bytes, linear)
> [    0.290000] TCP bind hash table entries: 1024 (order: 2, 20480 bytes, linear)
> [    0.290000] TCP: Hash tables configured (established 1024 bind 1024)
> [    0.290000] UDP hash table entries: 256 (order: 1, 12288 bytes, linear)
> [    0.290000] UDP-Lite hash table entries: 256 (order: 1, 12288 bytes, linear)
> [    0.300000] NET: Registered PF_UNIX/PF_LOCAL protocol family
> [    0.300000] RPC: Registered named UNIX socket transport module.
> [    0.300000] RPC: Registered udp transport module.
> [    0.300000] RPC: Registered tcp transport module.
> [    0.310000] RPC: Registered tcp NFSv4.1 backchannel transport module.
> [    0.310000] initcall nfhd_init blacklisted
> [    0.310000] Trying to unpack rootfs image as initramfs...
> [    0.330000] nfeth: API 5
> [    0.330000] workingset: timestamp_bits=30 max_order=15 bucket_order=0
> [    0.700000] Freeing initrd memory: 2820K
> [    0.710000] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
> [    0.720000] io scheduler mq-deadline registered
> [    0.720000] atafb atafb: phys_screen_base 6dc000 screen_len 311296
> [    0.730000] atafb atafb: Determined 640x480, depth 4
> [    0.730000] atafb atafb:    virtual 640x972
> [    0.740000] Console: switching to colour frame buffer device 80x30
> [    0.750000] fb0: frame buffer device, using 304K of video memory
> [    0.750000] pmac_zilog: 0.6 (Benjamin Herrenschmidt <benh@kernel.crashing.org>)
> [    0.750000] Non-volatile memory driver v1.3
> [    0.780000] brd: module loaded
> [    1.080000] scsi host0: Atari native SCSI, irq 15, io_port 0x0, base 0x0, can_queue 1, cmd_per_lun 2, sg_tablesize 1, this_id 7, flags { }
> [    1.960000] random: fast init done
> [    2.890000] atari-falcon-ide atari-falcon-ide: Atari Falcon and Q40/Q60 PATA controller
> [    2.950000] scsi host1: pata_falcon
> [    2.970000] ata1: PATA max PIO4 cmd 0xfff00000 ctl 0xfff00038 no IRQ, using PIO polling
> [    3.020000] aoe: AoE v85 initialised.
> [    3.050000] mousedev: PS/2 mouse device common for all mice
> [    3.330000] input: Atari Keyboard as /devices/virtual/input/input0
> [    3.350000] ata1.00: NODEV after polling detection
> [    3.370000] ata1.01: ATA-2: slave, , max PIO2
> [    3.390000] ata1.01: 322560 sectors, multi 0: LBA
> [    3.400000] ata1.01: configured for PIO
> [    3.430000] scsi 1:0:1:0: Direct-Access     ATA      slave            n/a  PQ: 0 ANSI: 5
> [    3.490000] scsi 1:0:1:0: Attached scsi generic sg0 type 0
> [    3.520000] sd 1:0:1:0: [sda] 322560 512-byte logical blocks: (165 MB/158 MiB)
> [    3.570000] sd 1:0:1:0: [sda] Write Protect is off
> [    3.590000] sd 1:0:1:0: [sda] Mode Sense: 00 3a 00 00
> [    3.620000] sd 1:0:1:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
> [    3.670000] rtc-generic rtc-generic: registered as rtc0
> [    3.690000] NET: Registered PF_PACKET protocol family
> [    3.730000] sd 1:0:1:0: [sda] Attached SCSI disk
> [    3.760000] Freeing unused kernel image (initmem) memory: 148K
> [    3.790000] This architecture does not have kernel memory protection.
> [    3.810000] Run /init as init process
> [    3.840000]   with arguments:
> [    3.870000]     /init
> [    3.880000]   with environment:
> [    3.910000]     HOME=/
> [    3.920000]     TERM=linux
> [    3.940000]     initcall_blacklist=ide_falcon_driver_init
> [    3.970000]     BOOT_IMAGE=vmlinux
> [   85.240000] random: crng init done
> [  139.730000] EXT4-fs (sda): mounting ext2 file system using the ext4 subsystem
> [  139.770000] EXT4-fs (sda): mounted filesystem without journal. Opts: (null). Quota mode: disabled.
> [ 4089.070000] BUG: spinlock trylock failure on UP on CPU#0, stress-ng/1341
> [ 4089.070000]  lock: 0x53d4a0, .magic: dead4ead, .owner: stress-ng/1341, .owner_cpu: 0
> [ 4089.070000] CPU: 0 PID: 1341 Comm: stress-ng Not tainted 5.14.0-multi-debug #2
> [ 4089.070000] Stack from 00a7dcc8:
> [ 4089.070000]         00a7dcc8 004da40f 004da40f 00000000 00a7dce4 0043e88e 004da40f 00a7dd10
> [ 4089.070000]         0006316c 004d240e 0053d4a0 dead4ead 00956bba 0000053d 00000000 0053d4a0
> [ 4089.070000]         00a7ddd8 0055e6ac 00a7dd28 000632b2 0053d4a0 004d2488 0005c81b 0053d458
> [ 4089.070000]         00a7dd34 0044780e 0053d4a0 00a7dd58 002b8694 0053d4a0 00000001 0000007f
> [ 4089.070000]         00860000 00000011 00528138 00528198 00a7dd7c 00069f6a 0000000d 00014200
> [ 4089.070000]         00528138 00a7dd78 00a7def0 00528138 00014200 00a7dd98 00069ff0 00528138
> [ 4089.070000] Call Trace: [<0043e88e>] dump_stack+0x10/0x16
> [ 4089.070000]  [<0006316c>] spin_dump+0x6c/0xc0
> [ 4089.070000]  [<000632b2>] do_raw_spin_trylock+0x32/0x80
> [ 4089.070000]  [<0005c81b>] init_dl_bandwidth+0x1b/0x80
> [ 4089.070000]  [<0044780e>] _raw_spin_trylock+0xe/0x40
> [ 4089.070000]  [<002b8694>] add_interrupt_randomness+0x154/0x1c0
> [ 4089.070000]  [<00069f6a>] handle_irq_event_percpu+0x2a/0x80
> [ 4089.070000]  [<00014200>] INV_L2+0x8/0x10
> [ 4089.070000]  [<00014200>] INV_L2+0x8/0x10
> [ 4089.070000]  [<00069ff0>] handle_irq_event+0x30/0x80
> [ 4089.070000]  [<0006d9e4>] handle_simple_irq+0x64/0x80
> [ 4089.070000]  [<000695fe>] generic_handle_irq+0x3e/0x80
> [ 4089.070000]  [<00002ea4>] do_IRQ+0x24/0x40
> [ 4089.070000]  [<00002da8>] user_irqvec_fixup+0xc/0x14
> [ 4089.070000]  [<0005c81a>] init_dl_bandwidth+0x1a/0x80
> [ 4089.070000]  [<000d51c0>] find_vma+0x0/0x80
> [ 4089.070000]  [<00444500>] down_read+0x0/0x200
> [ 4089.070000]  [<002b86b0>] add_interrupt_randomness+0x170/0x1c0
> [ 4089.070000]  [<00069f6a>] handle_irq_event_percpu+0x2a/0x80
> [ 4089.070000]  [<00069ff0>] handle_irq_event+0x30/0x80
> [ 4089.070000]  [<0006d9e4>] handle_simple_irq+0x64/0x80
> [ 4089.070000]  [<000695fe>] generic_handle_irq+0x3e/0x80
> [ 4089.070000]  [<00002ea4>] do_IRQ+0x24/0x40
> [ 4089.070000]  [<00002d74>] auto_irqhandler_fixup+0x4/0xc
> [ 4089.070000]  [<0000653c>] buserr_c+0x17c/0x600
> [ 4089.070000]  [<00002ba8>] buserr+0x20/0x28
> [ 4089.070000]  [<0008800d>] get_next_modinfo+0xcd/0x100
> [ 4089.070000]
>
> Does anyone know what causes this?

Our practice to run interrupt handlers at the IPL of the current 
interrupt under service, instead of disabling local interrupts for the 
duration of the handler?

Not sure whether SMP systems keep a shared entropy pool, or multiple 
per-cpu ones, but in this particular instance, the spin lock prevented 
further damage. I'm worried about other cases where re-entering the 
common interrupt code manipulates shared state that goes undetected. But 
then, that code has certainly been audited many times...

Cheers,

	Michael


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
  2021-09-07  5:53                                                                 ` Michael Schmitz
@ 2021-09-07 23:50                                                                   ` Finn Thain
  2021-09-08  8:54                                                                     ` Michael Schmitz
  0 siblings, 1 reply; 73+ messages in thread
From: Finn Thain @ 2021-09-07 23:50 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: linux-m68k

On Tue, 7 Sep 2021, Michael Schmitz wrote:

> > Does anyone know what causes this?
> 
> Our practice to run interrupt handlers at the IPL of the current 
> interrupt under service, instead of disabling local interrupts for the 
> duration of the handler?
> 

Lock contention will happen anyway.

If using spin_trylock() outside of "atomic" context was a bug (really?) 
then it can't be used here.

Perhaps add_interrupt_randomness() should use the lock in irq mode, like 
the rest of drivers/char/random.c does.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
  2021-09-07 23:50                                                                   ` Finn Thain
@ 2021-09-08  8:54                                                                     ` Michael Schmitz
  2021-09-09  9:40                                                                       ` Finn Thain
  2021-09-09 22:51                                                                       ` Finn Thain
  0 siblings, 2 replies; 73+ messages in thread
From: Michael Schmitz @ 2021-09-08  8:54 UTC (permalink / raw)
  To: Finn Thain; +Cc: linux-m68k

Hi Finn,

On 08/09/21 11:50, Finn Thain wrote:
> On Tue, 7 Sep 2021, Michael Schmitz wrote:
>
>>> Does anyone know what causes this?
>>
>> Our practice to run interrupt handlers at the IPL of the current
>> interrupt under service, instead of disabling local interrupts for the
>> duration of the handler?
>>
>
> Lock contention will happen anyway.
>
> If using spin_trylock() outside of "atomic" context was a bug (really?)
> then it can't be used here.

add_interrupt_randomness() relies on interrupts being disabled in 
__handle_irq_event_percpu(), so this check assumes atomic context.

Our definition of irqs_disabled() invalidates that assumption.

> Perhaps add_interrupt_randomness() should use the lock in irq mode, like
> the rest of drivers/char/random.c does.

That might deadlock on SMP when a task reading from the random pool 
holding the lock executes on the same CPU as the interrupt.

In the UP case, the spinlock is optimized away, and other users taking 
the lock also disable interrupts as you said. That ought to protect 
against re-entering _mix_pool_bytes(), unless we're re-entering from 
another, higher priority interrupt. We either need to disable interrupts 
before entering _mix_pool_bytes() on UP, or treat this spin_trylock() as 
real lock operation (not optimized away).

I think this is something that should be discussed with Ted Ts'o.


In a related case, I've managed to swap my 'resume_userspace' format 
error for a nice 'illegal instruction' format error apparently caused by 
an invalid function pointer in __handle_irq_event_percpu(), just by 
disabling all interrupts upon entering the auto_inthandler and 
user_inthandler exception handlers. This bug is quite readily reproduced 
by running your kernel_coverage.sh script in a loop (panics on the first 
stress test on the second pass):

Stress run 2
Logging to stress-ng-20210908-0838.log
./kernel-coverage.sh: line 272: lcov: command not found
running --fork 1 --fork-vm -t 60 --timestamp --no-rand-seed --times
stress-ng: 08:40:08.70 info:  [1914] setting to a 60 second run per stressor
stress-ng: 08:40:08.82 info:  [1914] dispatching hogs: 1 fork
packet_write_wait: Connection to 10.1.1.4 port 22: Broken pipe

Why disabling interrupts during interrupt processing would make matters 
worse doesn't make any sense to me...

Cheers,

	Michael

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
  2021-09-08  8:54                                                                     ` Michael Schmitz
@ 2021-09-09  9:40                                                                       ` Finn Thain
  2021-09-09 23:29                                                                         ` Michael Schmitz
  2021-09-09 22:51                                                                       ` Finn Thain
  1 sibling, 1 reply; 73+ messages in thread
From: Finn Thain @ 2021-09-09  9:40 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: linux-m68k

On Wed, 8 Sep 2021, Michael Schmitz wrote:

> On 08/09/21 11:50, Finn Thain wrote:
> > On Tue, 7 Sep 2021, Michael Schmitz wrote:
> > 
> > > > Does anyone know what causes this?
> > > 
> > > Our practice to run interrupt handlers at the IPL of the current 
> > > interrupt under service, instead of disabling local interrupts for 
> > > the duration of the handler?
> > > 
> > 
> > Lock contention will happen anyway.
> > 
> > If using spin_trylock() outside of "atomic" context was a bug 
> > (really?) then it can't be used here.
> 
> add_interrupt_randomness() relies on interrupts being disabled in 
> __handle_irq_event_percpu(), so this check assumes atomic context.
> 

But __handle_irq_event_percpu() also assumes that local interrupts have 
been disabled. There is a WARN_ONCE to that effect.

> Our definition of irqs_disabled() invalidates that assumption.
> 
> > Perhaps add_interrupt_randomness() should use the lock in irq mode, 
> > like the rest of drivers/char/random.c does.
> 
> That might deadlock on SMP when a task reading from the random pool 
> holding the lock executes on the same CPU as the interrupt.
> 

That does not make sense to me. add_interrupt_randomness() effectively 
does acquire the lock in irq mode on SMP ports yet it doesn't deadlock.

> In the UP case, the spinlock is optimized away, 

Yes and spin_trylock() always succeeds. Hence CONFIG_SPINLOCK_DEBUG 
asserts that the lock is never contended. 

One thing that bothered me when I raised this issue was the apparent false 
positive from CONFIG_SPINLOCK_DEBUG. But after sleeping on it, it makes 
more sense to me.

(Perhaps the difficulty here is !SMP && !PREEMPT_COUNT. If PREEMPT_COUNT 
was available on m68k then spin_trylock() might be expected to test 
preempt_count, and the problem would not arise.)

BTW, the problem went away when I changed to irq mode locking to avoid any 
lock contention, like so:

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 605969ed0f96..4e73c87cbdb8 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1248,6 +1248,7 @@ void add_interrupt_randomness(int irq, int irq_flags)
 	struct fast_pool	*fast_pool = this_cpu_ptr(&irq_randomness);
 	struct pt_regs		*regs = get_irq_regs();
 	unsigned long		now = jiffies;
+	unsigned long		flags;
 	cycles_t		cycles = random_get_entropy();
 	__u32			c_high, j_high;
 	__u64			ip;
@@ -1281,12 +1282,12 @@ void add_interrupt_randomness(int irq, int irq_flags)
 		return;
 
 	r = &input_pool;
-	if (!spin_trylock(&r->lock))
+	if (!spin_trylock_irqsave(&r->lock, flags))
 		return;
 
 	fast_pool->last = now;
 	__mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
-	spin_unlock(&r->lock);
+	spin_unlock_irqrestore(&r->lock, flags);
 
 	fast_pool->count = 0;
 

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
  2021-09-08  8:54                                                                     ` Michael Schmitz
  2021-09-09  9:40                                                                       ` Finn Thain
@ 2021-09-09 22:51                                                                       ` Finn Thain
  2021-09-10  0:03                                                                         ` Michael Schmitz
  1 sibling, 1 reply; 73+ messages in thread
From: Finn Thain @ 2021-09-09 22:51 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: linux-m68k

On Wed, 8 Sep 2021, Michael Schmitz wrote:

> 
> In a related case, I've managed to swap my 'resume_userspace' format 
> error for a nice 'illegal instruction' format error apparently caused by 
> an invalid function pointer in __handle_irq_event_percpu(), just by 
> disabling all interrupts upon entering the auto_inthandler and 
> user_inthandler exception handlers. This bug is quite readily reproduced 
> by running your kernel_coverage.sh script in a loop (panics on the first 
> stress test on the second pass):
> 
> Stress run 2
> Logging to stress-ng-20210908-0838.log
> ./kernel-coverage.sh: line 272: lcov: command not found
> running --fork 1 --fork-vm -t 60 --timestamp --no-rand-seed --times
> stress-ng: 08:40:08.70 info:  [1914] setting to a 60 second run per stressor
> stress-ng: 08:40:08.82 info:  [1914] dispatching hogs: 1 fork
> packet_write_wait: Connection to 10.1.1.4 port 22: Broken pipe
> 
> Why disabling interrupts during interrupt processing would make matters 
> worse doesn't make any sense to me...
> 

Are you able to reproduce that with a stock mainline build?

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
  2021-09-09  9:40                                                                       ` Finn Thain
@ 2021-09-09 23:29                                                                         ` Michael Schmitz
  0 siblings, 0 replies; 73+ messages in thread
From: Michael Schmitz @ 2021-09-09 23:29 UTC (permalink / raw)
  To: Finn Thain; +Cc: linux-m68k

Hi Finn,

On 09/09/21 21:40, Finn Thain wrote:
> On Wed, 8 Sep 2021, Michael Schmitz wrote:
>
>> On 08/09/21 11:50, Finn Thain wrote:
>>> On Tue, 7 Sep 2021, Michael Schmitz wrote:
>>>
>>>>> Does anyone know what causes this?
>>>>
>>>> Our practice to run interrupt handlers at the IPL of the current
>>>> interrupt under service, instead of disabling local interrupts for
>>>> the duration of the handler?
>>>>
>>>
>>> Lock contention will happen anyway.
>>>
>>> If using spin_trylock() outside of "atomic" context was a bug
>>> (really?) then it can't be used here.
>>
>> add_interrupt_randomness() relies on interrupts being disabled in
>> __handle_irq_event_percpu(), so this check assumes atomic context.
>>
>
> But __handle_irq_event_percpu() also assumes that local interrupts have
> been disabled. There is a WARN_ONCE to that effect.

Yes, but that WARN_ON depends on irqs_disabled() returning true if and 
only if interrupts are fully disabled, i.e. the IPL is 7. Our 
implementation of irqs_disabled() returns true if the IPL is > 0 (in 
general) or < 2 (Atari).

>
>> Our definition of irqs_disabled() invalidates that assumption.
>>
>>> Perhaps add_interrupt_randomness() should use the lock in irq mode,
>>> like the rest of drivers/char/random.c does.
>>
>> That might deadlock on SMP when a task reading from the random pool
>> holding the lock executes on the same CPU as the interrupt.
>>
>
> That does not make sense to me. add_interrupt_randomness() effectively
> does acquire the lock in irq mode on SMP ports yet it doesn't deadlock.

It uses spin_trylock() while interrupts are disabled, so someone must 
have worried about deadlocks there.

>> In the UP case, the spinlock is optimized away,
>
> Yes and spin_trylock() always succeeds. Hence CONFIG_SPINLOCK_DEBUG
> asserts that the lock is never contended.

That assertion fails here (IMO because we allow interrupts in code that 
assumes they have been disabled, and tries unsuccessfully to assert 
that), but that's OK in this case, as the code protected by the lock 
isn't called if locking fails.

>
> One thing that bothered me when I raised this issue was the apparent false
> positive from CONFIG_SPINLOCK_DEBUG. But after sleeping on it, it makes
> more sense to me.
>
> (Perhaps the difficulty here is !SMP && !PREEMPT_COUNT. If PREEMPT_COUNT
> was available on m68k then spin_trylock() might be expected to test
> preempt_count, and the problem would not arise.)
>
> BTW, the problem went away when I changed to irq mode locking to avoid any
> lock contention, like so:
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 605969ed0f96..4e73c87cbdb8 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1248,6 +1248,7 @@ void add_interrupt_randomness(int irq, int irq_flags)
>  	struct fast_pool	*fast_pool = this_cpu_ptr(&irq_randomness);
>  	struct pt_regs		*regs = get_irq_regs();
>  	unsigned long		now = jiffies;
> +	unsigned long		flags;
>  	cycles_t		cycles = random_get_entropy();
>  	__u32			c_high, j_high;
>  	__u64			ip;
> @@ -1281,12 +1282,12 @@ void add_interrupt_randomness(int irq, int irq_flags)
>  		return;
>
>  	r = &input_pool;
> -	if (!spin_trylock(&r->lock))
> +	if (!spin_trylock_irqsave(&r->lock, flags))
>  		return;
>
>  	fast_pool->last = now;
>  	__mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
> -	spin_unlock(&r->lock);
> +	spin_unlock_irqrestore(&r->lock, flags);
>
>  	fast_pool->count = 0;
>

You add a redundant interrupt disable/enable for SMP or PREEMPT arches. 
That might be justified if it fixes a kernel crash, not merely a 
CONFIG_DEBUG_SPINLOCK warning.

If we can add preempt count checking or actual spinlocks (without adding 
too much overhead everywhere else), that might fix this issue without 
need for a patch to random.c?

(But then, so would fixing irqs_disables(), or running interrupt 
handlers with interrupts disabled...)

Cheers,

	Michael


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
  2021-09-09 22:51                                                                       ` Finn Thain
@ 2021-09-10  0:03                                                                         ` Michael Schmitz
  2021-09-12  0:51                                                                           ` Finn Thain
  0 siblings, 1 reply; 73+ messages in thread
From: Michael Schmitz @ 2021-09-10  0:03 UTC (permalink / raw)
  To: Finn Thain; +Cc: linux-m68k

Hi Finn,

On 10/09/21 10:51, Finn Thain wrote:
> On Wed, 8 Sep 2021, Michael Schmitz wrote:
>
>>
>> In a related case, I've managed to swap my 'resume_userspace' format
>> error for a nice 'illegal instruction' format error apparently caused by
>> an invalid function pointer in __handle_irq_event_percpu(), just by
>> disabling all interrupts upon entering the auto_inthandler and
>> user_inthandler exception handlers. This bug is quite readily reproduced
>> by running your kernel_coverage.sh script in a loop (panics on the first
>> stress test on the second pass):
>>
>> Stress run 2
>> Logging to stress-ng-20210908-0838.log
>> ./kernel-coverage.sh: line 272: lcov: command not found
>> running --fork 1 --fork-vm -t 60 --timestamp --no-rand-seed --times
>> stress-ng: 08:40:08.70 info:  [1914] setting to a 60 second run per stressor
>> stress-ng: 08:40:08.82 info:  [1914] dispatching hogs: 1 fork
>> packet_write_wait: Connection to 10.1.1.4 port 22: Broken pipe
>>
>> Why disabling interrupts during interrupt processing would make matters
>> worse doesn't make any sense to me...
>>
>
> Are you able to reproduce that with a stock mainline build?

I still need to try that. The kernel panic went away when adding 
additional local_irq_save()/local_irq_restore() in do_IRQ(), which 
should done have nothing except slightly increase code size. Now trying 
to find out whether interrupts are reenabled any time during interrupt 
processing ...

Cheers,

	Michael



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
  2021-09-10  0:03                                                                         ` Michael Schmitz
@ 2021-09-12  0:51                                                                           ` Finn Thain
  2021-09-12  3:55                                                                             ` Brad Boyer
  2021-09-13  1:27                                                                             ` Finn Thain
  0 siblings, 2 replies; 73+ messages in thread
From: Finn Thain @ 2021-09-12  0:51 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: linux-m68k

On Fri, 10 Sep 2021, Michael Schmitz wrote:

> On 10/09/21 10:51, Finn Thain wrote:
> > On Wed, 8 Sep 2021, Michael Schmitz wrote:
> > 
> > > 
> > > In a related case, I've managed to swap my 'resume_userspace' format 
> > > error for a nice 'illegal instruction' format error apparently 
> > > caused by an invalid function pointer in 
> > > __handle_irq_event_percpu(), just by disabling all interrupts upon 
> > > entering the auto_inthandler and user_inthandler exception handlers. 
> > > This bug is quite readily reproduced by running your 
> > > kernel_coverage.sh script in a loop (panics on the first stress test 
> > > on the second pass):
> > > 
> > > Stress run 2
> > > Logging to stress-ng-20210908-0838.log
> > > ./kernel-coverage.sh: line 272: lcov: command not found
> > > running --fork 1 --fork-vm -t 60 --timestamp --no-rand-seed --times
> > > stress-ng: 08:40:08.70 info:  [1914] setting to a 60 second run per
> > > stressor
> > > stress-ng: 08:40:08.82 info:  [1914] dispatching hogs: 1 fork
> > > packet_write_wait: Connection to 10.1.1.4 port 22: Broken pipe
> > > 
> > > Why disabling interrupts during interrupt processing would make 
> > > matters worse doesn't make any sense to me...
> > > 
> > 
> > Are you able to reproduce that with a stock mainline build?
> 
> I still need to try that. The kernel panic went away when adding 
> additional local_irq_save()/local_irq_restore() in do_IRQ(),

I eventually realized that the code using percpu variables inside 
add_interrupt_randomness() outside of the spinlock may also need irqs 
disabled. So I've now done as you did, that is,

diff --git a/arch/m68k/kernel/irq.c b/arch/m68k/kernel/irq.c
index 9ab4f550342e..b46d8a57f4da 100644
--- a/arch/m68k/kernel/irq.c
+++ b/arch/m68k/kernel/irq.c
@@ -20,10 +20,13 @@
 asmlinkage void do_IRQ(int irq, struct pt_regs *regs)
 {
 	struct pt_regs *oldregs = set_irq_regs(regs);
+	unsigned long flags;
 
+	local_irq_save(flags);
 	irq_enter();
 	generic_handle_irq(irq);
 	irq_exit();
+	local_irq_restore(flags);
 
 	set_irq_regs(oldregs);
 }

There may be a better way to achieve that. If the final IPL can be found 
in regs then it doesn't need to be saved again.

I haven't looked for a possible entropy pool improvement from correct 
locking in random.c -- it would not surprise me if there was one.

But none of this explains the panics I saw so I went looking for potential 
race conditions in the irq_enter_rcu() and irq_exit_rcu() code. I haven't 
found the bug yet.

However, I did notice that preempt_count_add() and preempt_count_sub() and 
the associated macros in linux/preempt.h produce very inefficient code in 
the interrupt fast path. Here's one example (there are others).

000323de <irq_enter_rcu>:
   323de:       4e56 0000       linkw %fp,#0
   323e2:       200f            movel %sp,%d0
   323e4:       0280 ffff e000  andil #-8192,%d0
   323ea:       2040            moveal %d0,%a0
   323ec:       2028 000c       movel %a0@(12),%d0
   323f0:       0680 0001 0000  addil #65536,%d0
   323f6:       2140 000c       movel %d0,%a0@(12)
   323fa:       082a 0001 000f  btst #1,%a2@(15)
   32400:       670c            beqs 3240e <irq_enter_rcu+0x30>
   32402:       2028 000c       movel %a0@(12),%d0
   32406:       2028 000c       movel %a0@(12),%d0
   3240a:       2028 000c       movel %a0@(12),%d0
   3240e:       4e5e            unlk %fp
   32410:       4e75            rts

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
  2021-09-12  0:51                                                                           ` Finn Thain
@ 2021-09-12  3:55                                                                             ` Brad Boyer
  2021-09-13  1:27                                                                             ` Finn Thain
  1 sibling, 0 replies; 73+ messages in thread
From: Brad Boyer @ 2021-09-12  3:55 UTC (permalink / raw)
  To: Finn Thain; +Cc: Michael Schmitz, linux-m68k

On Sun, Sep 12, 2021 at 10:51:21AM +1000, Finn Thain wrote:
> However, I did notice that preempt_count_add() and preempt_count_sub() and 
> the associated macros in linux/preempt.h produce very inefficient code in 
> the interrupt fast path. Here's one example (there are others).
> 
> 000323de <irq_enter_rcu>:
>    323de:       4e56 0000       linkw %fp,#0
>    323e2:       200f            movel %sp,%d0
>    323e4:       0280 ffff e000  andil #-8192,%d0
>    323ea:       2040            moveal %d0,%a0
>    323ec:       2028 000c       movel %a0@(12),%d0
>    323f0:       0680 0001 0000  addil #65536,%d0
>    323f6:       2140 000c       movel %d0,%a0@(12)
>    323fa:       082a 0001 000f  btst #1,%a2@(15)
>    32400:       670c            beqs 3240e <irq_enter_rcu+0x30>
>    32402:       2028 000c       movel %a0@(12),%d0
>    32406:       2028 000c       movel %a0@(12),%d0
>    3240a:       2028 000c       movel %a0@(12),%d0
>    3240e:       4e5e            unlk %fp
>    32410:       4e75            rts

If I'm reading the code correctly, this is due to the use of READ_ONCE
in a bunch of places. I'm pretty sure that forces the compiler to
produce a separate read instruction each time even if the value isn't
used or the operation could have otherwise been optimized. Reading
the result it's obvious this isn't useful (particularly the three
discarded reads of the same address), but I think the compiler is
just doing what we told it to do. Each time we load %a0@(12), that's
one of these instances (I think it's the preempt count in this case).
Obviously every one of them would be safe to optimize given the
context, but the compiler doesn't know that.

I suspect this could be optimized by defining alternate versions of
some of the relevant macros that do nothing for cases like this.
However, it might be tricky to figure out ways to do this that
won't break something else.

	Brad Boyer
	flar@allandria.com


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
  2021-09-12  0:51                                                                           ` Finn Thain
  2021-09-12  3:55                                                                             ` Brad Boyer
@ 2021-09-13  1:27                                                                             ` Finn Thain
  2021-09-13  3:26                                                                               ` Michael Schmitz
  1 sibling, 1 reply; 73+ messages in thread
From: Finn Thain @ 2021-09-13  1:27 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: linux-m68k

On Sun, 12 Sep 2021, Finn Thain wrote:

> ... I've now done as you did, that is,
> 
> diff --git a/arch/m68k/kernel/irq.c b/arch/m68k/kernel/irq.c
> index 9ab4f550342e..b46d8a57f4da 100644
> --- a/arch/m68k/kernel/irq.c
> +++ b/arch/m68k/kernel/irq.c
> @@ -20,10 +20,13 @@
>  asmlinkage void do_IRQ(int irq, struct pt_regs *regs)
>  {
>  	struct pt_regs *oldregs = set_irq_regs(regs);
> +	unsigned long flags;
>  
> +	local_irq_save(flags);
>  	irq_enter();
>  	generic_handle_irq(irq);
>  	irq_exit();
> +	local_irq_restore(flags);
>  
>  	set_irq_regs(oldregs);
>  }
> 
> There may be a better way to achieve that. If the final IPL can be found 
> in regs then it doesn't need to be saved again.
> 
> I haven't looked for a possible entropy pool improvement from correct 
> locking in random.c -- it would not surprise me if there was one.
> 
> But none of this explains the panics I saw so I went looking for potential 
> race conditions in the irq_enter_rcu() and irq_exit_rcu() code. I haven't 
> found the bug yet.
> 

Turns out that the panic bug was not affected by that patch...

running --mmap -1 --mmap-odirect --mmap-bytes 100% -t 60 --timestamp --no-rand-seed --times
stress-ng: 17:06:09.62 info:  [1241] setting to a 60 second run per stressor
stress-ng: 17:06:09.62 info:  [1241] dispatching hogs: 1 mmap
[  807.270000] Kernel panic - not syncing: Aiee, killing interrupt handler!
[  807.270000] CPU: 0 PID: 1243 Comm: stress-ng Not tainted 5.14.0-multi-00002-g69f953866c7e #2
[  807.270000] Stack from 00bcbde4:
[  807.270000]         00bcbde4 00488d85 00488d85 000c0000 00bcbe00 003f3708 00488d85 00bcbe20
[  807.270000]         003f270e 000c0000 418004fc 00bca000 009f8a80 00bca000 00a06fc0 00bcbe5c
[  807.270000]         000317f6 0048098b 00000009 418004fc 00bca000 00000000 07408000 00000009
[  807.270000]         00000008 00bcbf38 00a06fc0 00000006 00000000 00000001 00bcbe6c 000319ac
[  807.270000]         00000009 01438a20 00bcbeb8 0003acf0 00000009 0000000f 0000000e c043c000
[  807.270000]         00000000 07408000 00000003 00bcbf98 efb2c944 efb2b8a8 00039afa 00bca000
[  807.270000] Call Trace: [<000c0000>] insert_vmap_area.constprop.91+0xbc/0x15a
[  807.270000]  [<003f3708>] dump_stack+0x10/0x16
[  807.270000]  [<003f270e>] panic+0xba/0x2bc
[  807.270000]  [<000c0000>] insert_vmap_area.constprop.91+0xbc/0x15a
[  807.270000]  [<000317f6>] do_exit+0x87e/0x9d6
[  807.270000]  [<000319ac>] do_group_exit+0x28/0xb6
[  807.270000]  [<0003acf0>] get_signal+0x126/0x720
[  807.270000]  [<00039afa>] send_signal+0xde/0x16e
[  807.270000]  [<00004f70>] do_notify_resume+0x38/0x61c
[  807.270000]  [<0003abaa>] force_sig_fault_to_task+0x36/0x3a
[  807.270000]  [<0003abc6>] force_sig_fault+0x18/0x1c
[  807.270000]  [<000074f4>] send_fault_sig+0x44/0xc6
[  807.270000]  [<00006a62>] buserr_c+0x2c8/0x6a2
[  807.270000]  [<00002cfc>] do_signal_return+0x10/0x1a
[  807.270000]  [<0018800e>] ext4_htree_fill_tree+0x7c/0x32a
[  807.270000]  [<0010800a>] d_absolute_path+0x18/0x6a
[  807.270000] 
[  807.270000] ---[ end Kernel panic - not syncing: Aiee, killing interrupt handler! ]---

On the Quadra 630, the panic almost completely disappeared when I enabled 
the relevant CONFIG_DEBUG_* options. After about 7 hours of stress testing 
I got this:

[23982.680000] list_add corruption. next->prev should be prev (00b51e98), but was 00bb22d8. (next=00b75cd0).
[23982.690000] kernel BUG at lib/list_debug.c:25!
[23982.700000] *** TRAP #7 ***   FORMAT=0
[23982.710000] Current process id is 15489
[23982.720000] BAD KERNEL TRAP: 00000000
[23982.740000] Modules linked in:
[23982.750000] PC: [<00261e62>] __list_add_valid+0x62/0xc0
[23982.760000] SR: 2000  SP: e2fb938b  a2: 00bcba80
[23982.770000] d0: 00000022    d1: 00000002    d2: 008c4e40    d3: 00b7a9c0
[23982.780000] d4: 00b51e98    d5: 000da3c0    a0: 00067f00    a1: 00b51d2c
[23982.790000] Process stress-ng (pid: 15489, task=35ee07ca)
[23982.800000] Frame format=0 
[23982.810000] Stack from 00b51e80:
[23982.810000]         004cbab9 004ea3a1 00000019 004ea34f 00b51e98 00bb22d8 00b75cd0 008c4e38
[23982.810000]         00b51ecc 000da3f2 008c4e40 00b51e98 00b75cd0 00b51e5c 000f5d40 00b75cd0
[23982.810000]         00b7a9c0 00bb22d0 00b7a9c0 00b51f04 000dc346 00b51e5c 008c4e38 00b7a9c0
[23982.810000]         c4c97000 00000000 c4c96000 00102073 00b14960 c4c97000 00b51e5c 00b75c94
[23982.810000]         00000001 00b51f24 000d5628 00b51e5c 00b75c94 00102070 00000000 00b75c94
[23982.810000]         00b75c94 00b51f3c 000d5728 00b14960 00b75c94 c4c97000 00000000 00b51f78
[23982.830000] Call Trace: [<000da3f2>] anon_vma_chain_link+0x32/0x80
[23982.840000]  [<000f5d40>] kmem_cache_alloc+0x0/0x200
[23982.850000]  [<000dc346>] anon_vma_clone+0xc6/0x180
[23982.860000]  [<00102073>] cdev_get+0x33/0x80
[23982.870000]  [<000d5628>] __split_vma+0x68/0x140
[23982.880000]  [<00102070>] cdev_get+0x30/0x80
[23982.890000]  [<000d5728>] split_vma+0x28/0x40
[23982.900000]  [<000d83ba>] mprotect_fixup+0x13a/0x200
[23982.910000]  [<00102070>] cdev_get+0x30/0x80
[23982.920000]  [<000d8280>] mprotect_fixup+0x0/0x200
[23982.930000]  [<000d85b2>] sys_mprotect+0x132/0x1c0
[23982.940000]  [<00102070>] cdev_get+0x30/0x80
[23982.950000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[23982.960000]  [<000071df>] flush_icache_range+0x1f/0x40
[23982.970000]  [<00002ca4>] syscall+0x8/0xc
[23982.980000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[23982.990000]  [<00001000>] kernel_pg_dir+0x0/0x1000
[23983.000000]  [<00002000>] _start+0x0/0x40
[23983.010000]  [<0018800e>] ext4_ext_remove_space+0x20e/0x1540
[23983.030000] 
[23983.040000] Code: 4879 004e a3a1 4879 004c bab9 4e93 4e47 <b089> 6704 b088 661c 2f08 2f2e 000c 2f00 4879 004e a404 47f9 0043 d16c 4e93 4878
[23983.060000] Disabling lock debugging due to kernel taint

I am still unable to reproduce this in Aranym or QEMU. (Though I did find 
a QEMU bug in the attempt.)

I suppose list pointer corruption could have resulted in the above panic 
had it gone undetected. So it's tempting to blame the panic on bad DRAM -- 
especially if this anon_vma_chain struct always gets placed at the same 
physical address (?)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
  2021-09-13  1:27                                                                             ` Finn Thain
@ 2021-09-13  3:26                                                                               ` Michael Schmitz
  2021-09-13  5:22                                                                                 ` Finn Thain
  0 siblings, 1 reply; 73+ messages in thread
From: Michael Schmitz @ 2021-09-13  3:26 UTC (permalink / raw)
  To: Finn Thain; +Cc: linux-m68k

Hi Finn,

On 13/09/21 13:27, Finn Thain wrote:
> On Sun, 12 Sep 2021, Finn Thain wrote:
>
>> ... I've now done as you did, that is,
>>
>> diff --git a/arch/m68k/kernel/irq.c b/arch/m68k/kernel/irq.c
>> index 9ab4f550342e..b46d8a57f4da 100644
>> --- a/arch/m68k/kernel/irq.c
>> +++ b/arch/m68k/kernel/irq.c
>> @@ -20,10 +20,13 @@
>>  asmlinkage void do_IRQ(int irq, struct pt_regs *regs)
>>  {
>>  	struct pt_regs *oldregs = set_irq_regs(regs);
>> +	unsigned long flags;
>>
>> +	local_irq_save(flags);
>>  	irq_enter();
>>  	generic_handle_irq(irq);
>>  	irq_exit();
>> +	local_irq_restore(flags);
>>
>>  	set_irq_regs(oldregs);
>>  }
>>
>> There may be a better way to achieve that. If the final IPL can be found
>> in regs then it doesn't need to be saved again.
>>
>> I haven't looked for a possible entropy pool improvement from correct
>> locking in random.c -- it would not surprise me if there was one.
>>
>> But none of this explains the panics I saw so I went looking for potential
>> race conditions in the irq_enter_rcu() and irq_exit_rcu() code. I haven't
>> found the bug yet.
>>
>
> Turns out that the panic bug was not affected by that patch...
>
> running --mmap -1 --mmap-odirect --mmap-bytes 100% -t 60 --timestamp --no-rand-seed --times
> stress-ng: 17:06:09.62 info:  [1241] setting to a 60 second run per stressor
> stress-ng: 17:06:09.62 info:  [1241] dispatching hogs: 1 mmap
> [  807.270000] Kernel panic - not syncing: Aiee, killing interrupt handler!
> [  807.270000] CPU: 0 PID: 1243 Comm: stress-ng Not tainted 5.14.0-multi-00002-g69f953866c7e #2
> [  807.270000] Stack from 00bcbde4:
> [  807.270000]         00bcbde4 00488d85 00488d85 000c0000 00bcbe00 003f3708 00488d85 00bcbe20
> [  807.270000]         003f270e 000c0000 418004fc 00bca000 009f8a80 00bca000 00a06fc0 00bcbe5c
> [  807.270000]         000317f6 0048098b 00000009 418004fc 00bca000 00000000 07408000 00000009
> [  807.270000]         00000008 00bcbf38 00a06fc0 00000006 00000000 00000001 00bcbe6c 000319ac
> [  807.270000]         00000009 01438a20 00bcbeb8 0003acf0 00000009 0000000f 0000000e c043c000
> [  807.270000]         00000000 07408000 00000003 00bcbf98 efb2c944 efb2b8a8 00039afa 00bca000
> [  807.270000] Call Trace: [<000c0000>] insert_vmap_area.constprop.91+0xbc/0x15a
> [  807.270000]  [<003f3708>] dump_stack+0x10/0x16
> [  807.270000]  [<003f270e>] panic+0xba/0x2bc
> [  807.270000]  [<000c0000>] insert_vmap_area.constprop.91+0xbc/0x15a
> [  807.270000]  [<000317f6>] do_exit+0x87e/0x9d6
> [  807.270000]  [<000319ac>] do_group_exit+0x28/0xb6
> [  807.270000]  [<0003acf0>] get_signal+0x126/0x720
> [  807.270000]  [<00039afa>] send_signal+0xde/0x16e
> [  807.270000]  [<00004f70>] do_notify_resume+0x38/0x61c
> [  807.270000]  [<0003abaa>] force_sig_fault_to_task+0x36/0x3a
> [  807.270000]  [<0003abc6>] force_sig_fault+0x18/0x1c
> [  807.270000]  [<000074f4>] send_fault_sig+0x44/0xc6
> [  807.270000]  [<00006a62>] buserr_c+0x2c8/0x6a2
> [  807.270000]  [<00002cfc>] do_signal_return+0x10/0x1a
> [  807.270000]  [<0018800e>] ext4_htree_fill_tree+0x7c/0x32a
> [  807.270000]  [<0010800a>] d_absolute_path+0x18/0x6a
> [  807.270000]
> [  807.270000] ---[ end Kernel panic - not syncing: Aiee, killing interrupt handler! ]---
>
> On the Quadra 630, the panic almost completely disappeared when I enabled
> the relevant CONFIG_DEBUG_* options. After about 7 hours of stress testing
> I got this:
>
> [23982.680000] list_add corruption. next->prev should be prev (00b51e98), but was 00bb22d8. (next=00b75cd0).

I chased a similar list corruption bug (shadow LRU list corrupt in 
mm/workingset.c:shadow_lru_isolate()) in 4.10. I believe that was 
related to an out of bounds memory access - maybe get_reg() from 
drivers/char/random.c but it might have been something else.

That bug had disappeared in 4.12, haven't seen it ever since.


> [23982.690000] kernel BUG at lib/list_debug.c:25!
> [23982.700000] *** TRAP #7 ***   FORMAT=0
> [23982.710000] Current process id is 15489
> [23982.720000] BAD KERNEL TRAP: 00000000
> [23982.740000] Modules linked in:
> [23982.750000] PC: [<00261e62>] __list_add_valid+0x62/0xc0
> [23982.760000] SR: 2000  SP: e2fb938b  a2: 00bcba80
> [23982.770000] d0: 00000022    d1: 00000002    d2: 008c4e40    d3: 00b7a9c0
> [23982.780000] d4: 00b51e98    d5: 000da3c0    a0: 00067f00    a1: 00b51d2c
> [23982.790000] Process stress-ng (pid: 15489, task=35ee07ca)
> [23982.800000] Frame format=0
> [23982.810000] Stack from 00b51e80:
> [23982.810000]         004cbab9 004ea3a1 00000019 004ea34f 00b51e98 00bb22d8 00b75cd0 008c4e38
> [23982.810000]         00b51ecc 000da3f2 008c4e40 00b51e98 00b75cd0 00b51e5c 000f5d40 00b75cd0
> [23982.810000]         00b7a9c0 00bb22d0 00b7a9c0 00b51f04 000dc346 00b51e5c 008c4e38 00b7a9c0
> [23982.810000]         c4c97000 00000000 c4c96000 00102073 00b14960 c4c97000 00b51e5c 00b75c94
> [23982.810000]         00000001 00b51f24 000d5628 00b51e5c 00b75c94 00102070 00000000 00b75c94
> [23982.810000]         00b75c94 00b51f3c 000d5728 00b14960 00b75c94 c4c97000 00000000 00b51f78
> [23982.830000] Call Trace: [<000da3f2>] anon_vma_chain_link+0x32/0x80
> [23982.840000]  [<000f5d40>] kmem_cache_alloc+0x0/0x200
> [23982.850000]  [<000dc346>] anon_vma_clone+0xc6/0x180
> [23982.860000]  [<00102073>] cdev_get+0x33/0x80
> [23982.870000]  [<000d5628>] __split_vma+0x68/0x140
> [23982.880000]  [<00102070>] cdev_get+0x30/0x80
> [23982.890000]  [<000d5728>] split_vma+0x28/0x40
> [23982.900000]  [<000d83ba>] mprotect_fixup+0x13a/0x200
> [23982.910000]  [<00102070>] cdev_get+0x30/0x80
> [23982.920000]  [<000d8280>] mprotect_fixup+0x0/0x200
> [23982.930000]  [<000d85b2>] sys_mprotect+0x132/0x1c0
> [23982.940000]  [<00102070>] cdev_get+0x30/0x80
> [23982.950000]  [<00001000>] kernel_pg_dir+0x0/0x1000
> [23982.960000]  [<000071df>] flush_icache_range+0x1f/0x40
> [23982.970000]  [<00002ca4>] syscall+0x8/0xc
> [23982.980000]  [<00001000>] kernel_pg_dir+0x0/0x1000
> [23982.990000]  [<00001000>] kernel_pg_dir+0x0/0x1000
> [23983.000000]  [<00002000>] _start+0x0/0x40
> [23983.010000]  [<0018800e>] ext4_ext_remove_space+0x20e/0x1540
> [23983.030000]
> [23983.040000] Code: 4879 004e a3a1 4879 004c bab9 4e93 4e47 <b089> 6704 b088 661c 2f08 2f2e 000c 2f00 4879 004e a404 47f9 0043 d16c 4e93 4878
> [23983.060000] Disabling lock debugging due to kernel taint
>
> I am still unable to reproduce this in Aranym or QEMU. (Though I did find
> a QEMU bug in the attempt.)
>
> I suppose list pointer corruption could have resulted in the above panic
> had it gone undetected. So it's tempting to blame the panic on bad DRAM --

Yes, such list corruption may well cause a kernel panic. I'd expect bad 
DRAM to manifest other ways before a corrupted linked list causes a 
kernel panic though. Filesystem corruption, for instance.

> especially if this anon_vma_chain struct always gets placed at the same
> physical address (?)

Does it? I think this would be part of the per-process VMA data, not 
something like page tables ...

Incidentally - have you ever checked whether Al Viro's signal handling 
fixes have an impact on these bugs?

Cheers,

	Michael



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
  2021-09-13  3:26                                                                               ` Michael Schmitz
@ 2021-09-13  5:22                                                                                 ` Finn Thain
  2021-09-13  7:20                                                                                   ` Michael Schmitz
  0 siblings, 1 reply; 73+ messages in thread
From: Finn Thain @ 2021-09-13  5:22 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: linux-m68k

On Mon, 13 Sep 2021, Michael Schmitz wrote:

> > 
> > [23982.680000] list_add corruption. next->prev should be prev 
> > (00b51e98), but was 00bb22d8. (next=00b75cd0).
> 
> I chased a similar list corruption bug (shadow LRU list corrupt in 
> mm/workingset.c:shadow_lru_isolate()) in 4.10. I believe that was 
> related to an out of bounds memory access - maybe get_reg() from 
> drivers/char/random.c but it might have been something else.
> 
> That bug had disappeared in 4.12, haven't seen it ever since.
> 

Do all of your builds have BUG_ON_DATA_CORRUPTION and DEBUG_LIST enabled?

> 
> Incidentally - have you ever checked whether Al Viro's signal handling 
> fixes have an impact on these bugs?
> 

I will try that patch series if you think it is related.

So far the problem seems to be confined to one machine. Stress tests on 
other mac models did not yet reproduce the problem.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
  2021-09-13  5:22                                                                                 ` Finn Thain
@ 2021-09-13  7:20                                                                                   ` Michael Schmitz
  2021-09-14  3:13                                                                                     ` Michael Schmitz
  2021-09-15  1:38                                                                                     ` Finn Thain
  0 siblings, 2 replies; 73+ messages in thread
From: Michael Schmitz @ 2021-09-13  7:20 UTC (permalink / raw)
  To: Finn Thain; +Cc: linux-m68k

Hi Finn,

On 13/09/21 17:22, Finn Thain wrote:
> On Mon, 13 Sep 2021, Michael Schmitz wrote:
>
>>>
>>> [23982.680000] list_add corruption. next->prev should be prev
>>> (00b51e98), but was 00bb22d8. (next=00b75cd0).
>>
>> I chased a similar list corruption bug (shadow LRU list corrupt in
>> mm/workingset.c:shadow_lru_isolate()) in 4.10. I believe that was
>> related to an out of bounds memory access - maybe get_reg() from
>> drivers/char/random.c but it might have been something else.
>>
>> That bug had disappeared in 4.12, haven't seen it ever since.
>>
>
> Do all of your builds have BUG_ON_DATA_CORRUPTION and DEBUG_LIST enabled?

None had, but that particular list corruption had generated warnings, 
and null pointer accesses. __list_del() uses WRITE_ONCE() now, can't 
remember that from 4.10 (but the log for linux/list.h doesn't mention 
adding WRITE_ONCE so I suppose it must have been there).

>
>>
>> Incidentally - have you ever checked whether Al Viro's signal handling
>> fixes have an impact on these bugs?
>>
>
> I will try that patch series if you think it is related.

Initial tests look promising (but I've said that before).

> So far the problem seems to be confined to one machine. Stress tests on
> other mac models did not yet reproduce the problem.

Yes, that's suspicious. I'll keep you posted.

Cheers,

	Michael

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-23 17:59                                           ` Linus Torvalds
  2021-08-23 21:31                                             ` Michael Schmitz
@ 2021-09-14  2:43                                             ` Michael Schmitz
  2021-09-14 15:54                                               ` Linus Torvalds
  1 sibling, 1 reply; 73+ messages in thread
From: Michael Schmitz @ 2021-09-14  2:43 UTC (permalink / raw)
  To: Linus Torvalds, Al Viro
  Cc: Christoph Hellwig, Andreas Schwab, Geert Uytterhoeven,
	Greg Ungerer, linux-m68k

Hi Linus,

having run stress tests on a kernel with Al Viro's signal handling fixes
applied for the past two days with no further errors, I am now quite
confident that the format error I saw in resume_userspace() and a bus
error in setup_frame() were caused by multiple pending signals, and the
resulting stack mangling that Al's patches fix.

I'll try to conclusively prove that for the setup_frame() case, but for
the sake of bringing Christoph's set_fs patches for m68k to a close,
I'll go with this assumption.

Al: I'm still poring over some of the subtleties of your patch series,
but you can add my Tested-by at least (030 only).

Cheers,

    Michael


Am 24.08.21 um 05:59 schrieb Linus Torvalds:
> On Sun, Aug 22, 2021 at 12:34 PM Michael Schmitz <schmitzmic@gmail.com> wrote:
>> Got this overnight:
>>
>>> [536154.200000] *** FORMAT ERROR ***   FORMAT=0
>>> [536154.210000] Current process id is 4656
>>> [536154.230000] BAD KERNEL TRAP: 00000000
>>> [536154.240000] Modules linked in: atari_scsi ne 8390p [last unloaded: atari_scsi]
>>> [536154.260000] PC: [<00002a8c>] resume_userspace+0x14/0x16
>>> [536154.270000] SR: 2208  SP: 977bd1be  a2: 8009b5e8
>>> [536154.290000] d0: 8009b5e8    d1: cfcfcfcf    d2: 00000000    d3: ffffffff
>>> [536154.300000] d4: 00000000    d5: 00000000    a0: 8008a108    a1: 8009b7df
>>> [536154.320000] Process savelog (pid: 4656, task=e49aa246)
>>> [536154.330000] Frame format=0
>>> [536154.340000] Stack from 00cc5fa4:
>>> [536154.340000]         02088004 3666b008 1c0eb209 007eb5e8 8006a2d0 efaec378 8004366c 61ff61ff
>>> [536154.340000]         8006a2d4 8006a2d2 00000000 030dfffb 0044fffa 0e000000 fffa1a00 fffa1c00
>>> [536154.340000]         fffa1e00 fffb0e40 fffb0e80 00049b66 00000040 005f5800 00000001
> Strange. If I read that stack frame correctly, that seems to be an
> exception frame of type 0xb ("Long Bus Cycle").
>
> Plus the frame content is then apparently corrupted enough that the
> rte causes an exception on trying to restore it.
>
> None of which makes sense or seems to have much at all to do with any
> of these patches. Yes, we mess with the exception frame, but only for
> fork(), and while "copy_process()" doesn't set any frame type, I see
> only two cases:
>
>  - the kernel thread one does a "memset()" to clear it, so you should
> end up with frame type 0
>
>  - the user thread case copies the original frame format (which I
> think is just the system call frame from the TRAP instruction).
>
> Are you 100% sure your hardware is stable?
>
>                   Linus


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
  2021-09-13  7:20                                                                                   ` Michael Schmitz
@ 2021-09-14  3:13                                                                                     ` Michael Schmitz
  2021-09-15  1:38                                                                                     ` Finn Thain
  1 sibling, 0 replies; 73+ messages in thread
From: Michael Schmitz @ 2021-09-14  3:13 UTC (permalink / raw)
  To: Finn Thain; +Cc: linux-m68k

Hi Finn,

On 13/09/21 19:20, Michael Schmitz wrote:
>>>
>>> Incidentally - have you ever checked whether Al Viro's signal handling
>>> fixes have an impact on these bugs?
>>>
>>
>> I will try that patch series if you think it is related.
>
> Initial tests look promising (but I've said that before).
>
>> So far the problem seems to be confined to one machine. Stress tests on
>> other mac models did not yet reproduce the problem.
>
> Yes, that's suspicious. I'll keep you posted.

I've seen neither format errors in resume_userspace() nor bus errors in 
setup_frame() after two days of tests (and would have expected multiple 
errors in that amount of time). Al's patches appear to have fixed these 
errors for me. And that makes a lot of sense, considering handling of 
multiple signals was broken before, and would have been expected to 
result in corrupt exception frames on return from exception, and just 
the sort of format errors I'd seen.

I still have to trace what issues the incorrect exception stack handling 
would have caused in setup_frame() to force a bus error on the first 
__put_user() there.

My tests were run successfully without any disabling of interrupts games 
BTW.

The issues on 040 would be very similar, and I don't think you'd seen 
format errors on exception return or bus errors in setup_frame(), so 
this might not matter to you.

Cheers,

	Michael



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-09-14  2:43                                             ` Michael Schmitz
@ 2021-09-14 15:54                                               ` Linus Torvalds
  2021-09-14 16:28                                                 ` Al Viro
  0 siblings, 1 reply; 73+ messages in thread
From: Linus Torvalds @ 2021-09-14 15:54 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Al Viro, Christoph Hellwig, Andreas Schwab, Geert Uytterhoeven,
	Greg Ungerer, linux-m68k

On Mon, Sep 13, 2021 at 7:43 PM Michael Schmitz <schmitzmic@gmail.com> wrote:
>
> having run stress tests on a kernel with Al Viro's signal handling fixes
> applied for the past two days with no further errors, I am now quite
> confident that the format error I saw in resume_userspace() and a bus
> error in setup_frame() were caused by multiple pending signals, and the
> resulting stack mangling that Al's patches fix.

Sounds good. Except you say "Al Viro's signal handling fixes" without
giving any context, so I have no idea which patches you are referring
to.

Regardless, if there's some fixes queue that also removes set_fs()
from m68k, I'm willing to take it for 5.15 just to have another
architecture down and done with.

And even if the set_fs() patches might still be under discussion, the
format errors have been going on for some time, so the signal fixes -
wherever they are - seem to be no-brainers if they fix those.

          Linus

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-09-14 15:54                                               ` Linus Torvalds
@ 2021-09-14 16:28                                                 ` Al Viro
  2021-09-14 16:38                                                   ` Linus Torvalds
  0 siblings, 1 reply; 73+ messages in thread
From: Al Viro @ 2021-09-14 16:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michael Schmitz, Christoph Hellwig, Andreas Schwab,
	Geert Uytterhoeven, Greg Ungerer, linux-m68k

On Tue, Sep 14, 2021 at 08:54:26AM -0700, Linus Torvalds wrote:
> On Mon, Sep 13, 2021 at 7:43 PM Michael Schmitz <schmitzmic@gmail.com> wrote:
> >
> > having run stress tests on a kernel with Al Viro's signal handling fixes
> > applied for the past two days with no further errors, I am now quite
> > confident that the format error I saw in resume_userspace() and a bus
> > error in setup_frame() were caused by multiple pending signals, and the
> > resulting stack mangling that Al's patches fix.
> 
> Sounds good. Except you say "Al Viro's signal handling fixes" without
> giving any context, so I have no idea which patches you are referring
> to.
> 
> Regardless, if there's some fixes queue that also removes set_fs()
> from m68k, I'm willing to take it for 5.15 just to have another
> architecture down and done with.
> 
> And even if the set_fs() patches might still be under discussion, the
> format errors have been going on for some time, so the signal fixes -
> wherever they are - seem to be no-brainers if they fix those.

	vfs.git #{untested,fixes}.m68k, series posted to m68k folks back in July.
3 commits in there, and I've some local followup cleanups (ifdefs in m68k/signal.c
are bloody confused), but the latter got stalled by ptrace and signal review and
documentation writing...

	See https://lkml.kernel.org/lkml/YP2dTQPm1wGPWFgD@zeniv-ca.linux.org.uk/T/
for original thread.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-09-14 16:28                                                 ` Al Viro
@ 2021-09-14 16:38                                                   ` Linus Torvalds
  2021-09-15  1:06                                                     ` Al Viro
  0 siblings, 1 reply; 73+ messages in thread
From: Linus Torvalds @ 2021-09-14 16:38 UTC (permalink / raw)
  To: Al Viro
  Cc: Michael Schmitz, Christoph Hellwig, Andreas Schwab,
	Geert Uytterhoeven, Greg Ungerer, linux-m68k

On Tue, Sep 14, 2021 at 9:28 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> See https://lkml.kernel.org/lkml/YP2dTQPm1wGPWFgD@zeniv-ca.linux.org.uk/T/
> for original thread.

Ok, thanks for the pointer.

I'm actually surprised we have that "multiple nested signals" thing
still. I had this memory of us having removed that, and just returning
to user space with the one signal frame set up.

I was clearly wrong.

           Linus

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-09-14 16:38                                                   ` Linus Torvalds
@ 2021-09-15  1:06                                                     ` Al Viro
  0 siblings, 0 replies; 73+ messages in thread
From: Al Viro @ 2021-09-15  1:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michael Schmitz, Christoph Hellwig, Andreas Schwab,
	Geert Uytterhoeven, Greg Ungerer, linux-m68k

On Tue, Sep 14, 2021 at 09:38:58AM -0700, Linus Torvalds wrote:
> On Tue, Sep 14, 2021 at 9:28 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > See https://lkml.kernel.org/lkml/YP2dTQPm1wGPWFgD@zeniv-ca.linux.org.uk/T/
> > for original thread.
> 
> Ok, thanks for the pointer.
> 
> I'm actually surprised we have that "multiple nested signals" thing
> still. I had this memory of us having removed that, and just returning
> to user space with the one signal frame set up.

Too much headache and too many things to keep track of - for example,
you have to cope with fun like raising (and handling) SIGTRAP on
singlestepping into signal handler on some architectures.  That would
need to be special-cased and so would SIGSEGV on failure to set sigframe
up, etc.

Al, back to digging through io_uring and hating every minute of that...

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
  2021-09-13  7:20                                                                                   ` Michael Schmitz
  2021-09-14  3:13                                                                                     ` Michael Schmitz
@ 2021-09-15  1:38                                                                                     ` Finn Thain
  2021-09-15  8:37                                                                                       ` Michael Schmitz
  1 sibling, 1 reply; 73+ messages in thread
From: Finn Thain @ 2021-09-15  1:38 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: linux-m68k

On Mon, 13 Sep 2021, Michael Schmitz wrote:

> > > 
> > > Incidentally - have you ever checked whether Al Viro's signal 
> > > handling fixes have an impact on these bugs?
> > > 
> > 
> > I will try that patch series if you think it is related.
> 
> Initial tests look promising (but I've said that before).
> 

Here's what I found in recent tests on my Quadra 630.

The usual stress-ng panic can happen without list corruption, even with 
local_irq_save/restore() added to do_IRQ().

The panic did not show up at all during stress tests with Al's signal 
handling patch series.

I think my results are consistent with yours.

The kernel's 'memtest' didn't detect any bad DRAM but it isn't 
particularly thorough so I'm running some tests with memtester-4.5.1.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
  2021-09-15  1:38                                                                                     ` Finn Thain
@ 2021-09-15  8:37                                                                                       ` Michael Schmitz
  2021-09-16  9:04                                                                                         ` Finn Thain
  0 siblings, 1 reply; 73+ messages in thread
From: Michael Schmitz @ 2021-09-15  8:37 UTC (permalink / raw)
  To: Finn Thain; +Cc: linux-m68k

Hi Finn,

On 15/09/21 13:38, Finn Thain wrote:
> On Mon, 13 Sep 2021, Michael Schmitz wrote:
>
>>>>
>>>> Incidentally - have you ever checked whether Al Viro's signal
>>>> handling fixes have an impact on these bugs?
>>>>
>>>
>>> I will try that patch series if you think it is related.
>>
>> Initial tests look promising (but I've said that before).
>>
>
> Here's what I found in recent tests on my Quadra 630.
>
> The usual stress-ng panic can happen without list corruption, even with
> local_irq_save/restore() added to do_IRQ().
>
> The panic did not show up at all during stress tests with Al's signal
> handling patch series.
>
> I think my results are consistent with yours.

Thanks - that's encouraging to hear. My tests with Christoph's patches 
on top of Al's haven't shown any further errors either, but I'll give 
that combination some more workout.

Would you care to add your tested-by for Al's patches?

Cheers,

	Michael

>
> The kernel's 'memtest' didn't detect any bad DRAM but it isn't
> particularly thorough so I'm running some tests with memtester-4.5.1.
>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
  2021-09-15  8:37                                                                                       ` Michael Schmitz
@ 2021-09-16  9:04                                                                                         ` Finn Thain
  2021-09-16 22:28                                                                                           ` Michael Schmitz
  0 siblings, 1 reply; 73+ messages in thread
From: Finn Thain @ 2021-09-16  9:04 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: linux-m68k

On Wed, 15 Sep 2021, Michael Schmitz wrote:

> On 15/09/21 13:38, Finn Thain wrote:
> > On Mon, 13 Sep 2021, Michael Schmitz wrote:
> > 
> > > > > Incidentally - have you ever checked whether Al Viro's signal 
> > > > > handling fixes have an impact on these bugs?
> > > > 
> > > > I will try that patch series if you think it is related.
> > > 
> > > Initial tests look promising (but I've said that before).
> > 
> > Here's what I found in recent tests on my Quadra 630.
> > 
> > The usual stress-ng panic can happen without list corruption, even 
> > with local_irq_save/restore() added to do_IRQ().
> > 
> > The panic did not show up at all during stress tests with Al's signal 
> > handling patch series.
> > 
> > I think my results are consistent with yours.
> 
> Thanks - that's encouraging to hear. My tests with Christoph's patches 
> on top of Al's haven't shown any further errors either, but I'll give 
> that combination some more workout.

Further stress testing here using Al's patches did eventually result in 
the same panic that I see using mainline (below).

> 
> Would you care to add your tested-by for Al's patches?

Sure. I haven't seen any regression, so
Tested-by: Finn Thain <fthain@linux-m68k.org>

---
running --mmap -1 --mmap-osync --mmap-bytes 100% -t 60 --timestamp --no-rand-seed --times
stress-ng: 22:52:11.63 info:  [5491] setting to a 60 second run per stressor
stress-ng: 22:52:11.64 info:  [5491] dispatching hogs: 1 mmap
[ 9858.090000] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 9858.090000] CPU: 0 PID: 5493 Comm: stress-ng Not tainted 5.14.0-multi-00003-gb2406d5d331a #7
[ 9858.090000] Stack from 00b4bde4:
[ 9858.090000]         00b4bde4 00488d5f 00488d5f 00040000 00b4be00 003f3630 00488d5f 00b4be20
[ 9858.090000]         003f2636 00040000 418004fc 00b4a000 009f8540 00b4a000 00a07440 00b4be5c
[ 9858.090000]         0003171e 00480965 00000009 418004fc 00b4a000 00000000 073f8000 00000009
[ 9858.090000]         00000008 00b4bf38 00a07440 00000006 00000000 00000001 00b4be6c 000318d4
[ 9858.090000]         00000009 01438f30 00b4beb8 0003ac18 00000009 0000000f 0000000e c043c000
[ 9858.090000]         00000000 073f8000 00000003 00b4bf98 eff82944 eff818a8 00039a22 00b4a000
[ 9858.090000] Call Trace: [<00040000>] rcu_free_pwq+0x1c/0x1e
[ 9858.090000]  [<003f3630>] dump_stack+0x10/0x16
[ 9858.090000]  [<003f2636>] panic+0xba/0x2bc
[ 9858.090000]  [<00040000>] rcu_free_pwq+0x1c/0x1e
[ 9858.090000]  [<0003171e>] do_exit+0x87e/0x9d6
[ 9858.090000]  [<000318d4>] do_group_exit+0x28/0xb6
[ 9858.090000]  [<0003ac18>] get_signal+0x126/0x720
[ 9858.090000]  [<00039a22>] send_signal+0xde/0x16e
[ 9858.090000]  [<00004f0c>] do_notify_resume+0x38/0x5dc
[ 9858.090000]  [<0003aad2>] force_sig_fault_to_task+0x36/0x3a
[ 9858.090000]  [<0003aaee>] force_sig_fault+0x18/0x1c
[ 9858.090000]  [<00007450>] send_fault_sig+0x44/0xc6
[ 9858.090000]  [<000069be>] buserr_c+0x2c8/0x6a2
[ 9858.090000]  [<00002cd8>] do_signal_return+0x10/0x1a
[ 9858.090000]  [<0018800e>] ext4_htree_fill_tree+0x154/0x32a
[ 9858.090000]  [<0010800a>] d_path+0x86/0x114
[ 9858.090000] 
[ 9858.090000] ---[ end Kernel panic - not syncing: Aiee, killing interrupt handler! ]---

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: Mainline kernel crashes, was Re: RFC: remove set_fs for m68k
  2021-09-16  9:04                                                                                         ` Finn Thain
@ 2021-09-16 22:28                                                                                           ` Michael Schmitz
  0 siblings, 0 replies; 73+ messages in thread
From: Michael Schmitz @ 2021-09-16 22:28 UTC (permalink / raw)
  To: Finn Thain; +Cc: linux-m68k

Hi Finn,

On 16/09/21 21:04, Finn Thain wrote:
> On Wed, 15 Sep 2021, Michael Schmitz wrote:
>
>> On 15/09/21 13:38, Finn Thain wrote:
>>> On Mon, 13 Sep 2021, Michael Schmitz wrote:
>>>
>>>>>> Incidentally - have you ever checked whether Al Viro's signal
>>>>>> handling fixes have an impact on these bugs?
>>>>>
>>>>> I will try that patch series if you think it is related.
>>>>
>>>> Initial tests look promising (but I've said that before).
>>>
>>> Here's what I found in recent tests on my Quadra 630.
>>>
>>> The usual stress-ng panic can happen without list corruption, even
>>> with local_irq_save/restore() added to do_IRQ().
>>>
>>> The panic did not show up at all during stress tests with Al's signal
>>> handling patch series.
>>>
>>> I think my results are consistent with yours.
>>
>> Thanks - that's encouraging to hear. My tests with Christoph's patches
>> on top of Al's haven't shown any further errors either, but I'll give
>> that combination some more workout.
>
> Further stress testing here using Al's patches did eventually result in
> the same panic that I see using mainline (below).

That's bad - there's another bug lurking in the exception return code, 
it seems. Not a regression though.

>
>>
>> Would you care to add your tested-by for Al's patches?
>
> Sure. I haven't seen any regression, so
> Tested-by: Finn Thain <fthain@linux-m68k.org>
>
> ---
> running --mmap -1 --mmap-osync --mmap-bytes 100% -t 60 --timestamp --no-rand-seed --times
> stress-ng: 22:52:11.63 info:  [5491] setting to a 60 second run per stressor
> stress-ng: 22:52:11.64 info:  [5491] dispatching hogs: 1 mmap
> [ 9858.090000] Kernel panic - not syncing: Aiee, killing interrupt handler!

That one's from do_exit(), right at the start. Can you instrument that 
to print the hardirq and softirq counts separate?

> [ 9858.090000] CPU: 0 PID: 5493 Comm: stress-ng Not tainted 5.14.0-multi-00003-gb2406d5d331a #7
> [ 9858.090000] Stack from 00b4bde4:
> [ 9858.090000]         00b4bde4 00488d5f 00488d5f 00040000 00b4be00 003f3630 00488d5f 00b4be20
> [ 9858.090000]         003f2636 00040000 418004fc 00b4a000 009f8540 00b4a000 00a07440 00b4be5c
> [ 9858.090000]         0003171e 00480965 00000009 418004fc 00b4a000 00000000 073f8000 00000009
> [ 9858.090000]         00000008 00b4bf38 00a07440 00000006 00000000 00000001 00b4be6c 000318d4
> [ 9858.090000]         00000009 01438f30 00b4beb8 0003ac18 00000009 0000000f 0000000e c043c000
> [ 9858.090000]         00000000 073f8000 00000003 00b4bf98 eff82944 eff818a8 00039a22 00b4a000
> [ 9858.090000] Call Trace: [<00040000>] rcu_free_pwq+0x1c/0x1e
> [ 9858.090000]  [<003f3630>] dump_stack+0x10/0x16
> [ 9858.090000]  [<003f2636>] panic+0xba/0x2bc
> [ 9858.090000]  [<00040000>] rcu_free_pwq+0x1c/0x1e
> [ 9858.090000]  [<0003171e>] do_exit+0x87e/0x9d6

That offset into do_exit() does not make sense to me - in my version, 
that's beyond the end of do_exit(). Does this correspond to the 
in_interrupt() test in do_exit() in your image?

> [ 9858.090000]  [<000318d4>] do_group_exit+0x28/0xb6
> [ 9858.090000]  [<0003ac18>] get_signal+0x126/0x720
> [ 9858.090000]  [<00039a22>] send_signal+0xde/0x16e
> [ 9858.090000]  [<00004f0c>] do_notify_resume+0x38/0x5dc
> [ 9858.090000]  [<0003aad2>] force_sig_fault_to_task+0x36/0x3a
> [ 9858.090000]  [<0003aaee>] force_sig_fault+0x18/0x1c
> [ 9858.090000]  [<00007450>] send_fault_sig+0x44/0xc6
> [ 9858.090000]  [<000069be>] buserr_c+0x2c8/0x6a2
> [ 9858.090000]  [<00002cd8>] do_signal_return+0x10/0x1a

RESTORE_SWITCH_STACK in my version. We don't get there in interrupt 
context unless it's the only interrupt on the kernel stack.

This is after do_notify_resume() which would have called setup_frame() 
in case there was a signal pending (which we can pretty much assume 
here, unless you're tracing stress-ng).

I can't see anything in do_signal() and its call chain that would cause 
our stack pointer to change upon return from do_notify_resume() ...

Could you add code to do_notify_resume() that compares the 'regs' 
argument upon entry and return, and prints both if there is a mismatch?

I know, grasping at straws again ...

Cheers,

	Michael


> [ 9858.090000]  [<0018800e>] ext4_htree_fill_tree+0x154/0x32a
> [ 9858.090000]  [<0010800a>] d_path+0x86/0x114
> [ 9858.090000]
> [ 9858.090000] ---[ end Kernel panic - not syncing: Aiee, killing interrupt handler! ]---
>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: remove set_fs for m68k
  2021-08-16  6:58                                     ` Christoph Hellwig
       [not found]                                       ` <23f745f2-9086-81fb-3d9e-40ea08a1923@linux-m68k.org>
@ 2021-09-21 21:14                                       ` Michael Schcmitz
  2021-08-22 19:33                                         ` Michael Schmitz
  1 sibling, 1 reply; 73+ messages in thread
From: Michael Schcmitz @ 2021-09-21 21:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Schwab, Geert Uytterhoeven, Greg Ungerer, linux-m68k,
	Linus Torvalds

Hi Christoph,

no errors seen after five days of testing. Adding in a VM stressor now ...

Cheers,

     Michael


On 16/08/21 18:58, Christoph Hellwig wrote:
> On Mon, Aug 16, 2021 at 07:17:15AM +1200, Michael Schmitz wrote:
>> Would it be acceptable to add this on top of the probe040/do_040writeback1
>> patch that I'm currently testing?
> Sure.


^ permalink raw reply	[flat|nested] 73+ messages in thread

end of thread, other threads:[~2021-09-16 22:30 UTC | newest]

Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-09  7:01 RFC: remove set_fs for m68k Christoph Hellwig
2021-07-09  7:01 ` [PATCH 1/7] m68k: document that access_ok is broken for !CONFIG_CPU_HAS_ADDRESS_SPACES Christoph Hellwig
2021-07-09  7:01 ` [PATCH 2/7] m68k: use BUILD_BUG for passing invalid sizes to get_user/put_user Christoph Hellwig
2021-07-09  7:01 ` [PATCH 3/7] m68k: remove the inline copy_{from,to}_user variants Christoph Hellwig
2021-07-09  7:01 ` [PATCH 4/7] m68k: remove the err argument to the get_user/put_user assembly helpers Christoph Hellwig
2021-07-09  7:01 ` [PATCH 5/7] m68k: factor the 8-byte lowlevel {get,put}_user code into helpers Christoph Hellwig
2021-07-09  7:01 ` [PATCH 6/7] m68k: provide __{get,put}_kernel_nofault Christoph Hellwig
2021-07-09  7:01 ` [PATCH 7/7] m68k: remove set_fs() Christoph Hellwig
2021-07-11  7:20 ` RFC: remove set_fs for m68k Michael Schmitz
2021-07-12  9:50   ` Christoph Hellwig
2021-07-12 10:20   ` Andreas Schwab
2021-07-12 19:12     ` Michael Schmitz
2021-07-13  5:41       ` Christoph Hellwig
2021-07-13  8:16         ` Michael Schmitz
2021-07-13  8:54           ` Christoph Hellwig
2021-07-14 19:26             ` Michael Schmitz
2021-07-14 20:03               ` Andreas Schwab
2021-07-15  5:44                 ` Michael Schmitz
2021-07-16  2:03               ` Michael Schmitz
2021-07-17  5:41                 ` Michael Schmitz
2021-07-18  1:14                   ` Michael Schmitz
2021-07-21 17:05                     ` Christoph Hellwig
2021-07-21 19:20                       ` Michael Schmitz
2021-07-23  4:00                       ` Michael Schmitz
2021-07-23  5:11                         ` Christoph Hellwig
2021-07-25  7:36                           ` Michael Schmitz
2021-07-31 19:31                             ` Michael Schmitz
2021-08-06  3:10                               ` Michael Schmitz
2021-08-11  9:12                                 ` Christoph Hellwig
2021-08-12  3:37                                   ` Michael Schmitz
2021-08-15  7:42                                 ` Christoph Hellwig
2021-08-15 19:17                                   ` Michael Schmitz
2021-08-16  6:58                                     ` Christoph Hellwig
     [not found]                                       ` <23f745f2-9086-81fb-3d9e-40ea08a1923@linux-m68k.org>
     [not found]                                         ` <20210816075155.GA29187@lst.de>
     [not found]                                           ` <d407a2a1-738b-5cd5-c2ed-b7250c5da8ec@gmail.com>
     [not found]                                             ` <83571ae-10ae-2919-cde-b6b4a5769c9@linux-m68k.org>
     [not found]                                               ` <dc594142-e459-533e-cac2-c7a213cec464@gmail.com>
     [not found]                                                 ` <f4ab2dcb-6761-c60b-54ce-35d0d017d371@gmail.com>
     [not found]                                                   ` <d772d22e-a945-3e35-80a2-f4783893bea@linux-m68k.org>
     [not found]                                                     ` <b2c55280-657b-51c2-065c-3fc93db050b9@gmail.com>
     [not found]                                                       ` <d7b8f7eb-fc18-c8d-fe3e-dcdf19d3f4b@linux-m68k.org>
     [not found]                                                         ` <755e55ba-4ce2-b4e4-a628-5abc183a557a@linux-m68k.org>
     [not found]                                                           ` <b52a10fe-3e4b-5740-d3f8-52bce3bc988@linux-m68k.org>
     [not found]                                                             ` <31f27da7-be60-8eb-9834-748b653c2246@linux-m68k.org>
2021-09-07  3:28                                                               ` Mainline kernel crashes, was " Finn Thain
2021-09-07  5:53                                                                 ` Michael Schmitz
2021-09-07 23:50                                                                   ` Finn Thain
2021-09-08  8:54                                                                     ` Michael Schmitz
2021-09-09  9:40                                                                       ` Finn Thain
2021-09-09 23:29                                                                         ` Michael Schmitz
2021-09-09 22:51                                                                       ` Finn Thain
2021-09-10  0:03                                                                         ` Michael Schmitz
2021-09-12  0:51                                                                           ` Finn Thain
2021-09-12  3:55                                                                             ` Brad Boyer
2021-09-13  1:27                                                                             ` Finn Thain
2021-09-13  3:26                                                                               ` Michael Schmitz
2021-09-13  5:22                                                                                 ` Finn Thain
2021-09-13  7:20                                                                                   ` Michael Schmitz
2021-09-14  3:13                                                                                     ` Michael Schmitz
2021-09-15  1:38                                                                                     ` Finn Thain
2021-09-15  8:37                                                                                       ` Michael Schmitz
2021-09-16  9:04                                                                                         ` Finn Thain
2021-09-16 22:28                                                                                           ` Michael Schmitz
2021-09-21 21:14                                       ` Michael Schcmitz
2021-08-22 19:33                                         ` Michael Schmitz
2021-08-23  4:04                                           ` Michael Schmitz
2021-08-23 17:59                                           ` Linus Torvalds
2021-08-23 21:31                                             ` Michael Schmitz
2021-08-23 21:49                                               ` Linus Torvalds
2021-08-24  8:08                                                 ` Andreas Schwab
2021-08-24  8:44                                                 ` Michael Schmitz
2021-08-24  8:59                                                   ` Andreas Schwab
2021-08-25  7:51                                                     ` Michael Schmitz
2021-08-25  8:44                                                       ` Andreas Schwab
2021-08-25 22:59                                                         ` Michael Schmitz
2021-08-25 23:30                                                           ` Brad Boyer
2021-08-26  7:46                                                             ` Michael Schmitz
2021-08-26  7:45                                                           ` Andreas Schwab
2021-09-14  2:43                                             ` Michael Schmitz
2021-09-14 15:54                                               ` Linus Torvalds
2021-09-14 16:28                                                 ` Al Viro
2021-09-14 16:38                                                   ` Linus Torvalds
2021-09-15  1:06                                                     ` Al Viro
2021-07-12 19:04   ` Michael Schmitz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.