linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 00/11] Rework READ_ONCE() to improve codegen
From: Will Deacon @ 2020-04-21 15:15 UTC
  To: linux-kernel
  Cc: linux-arch, kernel-team, Will Deacon, Mark Rutland,
	Michael Ellerman, Peter Zijlstra, Linus Torvalds,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Masahiro Yamada,
	Nick Desaulniers

Hi,

It's me again. This is version four of the READ_ONCE() codegen improvement
patches that I previously posted here:

RFC: https://lore.kernel.org/lkml/20200110165636.28035-1-will@kernel.org
v2:  https://lore.kernel.org/lkml/20200123153341.19947-1-will@kernel.org
v3:  https://lore.kernel.org/lkml/20200415165218.20251-1-will@kernel.org/

Changes since v3 include:

  * Drop the patch that warned about GCC versions prior to 4.8
  * Move the patch raising the minimum GCC version earlier in the series
  * Reintroduce the old READ_ONCE_NOCHECK behaviour so that it can continue
    to be used by stack unwinders

As before, this will break the build for sparc32 without the changes here:

  https://lore.kernel.org/lkml/20200414214011.2699-1-will@kernel.org

and boy are they proving to be popular (I'm interpreting the silence as
monumental joy).

Ta,

Will

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>

--->8

Will Deacon (11):
  compiler/gcc: Raise minimum GCC version for kernel builds to 4.8
  netfilter: Avoid assigning 'const' pointer to non-const pointer
  net: tls: Avoid assigning 'const' pointer to non-const pointer
  fault_inject: Don't rely on "return value" from WRITE_ONCE()
  arm64: csum: Disable KASAN for do_csum()
  READ_ONCE: Simplify implementations of {READ,WRITE}_ONCE()
  READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses
  READ_ONCE: Drop pointer qualifiers when reading from scalar types
  locking/barriers: Use '__unqual_scalar_typeof' for load-acquire macros
  arm64: barrier: Use '__unqual_scalar_typeof' for acquire/release
    macros
  gcov: Remove old GCC 3.4 support

 Documentation/process/changes.rst |   2 +-
 arch/arm/crypto/Kconfig           |  12 +-
 arch/arm64/include/asm/barrier.h  |  16 +-
 arch/arm64/lib/csum.c             |  20 +-
 crypto/Kconfig                    |   1 -
 drivers/xen/time.c                |   2 +-
 include/asm-generic/barrier.h     |  16 +-
 include/linux/compiler-gcc.h      |   5 +-
 include/linux/compiler.h          | 145 ++++----
 include/linux/compiler_types.h    |  21 ++
 init/Kconfig                      |   1 -
 kernel/gcov/Kconfig               |  24 --
 kernel/gcov/Makefile              |   3 +-
 kernel/gcov/gcc_3_4.c             | 573 ------------------------------
 lib/fault-inject.c                |   4 +-
 net/netfilter/core.c              |   2 +-
 net/tls/tls_main.c                |   2 +-
 scripts/gcc-plugins/Kconfig       |   2 +-
 18 files changed, 132 insertions(+), 719 deletions(-)
 delete mode 100644 kernel/gcov/gcc_3_4.c

-- 
2.26.1.301.g55bc3eb7cb9-goog


* [PATCH v4 01/11] compiler/gcc: Raise minimum GCC version for kernel builds to 4.8
From: Will Deacon @ 2020-04-21 15:15 UTC
  To: linux-kernel
  Cc: linux-arch, kernel-team, Will Deacon, Mark Rutland,
	Michael Ellerman, Peter Zijlstra, Linus Torvalds,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Masahiro Yamada,
	Nick Desaulniers

It is very rare to see versions of GCC prior to 4.8 being used to build
the mainline kernel. These old compilers are also known to have codegen
issues which can lead to silent miscompilation:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145

Raise the minimum GCC version for kernel builds to 4.8 and remove some
tautological Kconfig dependencies as a consequence.

Cc: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 Documentation/process/changes.rst |  2 +-
 arch/arm/crypto/Kconfig           | 12 ++++++------
 crypto/Kconfig                    |  1 -
 include/linux/compiler-gcc.h      |  5 ++---
 init/Kconfig                      |  1 -
 scripts/gcc-plugins/Kconfig       |  2 +-
 6 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst
index 91c5ff8e161e..5cfb54c2aaa6 100644
--- a/Documentation/process/changes.rst
+++ b/Documentation/process/changes.rst
@@ -29,7 +29,7 @@ you probably needn't concern yourself with pcmciautils.
 ====================== ===============  ========================================
         Program        Minimal version       Command to check the version
 ====================== ===============  ========================================
-GNU C                  4.6              gcc --version
+GNU C                  4.8              gcc --version
 GNU make               3.81             make --version
 binutils               2.23             ld -v
 flex                   2.5.35           flex --version
diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 2674de6ada1f..c9bf2df85cb9 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -30,7 +30,7 @@ config CRYPTO_SHA1_ARM_NEON
 
 config CRYPTO_SHA1_ARM_CE
 	tristate "SHA1 digest algorithm (ARM v8 Crypto Extensions)"
-	depends on KERNEL_MODE_NEON && (CC_IS_CLANG || GCC_VERSION >= 40800)
+	depends on KERNEL_MODE_NEON
 	select CRYPTO_SHA1_ARM
 	select CRYPTO_HASH
 	help
@@ -39,7 +39,7 @@ config CRYPTO_SHA1_ARM_CE
 
 config CRYPTO_SHA2_ARM_CE
 	tristate "SHA-224/256 digest algorithm (ARM v8 Crypto Extensions)"
-	depends on KERNEL_MODE_NEON && (CC_IS_CLANG || GCC_VERSION >= 40800)
+	depends on KERNEL_MODE_NEON
 	select CRYPTO_SHA256_ARM
 	select CRYPTO_HASH
 	help
@@ -96,7 +96,7 @@ config CRYPTO_AES_ARM_BS
 
 config CRYPTO_AES_ARM_CE
 	tristate "Accelerated AES using ARMv8 Crypto Extensions"
-	depends on KERNEL_MODE_NEON && (CC_IS_CLANG || GCC_VERSION >= 40800)
+	depends on KERNEL_MODE_NEON
 	select CRYPTO_SKCIPHER
 	select CRYPTO_LIB_AES
 	select CRYPTO_SIMD
@@ -106,7 +106,7 @@ config CRYPTO_AES_ARM_CE
 
 config CRYPTO_GHASH_ARM_CE
 	tristate "PMULL-accelerated GHASH using NEON/ARMv8 Crypto Extensions"
-	depends on KERNEL_MODE_NEON && (CC_IS_CLANG || GCC_VERSION >= 40800)
+	depends on KERNEL_MODE_NEON
 	select CRYPTO_HASH
 	select CRYPTO_CRYPTD
 	select CRYPTO_GF128MUL
@@ -118,13 +118,13 @@ config CRYPTO_GHASH_ARM_CE
 
 config CRYPTO_CRCT10DIF_ARM_CE
 	tristate "CRCT10DIF digest algorithm using PMULL instructions"
-	depends on KERNEL_MODE_NEON && (CC_IS_CLANG || GCC_VERSION >= 40800)
+	depends on KERNEL_MODE_NEON
 	depends on CRC_T10DIF
 	select CRYPTO_HASH
 
 config CRYPTO_CRC32_ARM_CE
 	tristate "CRC32(C) digest algorithm using CRC and/or PMULL instructions"
-	depends on KERNEL_MODE_NEON && (CC_IS_CLANG || GCC_VERSION >= 40800)
+	depends on KERNEL_MODE_NEON
 	depends on CRC32
 	select CRYPTO_HASH
 
diff --git a/crypto/Kconfig b/crypto/Kconfig
index c24a47406f8f..34a8c5bfd062 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -316,7 +316,6 @@ config CRYPTO_AEGIS128
 config CRYPTO_AEGIS128_SIMD
 	bool "Support SIMD acceleration for AEGIS-128"
 	depends on CRYPTO_AEGIS128 && ((ARM || ARM64) && KERNEL_MODE_NEON)
-	depends on !ARM || CC_IS_CLANG || GCC_VERSION >= 40800
 	default y
 
 config CRYPTO_AEGIS128_AESNI_SSE2
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index d7ee4c6bad48..e2f725273261 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -10,7 +10,8 @@
 		     + __GNUC_MINOR__ * 100	\
 		     + __GNUC_PATCHLEVEL__)
 
-#if GCC_VERSION < 40600
+/* https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 */
+#if GCC_VERSION < 40800
 # error Sorry, your compiler is too old - please upgrade it.
 #endif
 
@@ -126,9 +127,7 @@
 #if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP) && !defined(__CHECKER__)
 #define __HAVE_BUILTIN_BSWAP32__
 #define __HAVE_BUILTIN_BSWAP64__
-#if GCC_VERSION >= 40800
 #define __HAVE_BUILTIN_BSWAP16__
-#endif
 #endif /* CONFIG_ARCH_USE_BUILTIN_BSWAP && !__CHECKER__ */
 
 #if GCC_VERSION >= 70000
diff --git a/init/Kconfig b/init/Kconfig
index 9e22ee8fbd75..035d38a4f9ad 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1285,7 +1285,6 @@ config LD_DEAD_CODE_DATA_ELIMINATION
 	bool "Dead code and data elimination (EXPERIMENTAL)"
 	depends on HAVE_LD_DEAD_CODE_DATA_ELIMINATION
 	depends on EXPERT
-	depends on !(FUNCTION_TRACER && CC_IS_GCC && GCC_VERSION < 40800)
 	depends on $(cc-option,-ffunction-sections -fdata-sections)
 	depends on $(ld-option,--gc-sections)
 	help
diff --git a/scripts/gcc-plugins/Kconfig b/scripts/gcc-plugins/Kconfig
index 013ba3a57669..ce0b99fb5847 100644
--- a/scripts/gcc-plugins/Kconfig
+++ b/scripts/gcc-plugins/Kconfig
@@ -8,7 +8,7 @@ config HAVE_GCC_PLUGINS
 menuconfig GCC_PLUGINS
 	bool "GCC plugins"
 	depends on HAVE_GCC_PLUGINS
-	depends on CC_IS_GCC && GCC_VERSION >= 40800
+	depends on CC_IS_GCC
 	depends on $(success,$(srctree)/scripts/gcc-plugin.sh $(CC))
 	default y
 	help
-- 
2.26.1.301.g55bc3eb7cb9-goog


* [PATCH v4 02/11] netfilter: Avoid assigning 'const' pointer to non-const pointer
From: Will Deacon @ 2020-04-21 15:15 UTC
  To: linux-kernel
  Cc: linux-arch, kernel-team, Will Deacon, Mark Rutland,
	Michael Ellerman, Peter Zijlstra, Linus Torvalds,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Masahiro Yamada,
	Nick Desaulniers, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Florian Westphal, David S. Miller

nf_remove_net_hook() uses WRITE_ONCE() to assign a 'const' pointer to a
'non-const' pointer. Cleanups to the implementation of WRITE_ONCE() mean
that this will give rise to a compiler warning, just like a plain old
assignment would do:

  | In file included from ./include/linux/export.h:43,
  |                  from ./include/linux/linkage.h:7,
  |                  from ./include/linux/kernel.h:8,
  |                  from net/netfilter/core.c:9:
  | net/netfilter/core.c: In function ‘nf_remove_net_hook’:
  | ./include/linux/compiler.h:216:30: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
  |   *(volatile typeof(x) *)&(x) = (val);  \
  |                               ^
  | net/netfilter/core.c:379:3: note: in expansion of macro ‘WRITE_ONCE’
  |    WRITE_ONCE(orig_ops[i], &dummy_ops);
  |    ^~~~~~~~~~

Follow the pattern used elsewhere in this file and add a cast to 'void *'
to squash the warning.
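
For illustration, a minimal sketch with a hypothetical 'struct ops' (the
warning is the same one a plain assignment produces):

  struct ops { int x; };

  static const struct ops dummy_ops;

  void remove_hook(struct ops **slot)
  {
  	*slot = &dummy_ops;		/* warns: discards 'const' qualifier */
  	*slot = (void *)&dummy_ops;	/* 'void *' converts without complaint */
  }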

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: "David S. Miller" <davem@davemloft.net>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 net/netfilter/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 78f046ec506f..3ac7c8c1548d 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -376,7 +376,7 @@ static bool nf_remove_net_hook(struct nf_hook_entries *old,
 		if (orig_ops[i] != unreg)
 			continue;
 		WRITE_ONCE(old->hooks[i].hook, accept_all);
-		WRITE_ONCE(orig_ops[i], &dummy_ops);
+		WRITE_ONCE(orig_ops[i], (void *)&dummy_ops);
 		return true;
 	}
 
-- 
2.26.1.301.g55bc3eb7cb9-goog


* [PATCH v4 03/11] net: tls: Avoid assigning 'const' pointer to non-const pointer
From: Will Deacon @ 2020-04-21 15:15 UTC
  To: linux-kernel
  Cc: linux-arch, kernel-team, Will Deacon, Mark Rutland,
	Michael Ellerman, Peter Zijlstra, Linus Torvalds,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Masahiro Yamada,
	Nick Desaulniers, Boris Pismenny, Aviad Yehezkel, John Fastabend,
	Daniel Borkmann

tls_build_proto() uses WRITE_ONCE() to assign a 'const' pointer to a
'non-const' pointer. Cleanups to the implementation of WRITE_ONCE() mean
that this will give rise to a compiler warning, just like a plain old
assignment would do:

  | net/tls/tls_main.c: In function ‘tls_build_proto’:
  | ./include/linux/compiler.h:229:30: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
  | net/tls/tls_main.c:640:4: note: in expansion of macro ‘smp_store_release’
  |   640 |    smp_store_release(&saved_tcpv6_prot, prot);
  |       |    ^~~~~~~~~~~~~~~~~

Drop the const qualifier from the local 'prot' variable, as it isn't
needed.

Cc: Boris Pismenny <borisp@mellanox.com>
Cc: Aviad Yehezkel <aviadye@mellanox.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Will Deacon <will@kernel.org>
---
 net/tls/tls_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 0e989005bdc2..ec10041c6b7d 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -629,7 +629,7 @@ struct tls_context *tls_ctx_create(struct sock *sk)
 static void tls_build_proto(struct sock *sk)
 {
 	int ip_ver = sk->sk_family == AF_INET6 ? TLSV6 : TLSV4;
-	const struct proto *prot = READ_ONCE(sk->sk_prot);
+	struct proto *prot = READ_ONCE(sk->sk_prot);
 
 	/* Build IPv6 TLS whenever the address of tcpv6 _prot changes */
 	if (ip_ver == TLSV6 &&
-- 
2.26.1.301.g55bc3eb7cb9-goog


* [PATCH v4 04/11] fault_inject: Don't rely on "return value" from WRITE_ONCE()
From: Will Deacon @ 2020-04-21 15:15 UTC
  To: linux-kernel
  Cc: linux-arch, kernel-team, Will Deacon, Mark Rutland,
	Michael Ellerman, Peter Zijlstra, Linus Torvalds,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Masahiro Yamada,
	Nick Desaulniers, Akinobu Mita

It's a bit weird that WRITE_ONCE() evaluates to the value it stores, and
it's inconsistent with smp_store_release(), which can't be used this way.

In preparation for preventing this in WRITE_ONCE(), change the fault
injection code to use a local variable instead.

Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 lib/fault-inject.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/fault-inject.c b/lib/fault-inject.c
index 8186ca84910b..ce12621b4275 100644
--- a/lib/fault-inject.c
+++ b/lib/fault-inject.c
@@ -106,7 +106,9 @@ bool should_fail(struct fault_attr *attr, ssize_t size)
 		unsigned int fail_nth = READ_ONCE(current->fail_nth);
 
 		if (fail_nth) {
-			if (!WRITE_ONCE(current->fail_nth, fail_nth - 1))
+			fail_nth--;
+			WRITE_ONCE(current->fail_nth, fail_nth);
+			if (!fail_nth)
 				goto fail;
 
 			return false;
-- 
2.26.1.301.g55bc3eb7cb9-goog


* [PATCH v4 05/11] arm64: csum: Disable KASAN for do_csum()
From: Will Deacon @ 2020-04-21 15:15 UTC
  To: linux-kernel
  Cc: linux-arch, kernel-team, Will Deacon, Mark Rutland,
	Michael Ellerman, Peter Zijlstra, Linus Torvalds,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Masahiro Yamada,
	Nick Desaulniers, Robin Murphy

do_csum() over-reads the source buffer and therefore abuses
READ_ONCE_NOCHECK() to avoid tripping up KASAN. In preparation for
READ_ONCE_NOCHECK() becoming a macro, and therefore losing its
'__no_sanitize_address' annotation, just annotate do_csum() explicitly
and fall back to normal loads.
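
For background, the over-read comes from word-at-a-time loads through a
pointer that has been aligned down, with the unwanted bytes masked off
afterwards; roughly (a simplified sketch, not the exact code):

  const u64 *ptr = (const u64 *)((unsigned long)buff & ~7UL);
  u64 data = *ptr;	/* may touch bytes outside [buff, buff + len) */

KASAN flags such loads even though the out-of-range bytes never affect
the result.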

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/lib/csum.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/lib/csum.c b/arch/arm64/lib/csum.c
index 60eccae2abad..78b87a64ca0a 100644
--- a/arch/arm64/lib/csum.c
+++ b/arch/arm64/lib/csum.c
@@ -14,7 +14,11 @@ static u64 accumulate(u64 sum, u64 data)
 	return tmp + (tmp >> 64);
 }
 
-unsigned int do_csum(const unsigned char *buff, int len)
+/*
+ * We over-read the buffer and this makes KASAN unhappy. Instead, disable
+ * instrumentation and call kasan explicitly.
+ */
+unsigned int __no_sanitize_address do_csum(const unsigned char *buff, int len)
 {
 	unsigned int offset, shift, sum;
 	const u64 *ptr;
@@ -42,7 +46,7 @@ unsigned int do_csum(const unsigned char *buff, int len)
 	 * odd/even alignment, and means we can ignore it until the very end.
 	 */
 	shift = offset * 8;
-	data = READ_ONCE_NOCHECK(*ptr++);
+	data = *ptr++;
 #ifdef __LITTLE_ENDIAN
 	data = (data >> shift) << shift;
 #else
@@ -58,10 +62,10 @@ unsigned int do_csum(const unsigned char *buff, int len)
 	while (unlikely(len > 64)) {
 		__uint128_t tmp1, tmp2, tmp3, tmp4;
 
-		tmp1 = READ_ONCE_NOCHECK(*(__uint128_t *)ptr);
-		tmp2 = READ_ONCE_NOCHECK(*(__uint128_t *)(ptr + 2));
-		tmp3 = READ_ONCE_NOCHECK(*(__uint128_t *)(ptr + 4));
-		tmp4 = READ_ONCE_NOCHECK(*(__uint128_t *)(ptr + 6));
+		tmp1 = *(__uint128_t *)ptr;
+		tmp2 = *(__uint128_t *)(ptr + 2);
+		tmp3 = *(__uint128_t *)(ptr + 4);
+		tmp4 = *(__uint128_t *)(ptr + 6);
 
 		len -= 64;
 		ptr += 8;
@@ -85,7 +89,7 @@ unsigned int do_csum(const unsigned char *buff, int len)
 		__uint128_t tmp;
 
 		sum64 = accumulate(sum64, data);
-		tmp = READ_ONCE_NOCHECK(*(__uint128_t *)ptr);
+		tmp = *(__uint128_t *)ptr;
 
 		len -= 16;
 		ptr += 2;
@@ -100,7 +104,7 @@ unsigned int do_csum(const unsigned char *buff, int len)
 	}
 	if (len > 0) {
 		sum64 = accumulate(sum64, data);
-		data = READ_ONCE_NOCHECK(*ptr);
+		data = *ptr;
 		len -= 8;
 	}
 	/*
-- 
2.26.1.301.g55bc3eb7cb9-goog


* [PATCH v4 06/11] READ_ONCE: Simplify implementations of {READ,WRITE}_ONCE()
From: Will Deacon @ 2020-04-21 15:15 UTC
  To: linux-kernel
  Cc: linux-arch, kernel-team, Will Deacon, Mark Rutland,
	Michael Ellerman, Peter Zijlstra, Linus Torvalds,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Masahiro Yamada,
	Nick Desaulniers

The implementations of {READ,WRITE}_ONCE() suffer from a significant
amount of indirection and complexity due to a historic GCC bug:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145

which was originally worked around by 230fa253df63 ("kernel: Provide
READ_ONCE and ASSIGN_ONCE").

Since GCC 4.8 is fairly vintage at this point and is now the minimum
version required to build the kernel, return {READ,WRITE}_ONCE() to their former
glory with an implementation that is easier to understand and, crucially,
more amenable to optimisation. A side effect of this simplification is
that WRITE_ONCE() no longer returns a value, but nobody seems to be
relying on that and the new behaviour is aligned with smp_store_release().
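
In particular, WRITE_ONCE() now expands to a 'do { ... } while (0)'
statement, so code consuming its value no longer compiles (sketch):

  int x, r;

  WRITE_ONCE(x, 1);	/* fine */
  r = WRITE_ONCE(x, 1);	/* build error: a statement has no value */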

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 include/linux/compiler.h | 118 +++++++++++++--------------------------
 1 file changed, 39 insertions(+), 79 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 034b0a644efc..338111a448d0 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -177,60 +177,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
 # define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __LINE__)
 #endif
 
-#include <uapi/linux/types.h>
-
-#define __READ_ONCE_SIZE						\
-({									\
-	switch (size) {							\
-	case 1: *(__u8 *)res = *(volatile __u8 *)p; break;		\
-	case 2: *(__u16 *)res = *(volatile __u16 *)p; break;		\
-	case 4: *(__u32 *)res = *(volatile __u32 *)p; break;		\
-	case 8: *(__u64 *)res = *(volatile __u64 *)p; break;		\
-	default:							\
-		barrier();						\
-		__builtin_memcpy((void *)res, (const void *)p, size);	\
-		barrier();						\
-	}								\
-})
-
-static __always_inline
-void __read_once_size(const volatile void *p, void *res, int size)
-{
-	__READ_ONCE_SIZE;
-}
-
-#ifdef CONFIG_KASAN
-/*
- * We can't declare function 'inline' because __no_sanitize_address confilcts
- * with inlining. Attempt to inline it may cause a build failure.
- * 	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67368
- * '__maybe_unused' allows us to avoid defined-but-not-used warnings.
- */
-# define __no_kasan_or_inline __no_sanitize_address notrace __maybe_unused
-#else
-# define __no_kasan_or_inline __always_inline
-#endif
-
-static __no_kasan_or_inline
-void __read_once_size_nocheck(const volatile void *p, void *res, int size)
-{
-	__READ_ONCE_SIZE;
-}
-
-static __always_inline void __write_once_size(volatile void *p, void *res, int size)
-{
-	switch (size) {
-	case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
-	case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
-	case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
-	case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
-	default:
-		barrier();
-		__builtin_memcpy((void *)p, (const void *)res, size);
-		barrier();
-	}
-}
-
 /*
  * Prevent the compiler from merging or refetching reads or writes. The
  * compiler is also forbidden from reordering successive instances of
@@ -240,11 +186,7 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
  * statements.
  *
  * These two macros will also work on aggregate data types like structs or
- * unions. If the size of the accessed data type exceeds the word size of
- * the machine (e.g., 32 bits or 64 bits) READ_ONCE() and WRITE_ONCE() will
- * fall back to memcpy(). There's at least two memcpy()s: one for the
- * __builtin_memcpy() and then one for the macro doing the copy of variable
- * - '__u' allocated on the stack.
+ * unions.
  *
  * Their two major use cases are: (1) Mediating communication between
  * process-level code and irq/NMI handlers, all running on the same CPU,
@@ -256,23 +198,49 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
 #include <asm/barrier.h>
 #include <linux/kasan-checks.h>
 
-#define __READ_ONCE(x, check)						\
+#define __READ_ONCE(x)	(*(volatile typeof(x) *)&(x))
+
+#define READ_ONCE(x)							\
 ({									\
-	union { typeof(x) __val; char __c[1]; } __u;			\
-	if (check)							\
-		__read_once_size(&(x), __u.__c, sizeof(x));		\
-	else								\
-		__read_once_size_nocheck(&(x), __u.__c, sizeof(x));	\
-	smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
-	__u.__val;							\
+	typeof(x) __x = __READ_ONCE(x);					\
+	smp_read_barrier_depends();					\
+	__x;								\
 })
-#define READ_ONCE(x) __READ_ONCE(x, 1)
+
+#define WRITE_ONCE(x, val)				\
+do {							\
+	*(volatile typeof(x) *)&(x) = (val);		\
+} while (0)
+
+#ifdef CONFIG_KASAN
+/*
+ * We can't declare function 'inline' because __no_sanitize_address conflicts
+ * with inlining. Attempt to inline it may cause a build failure.
+ *     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67368
+ * '__maybe_unused' allows us to avoid defined-but-not-used warnings.
+ */
+# define __no_kasan_or_inline __no_sanitize_address notrace __maybe_unused
+#else
+# define __no_kasan_or_inline __always_inline
+#endif
+
+static __no_kasan_or_inline
+unsigned long __read_once_word_nocheck(const void *addr)
+{
+	return __READ_ONCE(*(unsigned long *)addr);
+}
 
 /*
- * Use READ_ONCE_NOCHECK() instead of READ_ONCE() if you need
- * to hide memory access from KASAN.
+ * Use READ_ONCE_NOCHECK() instead of READ_ONCE() if you need to load a
+ * word from memory atomically but without telling KASAN. This is usually
+ * used by unwinding code when walking the stack of a running process.
  */
-#define READ_ONCE_NOCHECK(x) __READ_ONCE(x, 0)
+#define READ_ONCE_NOCHECK(x)						\
+({									\
+	unsigned long __x = __read_once_word_nocheck(&(x));		\
+	smp_read_barrier_depends();					\
+	__x;								\
+})
 
 static __no_kasan_or_inline
 unsigned long read_word_at_a_time(const void *addr)
@@ -281,14 +249,6 @@ unsigned long read_word_at_a_time(const void *addr)
 	return *(unsigned long *)addr;
 }
 
-#define WRITE_ONCE(x, val) \
-({							\
-	union { typeof(x) __val; char __c[1]; } __u =	\
-		{ .__val = (__force typeof(x)) (val) }; \
-	__write_once_size(&(x), __u.__c, sizeof(x));	\
-	__u.__val;					\
-})
-
 #endif /* __KERNEL__ */
 
 /*
-- 
2.26.1.301.g55bc3eb7cb9-goog


* [PATCH v4 07/11] READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses
From: Will Deacon @ 2020-04-21 15:15 UTC
  To: linux-kernel
  Cc: linux-arch, kernel-team, Will Deacon, Mark Rutland,
	Michael Ellerman, Peter Zijlstra, Linus Torvalds,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Masahiro Yamada,
	Nick Desaulniers

{READ,WRITE}_ONCE() cannot guarantee atomicity for arbitrary data sizes.
This can be surprising to callers that might incorrectly be expecting
atomicity for accesses to aggregate structures, although there are other
callers where tearing is actually permissible (e.g. if they are using
something akin to sequence locking to protect the access).

Linus sayeth:

  | We could also look at being stricter for the normal READ/WRITE_ONCE(),
  | and require that they are
  |
  | (a) regular integer types
  |
  | (b) fit in an atomic word
  |
  | We actually did (b) for a while, until we noticed that we do it on
  | loff_t's etc and relaxed the rules. But maybe we could have a
  | "non-atomic" version of READ/WRITE_ONCE() that is used for the
  | questionable cases?

The slight snag is that we also have to support 64-bit accesses on 32-bit
architectures, as these appear to be widespread and tend to work out ok
if either the architecture supports atomic 64-bit accesses (x86, armv7)
or if the variable being accessed represents a virtual address and
therefore only requires 32-bit atomicity in practice.

Take a step in that direction by introducing a variant of
'compiletime_assert_atomic_type()' and use it to check the pointer
argument to {READ,WRITE}_ONCE(). Expose __{READ,WRITE}_ONCE() variants
which are allowed to tear and convert the one broken caller over to the
new macros.
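
The effect on callers, sketched with a hypothetical structure (not taken
from this series):

  struct foo { u64 a, b; } *x;

  READ_ONCE(x->a);	/* ok: native word, or a 64-bit quantity */
  READ_ONCE(*x);	/* now trips compiletime_assert_rwonce_type() */
  __READ_ONCE(*x);	/* explicit opt-in; the access may tear */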

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Will Deacon <will@kernel.org>
---
 drivers/xen/time.c       |  2 +-
 include/linux/compiler.h | 33 ++++++++++++++++++++++++++++++---
 2 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/time.c b/drivers/xen/time.c
index 0968859c29d0..108edbcbc040 100644
--- a/drivers/xen/time.c
+++ b/drivers/xen/time.c
@@ -64,7 +64,7 @@ static void xen_get_runstate_snapshot_cpu_delta(
 	do {
 		state_time = get64(&state->state_entry_time);
 		rmb();	/* Hypervisor might update data. */
-		*res = READ_ONCE(*state);
+		*res = __READ_ONCE(*state);
 		rmb();	/* Hypervisor might update data. */
 	} while (get64(&state->state_entry_time) != state_time ||
 		 (state_time & XEN_RUNSTATE_UPDATE));
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 338111a448d0..50bb2461648f 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -198,20 +198,37 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
 #include <asm/barrier.h>
 #include <linux/kasan-checks.h>
 
-#define __READ_ONCE(x)	(*(volatile typeof(x) *)&(x))
+/*
+ * Use __READ_ONCE() instead of READ_ONCE() if you do not require any
+ * atomicity or dependency ordering guarantees. Note that this may result
+ * in tears!
+ */
+#define __READ_ONCE(x)	(*(const volatile typeof(x) *)&(x))
 
-#define READ_ONCE(x)							\
+#define __READ_ONCE_SCALAR(x)						\
 ({									\
 	typeof(x) __x = __READ_ONCE(x);					\
 	smp_read_barrier_depends();					\
 	__x;								\
 })
 
-#define WRITE_ONCE(x, val)				\
+#define READ_ONCE(x)							\
+({									\
+	compiletime_assert_rwonce_type(x);				\
+	__READ_ONCE_SCALAR(x);						\
+})
+
+#define __WRITE_ONCE(x, val)				\
 do {							\
 	*(volatile typeof(x) *)&(x) = (val);		\
 } while (0)
 
+#define WRITE_ONCE(x, val)				\
+do {							\
+	compiletime_assert_rwonce_type(x);		\
+	__WRITE_ONCE(x, val);				\
+} while (0)
+
 #ifdef CONFIG_KASAN
 /*
  * We can't declare function 'inline' because __no_sanitize_address conflicts
@@ -313,6 +330,16 @@ static inline void *offset_to_ptr(const int *off)
 	compiletime_assert(__native_word(t),				\
 		"Need native word sized stores/loads for atomicity.")
 
+/*
+ * Yes, this permits 64-bit accesses on 32-bit architectures. These will
+ * actually be atomic in many cases (namely x86), but for others we rely on
+ * the access being split into 2x32-bit accesses for a 32-bit quantity (e.g.
+ * a virtual address) and a strong prevailing wind.
+ */
+#define compiletime_assert_rwonce_type(t)					\
+	compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),	\
+		"Unsupported access size for {READ,WRITE}_ONCE().")
+
 /* &a[0] degrades to a pointer: a different type from an array */
 #define __must_be_array(a)	BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0]))
 
-- 
2.26.1.301.g55bc3eb7cb9-goog


* [PATCH v4 08/11] READ_ONCE: Drop pointer qualifiers when reading from scalar types
From: Will Deacon @ 2020-04-21 15:15 UTC
  To: linux-kernel
  Cc: linux-arch, kernel-team, Will Deacon, Mark Rutland,
	Michael Ellerman, Peter Zijlstra, Linus Torvalds,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Masahiro Yamada,
	Nick Desaulniers

Passing a volatile-qualified pointer to READ_ONCE() is an absolute
trainwreck for code generation: the use of 'typeof()' to define a
temporary variable inside the macro means that the final evaluation in
macro scope ends up forcing a read back from the stack. When stack
protector is enabled (the default for arm64, at least), this causes
the compiler to vomit up all sorts of junk.

Unfortunately, dropping pointer qualifiers inside the macro poses quite
a challenge, especially since the pointed-to type is permitted to be an
aggregate, and this is relied upon by mm/ code accessing things like
'pmd_t'. Based on numerous hacks and discussions on the mailing list,
this is the best I've managed to come up with.

Introduce '__unqual_scalar_typeof()' which takes an expression and, if
the expression is an optionally qualified 8, 16, 32 or 64-bit scalar
type, evaluates to the unqualified type. Other input types, including
aggregates, remain unchanged. Hopefully READ_ONCE() on volatile aggregate
pointers isn't something we do on a fast-path.
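
A sketch of the intended behaviour:

  volatile int *p;
  typeof(*p) a;				/* 'volatile int' */
  __unqual_scalar_typeof(*p) b;		/* plain 'int' */

  volatile pmd_t *q;
  __unqual_scalar_typeof(*q) c;		/* aggregate: stays 'volatile pmd_t' */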

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Will Deacon <will@kernel.org>
---
 include/linux/compiler.h       |  6 +++---
 include/linux/compiler_types.h | 21 +++++++++++++++++++++
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 50bb2461648f..c363d8debc43 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -203,13 +203,13 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
  * atomicity or dependency ordering guarantees. Note that this may result
  * in tears!
  */
-#define __READ_ONCE(x)	(*(const volatile typeof(x) *)&(x))
+#define __READ_ONCE(x)	(*(const volatile __unqual_scalar_typeof(x) *)&(x))
 
 #define __READ_ONCE_SCALAR(x)						\
 ({									\
-	typeof(x) __x = __READ_ONCE(x);					\
+	__unqual_scalar_typeof(x) __x = __READ_ONCE(x);			\
 	smp_read_barrier_depends();					\
-	__x;								\
+	(typeof(x))__x;							\
 })
 
 #define READ_ONCE(x)							\
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index e970f97a7fcb..233066c92f6f 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -210,6 +210,27 @@ struct ftrace_likely_data {
 /* Are two types/vars the same type (ignoring qualifiers)? */
 #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
 
+/*
+ * __unqual_scalar_typeof(x) - Declare an unqualified scalar type, leaving
+ *			       non-scalar types unchanged.
+ *
+ * We build this out of a couple of helper macros in a vain attempt to
+ * help you keep your lunch down while reading it.
+ */
+#define __pick_scalar_type(x, type, otherwise)					\
+	__builtin_choose_expr(__same_type(x, type), (type)0, otherwise)
+
+#define __pick_integer_type(x, type, otherwise)					\
+	__pick_scalar_type(x, unsigned type,					\
+		__pick_scalar_type(x, signed type, otherwise))
+
+#define __unqual_scalar_typeof(x) typeof(					\
+	__pick_integer_type(x, char,						\
+		__pick_integer_type(x, short,					\
+			__pick_integer_type(x, int,				\
+				__pick_integer_type(x, long,			\
+					__pick_integer_type(x, long long, x))))))
+
 /* Is this type a native word size -- useful for atomic operations */
 #define __native_word(t) \
 	(sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
-- 
2.26.1.301.g55bc3eb7cb9-goog


* [PATCH v4 09/11] locking/barriers: Use '__unqual_scalar_typeof' for load-acquire macros
From: Will Deacon @ 2020-04-21 15:15 UTC
  To: linux-kernel
  Cc: linux-arch, kernel-team, Will Deacon, Mark Rutland,
	Michael Ellerman, Peter Zijlstra, Linus Torvalds,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Masahiro Yamada,
	Nick Desaulniers

Passing volatile-qualified pointers to the asm-generic implementations of
the load-acquire macros results in a re-load from the stack due to the
temporary result variable inheriting the volatile semantics thanks to the
use of 'typeof()'.

Define these temporary variables using '__unqual_scalar_typeof' to drop
the volatile qualifier in the case that they are scalar types.
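
For example, with a 'volatile u32 *' argument (a sketch):

  volatile u32 *flag;

  /* The temporary used to inherit 'volatile' from *flag, forcing a
   * re-load from the stack; it is now a plain 'u32'. */
  u32 v = smp_load_acquire(flag);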

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Will Deacon <will@kernel.org>
---
 include/asm-generic/barrier.h | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 85b28eb80b11..2eacaf7d62f6 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -128,10 +128,10 @@ do {									\
 #ifndef __smp_load_acquire
 #define __smp_load_acquire(p)						\
 ({									\
-	typeof(*p) ___p1 = READ_ONCE(*p);				\
+	__unqual_scalar_typeof(*p) ___p1 = READ_ONCE(*p);		\
 	compiletime_assert_atomic_type(*p);				\
 	__smp_mb();							\
-	___p1;								\
+	(typeof(*p))___p1;						\
 })
 #endif
 
@@ -183,10 +183,10 @@ do {									\
 #ifndef smp_load_acquire
 #define smp_load_acquire(p)						\
 ({									\
-	typeof(*p) ___p1 = READ_ONCE(*p);				\
+	__unqual_scalar_typeof(*p) ___p1 = READ_ONCE(*p);		\
 	compiletime_assert_atomic_type(*p);				\
 	barrier();							\
-	___p1;								\
+	(typeof(*p))___p1;						\
 })
 #endif
 
@@ -229,14 +229,14 @@ do {									\
 #ifndef smp_cond_load_relaxed
 #define smp_cond_load_relaxed(ptr, cond_expr) ({		\
 	typeof(ptr) __PTR = (ptr);				\
-	typeof(*ptr) VAL;					\
+	__unqual_scalar_typeof(*ptr) VAL;			\
 	for (;;) {						\
 		VAL = READ_ONCE(*__PTR);			\
 		if (cond_expr)					\
 			break;					\
 		cpu_relax();					\
 	}							\
-	VAL;							\
+	(typeof(*ptr))VAL;					\
 })
 #endif
 
@@ -250,10 +250,10 @@ do {									\
  */
 #ifndef smp_cond_load_acquire
 #define smp_cond_load_acquire(ptr, cond_expr) ({		\
-	typeof(*ptr) _val;					\
+	__unqual_scalar_typeof(*ptr) _val;			\
 	_val = smp_cond_load_relaxed(ptr, cond_expr);		\
 	smp_acquire__after_ctrl_dep();				\
-	_val;							\
+	(typeof(*ptr))_val;					\
 })
 #endif
 
-- 
2.26.1.301.g55bc3eb7cb9-goog


* [PATCH v4 10/11] arm64: barrier: Use '__unqual_scalar_typeof' for acquire/release macros
From: Will Deacon @ 2020-04-21 15:15 UTC
  To: linux-kernel
  Cc: linux-arch, kernel-team, Will Deacon, Mark Rutland,
	Michael Ellerman, Peter Zijlstra, Linus Torvalds,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Masahiro Yamada,
	Nick Desaulniers

Passing volatile-qualified pointers to the arm64 implementations of the
load-acquire/store-release macros results in a re-load from the stack
and a bunch of associated stack-protector churn due to the temporary
result variable inheriting the volatile semantics thanks to the use of
'typeof()'.

Define these temporary variables using '__unqual_scalar_typeof' to drop
the volatile qualifier in the case that they are scalar types.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/barrier.h | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 7d9cc5ec4971..fb4c27506ef4 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -76,8 +76,8 @@ static inline unsigned long array_index_mask_nospec(unsigned long idx,
 #define __smp_store_release(p, v)					\
 do {									\
 	typeof(p) __p = (p);						\
-	union { typeof(*p) __val; char __c[1]; } __u =			\
-		{ .__val = (__force typeof(*p)) (v) };			\
+	union { __unqual_scalar_typeof(*p) __val; char __c[1]; } __u =	\
+		{ .__val = (__force __unqual_scalar_typeof(*p)) (v) };	\
 	compiletime_assert_atomic_type(*p);				\
 	kasan_check_write(__p, sizeof(*p));				\
 	switch (sizeof(*p)) {						\
@@ -110,7 +110,7 @@ do {									\
 
 #define __smp_load_acquire(p)						\
 ({									\
-	union { typeof(*p) __val; char __c[1]; } __u;			\
+	union { __unqual_scalar_typeof(*p) __val; char __c[1]; } __u;	\
 	typeof(p) __p = (p);						\
 	compiletime_assert_atomic_type(*p);				\
 	kasan_check_read(__p, sizeof(*p));				\
@@ -136,33 +136,33 @@ do {									\
 			: "Q" (*__p) : "memory");			\
 		break;							\
 	}								\
-	__u.__val;							\
+	(typeof(*p))__u.__val;						\
 })
 
 #define smp_cond_load_relaxed(ptr, cond_expr)				\
 ({									\
 	typeof(ptr) __PTR = (ptr);					\
-	typeof(*ptr) VAL;						\
+	__unqual_scalar_typeof(*ptr) VAL;				\
 	for (;;) {							\
 		VAL = READ_ONCE(*__PTR);				\
 		if (cond_expr)						\
 			break;						\
 		__cmpwait_relaxed(__PTR, VAL);				\
 	}								\
-	VAL;								\
+	(typeof(*ptr))VAL;						\
 })
 
 #define smp_cond_load_acquire(ptr, cond_expr)				\
 ({									\
 	typeof(ptr) __PTR = (ptr);					\
-	typeof(*ptr) VAL;						\
+	__unqual_scalar_typeof(*ptr) VAL;				\
 	for (;;) {							\
 		VAL = smp_load_acquire(__PTR);				\
 		if (cond_expr)						\
 			break;						\
 		__cmpwait_relaxed(__PTR, VAL);				\
 	}								\
-	VAL;								\
+	(typeof(*ptr))VAL;						\
 })
 
 #include <asm-generic/barrier.h>
-- 
2.26.1.301.g55bc3eb7cb9-goog


* [PATCH v4 11/11] gcov: Remove old GCC 3.4 support
From: Will Deacon @ 2020-04-21 15:15 UTC
  To: linux-kernel
  Cc: linux-arch, kernel-team, Will Deacon, Mark Rutland,
	Michael Ellerman, Peter Zijlstra, Linus Torvalds,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Masahiro Yamada,
	Nick Desaulniers

The kernel requires at least GCC 4.8 in order to build, and so there is
no need to cater for the pre-4.7 gcov format.

Remove the obsolete code.

Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 kernel/gcov/Kconfig   |  24 --
 kernel/gcov/Makefile  |   3 +-
 kernel/gcov/gcc_3_4.c | 573 ------------------------------------------
 3 files changed, 1 insertion(+), 599 deletions(-)
 delete mode 100644 kernel/gcov/gcc_3_4.c

diff --git a/kernel/gcov/Kconfig b/kernel/gcov/Kconfig
index 3941a9c48f83..feaad597b3f4 100644
--- a/kernel/gcov/Kconfig
+++ b/kernel/gcov/Kconfig
@@ -51,28 +51,4 @@ config GCOV_PROFILE_ALL
 	larger and run slower. Also be sure to exclude files from profiling
 	which are not linked to the kernel image to prevent linker errors.
 
-choice
-	prompt "Specify GCOV format"
-	depends on GCOV_KERNEL
-	depends on CC_IS_GCC
-	---help---
-	The gcov format is usually determined by the GCC version, and the
-	default is chosen according to your GCC version. However, there are
-	exceptions where format changes are integrated in lower-version GCCs.
-	In such a case, change this option to adjust the format used in the
-	kernel accordingly.
-
-config GCOV_FORMAT_3_4
-	bool "GCC 3.4 format"
-	depends on GCC_VERSION < 40700
-	---help---
-	Select this option to use the format defined by GCC 3.4.
-
-config GCOV_FORMAT_4_7
-	bool "GCC 4.7 format"
-	---help---
-	Select this option to use the format defined by GCC 4.7.
-
-endchoice
-
 endmenu
diff --git a/kernel/gcov/Makefile b/kernel/gcov/Makefile
index d66a74b0f100..16f8ecc7d882 100644
--- a/kernel/gcov/Makefile
+++ b/kernel/gcov/Makefile
@@ -2,6 +2,5 @@
 ccflags-y := -DSRCTREE='"$(srctree)"' -DOBJTREE='"$(objtree)"'
 
 obj-y := base.o fs.o
-obj-$(CONFIG_GCOV_FORMAT_3_4) += gcc_base.o gcc_3_4.o
-obj-$(CONFIG_GCOV_FORMAT_4_7) += gcc_base.o gcc_4_7.o
+obj-$(CONFIG_CC_IS_GCC) += gcc_base.o gcc_4_7.o
 obj-$(CONFIG_CC_IS_CLANG) += clang.o
diff --git a/kernel/gcov/gcc_3_4.c b/kernel/gcov/gcc_3_4.c
deleted file mode 100644
index acb83558e5df..000000000000
--- a/kernel/gcov/gcc_3_4.c
+++ /dev/null
@@ -1,573 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- *  This code provides functions to handle gcc's profiling data format
- *  introduced with gcc 3.4. Future versions of gcc may change the gcov
- *  format (as happened before), so all format-specific information needs
- *  to be kept modular and easily exchangeable.
- *
- *  This file is based on gcc-internal definitions. Functions and data
- *  structures are defined to be compatible with gcc counterparts.
- *  For a better understanding, refer to gcc source: gcc/gcov-io.h.
- *
- *    Copyright IBM Corp. 2009
- *    Author(s): Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
- *
- *    Uses gcc-internal data definitions.
- */
-
-#include <linux/errno.h>
-#include <linux/slab.h>
-#include <linux/string.h>
-#include <linux/seq_file.h>
-#include <linux/vmalloc.h>
-#include "gcov.h"
-
-#define GCOV_COUNTERS		5
-
-static struct gcov_info *gcov_info_head;
-
-/**
- * struct gcov_fn_info - profiling meta data per function
- * @ident: object file-unique function identifier
- * @checksum: function checksum
- * @n_ctrs: number of values per counter type belonging to this function
- *
- * This data is generated by gcc during compilation and doesn't change
- * at run-time.
- */
-struct gcov_fn_info {
-	unsigned int ident;
-	unsigned int checksum;
-	unsigned int n_ctrs[];
-};
-
-/**
- * struct gcov_ctr_info - profiling data per counter type
- * @num: number of counter values for this type
- * @values: array of counter values for this type
- * @merge: merge function for counter values of this type (unused)
- *
- * This data is generated by gcc during compilation and doesn't change
- * at run-time with the exception of the values array.
- */
-struct gcov_ctr_info {
-	unsigned int	num;
-	gcov_type	*values;
-	void		(*merge)(gcov_type *, unsigned int);
-};
-
-/**
- * struct gcov_info - profiling data per object file
- * @version: gcov version magic indicating the gcc version used for compilation
- * @next: list head for a singly-linked list
- * @stamp: time stamp
- * @filename: name of the associated gcov data file
- * @n_functions: number of instrumented functions
- * @functions: function data
- * @ctr_mask: mask specifying which counter types are active
- * @counts: counter data per counter type
- *
- * This data is generated by gcc during compilation and doesn't change
- * at run-time with the exception of the next pointer.
- */
-struct gcov_info {
-	unsigned int			version;
-	struct gcov_info		*next;
-	unsigned int			stamp;
-	const char			*filename;
-	unsigned int			n_functions;
-	const struct gcov_fn_info	*functions;
-	unsigned int			ctr_mask;
-	struct gcov_ctr_info		counts[];
-};
-
-/**
- * gcov_info_filename - return info filename
- * @info: profiling data set
- */
-const char *gcov_info_filename(struct gcov_info *info)
-{
-	return info->filename;
-}
-
-/**
- * gcov_info_version - return info version
- * @info: profiling data set
- */
-unsigned int gcov_info_version(struct gcov_info *info)
-{
-	return info->version;
-}
-
-/**
- * gcov_info_next - return next profiling data set
- * @info: profiling data set
- *
- * Returns next gcov_info following @info or first gcov_info in the chain if
- * @info is %NULL.
- */
-struct gcov_info *gcov_info_next(struct gcov_info *info)
-{
-	if (!info)
-		return gcov_info_head;
-
-	return info->next;
-}
-
-/**
- * gcov_info_link - link/add profiling data set to the list
- * @info: profiling data set
- */
-void gcov_info_link(struct gcov_info *info)
-{
-	info->next = gcov_info_head;
-	gcov_info_head = info;
-}
-
-/**
- * gcov_info_unlink - unlink/remove profiling data set from the list
- * @prev: previous profiling data set
- * @info: profiling data set
- */
-void gcov_info_unlink(struct gcov_info *prev, struct gcov_info *info)
-{
-	if (prev)
-		prev->next = info->next;
-	else
-		gcov_info_head = info->next;
-}
-
-/**
- * gcov_info_within_module - check if a profiling data set belongs to a module
- * @info: profiling data set
- * @mod: module
- *
- * Returns true if profiling data belongs module, false otherwise.
- */
-bool gcov_info_within_module(struct gcov_info *info, struct module *mod)
-{
-	return within_module((unsigned long)info, mod);
-}
-
-/* Symbolic links to be created for each profiling data file. */
-const struct gcov_link gcov_link[] = {
-	{ OBJ_TREE, "gcno" },	/* Link to .gcno file in $(objtree). */
-	{ 0, NULL},
-};
-
-/*
- * Determine whether a counter is active. Based on gcc magic. Doesn't change
- * at run-time.
- */
-static int counter_active(struct gcov_info *info, unsigned int type)
-{
-	return (1 << type) & info->ctr_mask;
-}
-
-/* Determine number of active counters. Based on gcc magic. */
-static unsigned int num_counter_active(struct gcov_info *info)
-{
-	unsigned int i;
-	unsigned int result = 0;
-
-	for (i = 0; i < GCOV_COUNTERS; i++) {
-		if (counter_active(info, i))
-			result++;
-	}
-	return result;
-}
-
-/**
- * gcov_info_reset - reset profiling data to zero
- * @info: profiling data set
- */
-void gcov_info_reset(struct gcov_info *info)
-{
-	unsigned int active = num_counter_active(info);
-	unsigned int i;
-
-	for (i = 0; i < active; i++) {
-		memset(info->counts[i].values, 0,
-		       info->counts[i].num * sizeof(gcov_type));
-	}
-}
-
-/**
- * gcov_info_is_compatible - check if profiling data can be added
- * @info1: first profiling data set
- * @info2: second profiling data set
- *
- * Returns non-zero if profiling data can be added, zero otherwise.
- */
-int gcov_info_is_compatible(struct gcov_info *info1, struct gcov_info *info2)
-{
-	return (info1->stamp == info2->stamp);
-}
-
-/**
- * gcov_info_add - add up profiling data
- * @dest: profiling data set to which data is added
- * @source: profiling data set which is added
- *
- * Adds profiling counts of @source to @dest.
- */
-void gcov_info_add(struct gcov_info *dest, struct gcov_info *source)
-{
-	unsigned int i;
-	unsigned int j;
-
-	for (i = 0; i < num_counter_active(dest); i++) {
-		for (j = 0; j < dest->counts[i].num; j++) {
-			dest->counts[i].values[j] +=
-				source->counts[i].values[j];
-		}
-	}
-}
-
-/* Get size of function info entry. Based on gcc magic. */
-static size_t get_fn_size(struct gcov_info *info)
-{
-	size_t size;
-
-	size = sizeof(struct gcov_fn_info) + num_counter_active(info) *
-	       sizeof(unsigned int);
-	if (__alignof__(struct gcov_fn_info) > sizeof(unsigned int))
-		size = ALIGN(size, __alignof__(struct gcov_fn_info));
-	return size;
-}
-
-/* Get address of function info entry. Based on gcc magic. */
-static struct gcov_fn_info *get_fn_info(struct gcov_info *info, unsigned int fn)
-{
-	return (struct gcov_fn_info *)
-		((char *) info->functions + fn * get_fn_size(info));
-}
-
-/**
- * gcov_info_dup - duplicate profiling data set
- * @info: profiling data set to duplicate
- *
- * Return newly allocated duplicate on success, %NULL on error.
- */
-struct gcov_info *gcov_info_dup(struct gcov_info *info)
-{
-	struct gcov_info *dup;
-	unsigned int i;
-	unsigned int active;
-
-	/* Duplicate gcov_info. */
-	active = num_counter_active(info);
-	dup = kzalloc(struct_size(dup, counts, active), GFP_KERNEL);
-	if (!dup)
-		return NULL;
-	dup->version		= info->version;
-	dup->stamp		= info->stamp;
-	dup->n_functions	= info->n_functions;
-	dup->ctr_mask		= info->ctr_mask;
-	/* Duplicate filename. */
-	dup->filename		= kstrdup(info->filename, GFP_KERNEL);
-	if (!dup->filename)
-		goto err_free;
-	/* Duplicate table of functions. */
-	dup->functions = kmemdup(info->functions, info->n_functions *
-				 get_fn_size(info), GFP_KERNEL);
-	if (!dup->functions)
-		goto err_free;
-	/* Duplicate counter arrays. */
-	for (i = 0; i < active ; i++) {
-		struct gcov_ctr_info *ctr = &info->counts[i];
-		size_t size = ctr->num * sizeof(gcov_type);
-
-		dup->counts[i].num = ctr->num;
-		dup->counts[i].merge = ctr->merge;
-		dup->counts[i].values = vmalloc(size);
-		if (!dup->counts[i].values)
-			goto err_free;
-		memcpy(dup->counts[i].values, ctr->values, size);
-	}
-	return dup;
-
-err_free:
-	gcov_info_free(dup);
-	return NULL;
-}
-
-/**
- * gcov_info_free - release memory for profiling data set duplicate
- * @info: profiling data set duplicate to free
- */
-void gcov_info_free(struct gcov_info *info)
-{
-	unsigned int active = num_counter_active(info);
-	unsigned int i;
-
-	for (i = 0; i < active ; i++)
-		vfree(info->counts[i].values);
-	kfree(info->functions);
-	kfree(info->filename);
-	kfree(info);
-}
-
-/**
- * struct type_info - iterator helper array
- * @ctr_type: counter type
- * @offset: index of the first value of the current function for this type
- *
- * This array is needed to convert the in-memory data format into the in-file
- * data format:
- *
- * In-memory:
- *   for each counter type
- *     for each function
- *       values
- *
- * In-file:
- *   for each function
- *     for each counter type
- *       values
- *
- * See gcc source gcc/gcov-io.h for more information on data organization.
- */
-struct type_info {
-	int ctr_type;
-	unsigned int offset;
-};
-
-/**
- * struct gcov_iterator - specifies current file position in logical records
- * @info: associated profiling data
- * @record: record type
- * @function: function number
- * @type: counter type
- * @count: index into values array
- * @num_types: number of counter types
- * @type_info: helper array to get values-array offset for current function
- */
-struct gcov_iterator {
-	struct gcov_info *info;
-
-	int record;
-	unsigned int function;
-	unsigned int type;
-	unsigned int count;
-
-	int num_types;
-	struct type_info type_info[];
-};
-
-static struct gcov_fn_info *get_func(struct gcov_iterator *iter)
-{
-	return get_fn_info(iter->info, iter->function);
-}
-
-static struct type_info *get_type(struct gcov_iterator *iter)
-{
-	return &iter->type_info[iter->type];
-}
-
-/**
- * gcov_iter_new - allocate and initialize profiling data iterator
- * @info: profiling data set to be iterated
- *
- * Return file iterator on success, %NULL otherwise.
- */
-struct gcov_iterator *gcov_iter_new(struct gcov_info *info)
-{
-	struct gcov_iterator *iter;
-
-	iter = kzalloc(struct_size(iter, type_info, num_counter_active(info)),
-		       GFP_KERNEL);
-	if (iter)
-		iter->info = info;
-
-	return iter;
-}
-
-/**
- * gcov_iter_free - release memory for iterator
- * @iter: file iterator to free
- */
-void gcov_iter_free(struct gcov_iterator *iter)
-{
-	kfree(iter);
-}
-
-/**
- * gcov_iter_get_info - return profiling data set for given file iterator
- * @iter: file iterator
- */
-struct gcov_info *gcov_iter_get_info(struct gcov_iterator *iter)
-{
-	return iter->info;
-}
-
-/**
- * gcov_iter_start - reset file iterator to starting position
- * @iter: file iterator
- */
-void gcov_iter_start(struct gcov_iterator *iter)
-{
-	int i;
-
-	iter->record = 0;
-	iter->function = 0;
-	iter->type = 0;
-	iter->count = 0;
-	iter->num_types = 0;
-	for (i = 0; i < GCOV_COUNTERS; i++) {
-		if (counter_active(iter->info, i)) {
-			iter->type_info[iter->num_types].ctr_type = i;
-			iter->type_info[iter->num_types++].offset = 0;
-		}
-	}
-}
-
-/* Mapping of logical record number to actual file content. */
-#define RECORD_FILE_MAGIC	0
-#define RECORD_GCOV_VERSION	1
-#define RECORD_TIME_STAMP	2
-#define RECORD_FUNCTION_TAG	3
-#define RECORD_FUNCTON_TAG_LEN	4
-#define RECORD_FUNCTION_IDENT	5
-#define RECORD_FUNCTION_CHECK	6
-#define RECORD_COUNT_TAG	7
-#define RECORD_COUNT_LEN	8
-#define RECORD_COUNT		9
-
-/**
- * gcov_iter_next - advance file iterator to next logical record
- * @iter: file iterator
- *
- * Return zero if new position is valid, non-zero if iterator has reached end.
- */
-int gcov_iter_next(struct gcov_iterator *iter)
-{
-	switch (iter->record) {
-	case RECORD_FILE_MAGIC:
-	case RECORD_GCOV_VERSION:
-	case RECORD_FUNCTION_TAG:
-	case RECORD_FUNCTON_TAG_LEN:
-	case RECORD_FUNCTION_IDENT:
-	case RECORD_COUNT_TAG:
-		/* Advance to next record */
-		iter->record++;
-		break;
-	case RECORD_COUNT:
-		/* Advance to next count */
-		iter->count++;
-		/* fall through */
-	case RECORD_COUNT_LEN:
-		if (iter->count < get_func(iter)->n_ctrs[iter->type]) {
-			iter->record = 9;
-			break;
-		}
-		/* Advance to next counter type */
-		get_type(iter)->offset += iter->count;
-		iter->count = 0;
-		iter->type++;
-		/* fall through */
-	case RECORD_FUNCTION_CHECK:
-		if (iter->type < iter->num_types) {
-			iter->record = 7;
-			break;
-		}
-		/* Advance to next function */
-		iter->type = 0;
-		iter->function++;
-		/* fall through */
-	case RECORD_TIME_STAMP:
-		if (iter->function < iter->info->n_functions)
-			iter->record = 3;
-		else
-			iter->record = -1;
-		break;
-	}
-	/* Check for EOF. */
-	if (iter->record == -1)
-		return -EINVAL;
-	else
-		return 0;
-}
-
-/**
- * seq_write_gcov_u32 - write 32 bit number in gcov format to seq_file
- * @seq: seq_file handle
- * @v: value to be stored
- *
- * Number format defined by gcc: numbers are recorded in the 32 bit
- * unsigned binary form of the endianness of the machine generating the
- * file.
- */
-static int seq_write_gcov_u32(struct seq_file *seq, u32 v)
-{
-	return seq_write(seq, &v, sizeof(v));
-}
-
-/**
- * seq_write_gcov_u64 - write 64 bit number in gcov format to seq_file
- * @seq: seq_file handle
- * @v: value to be stored
- *
- * Number format defined by gcc: numbers are recorded in the 32 bit
- * unsigned binary form of the endianness of the machine generating the
- * file. 64 bit numbers are stored as two 32 bit numbers, the low part
- * first.
- */
-static int seq_write_gcov_u64(struct seq_file *seq, u64 v)
-{
-	u32 data[2];
-
-	data[0] = (v & 0xffffffffUL);
-	data[1] = (v >> 32);
-	return seq_write(seq, data, sizeof(data));
-}
-
-/**
- * gcov_iter_write - write data for current pos to seq_file
- * @iter: file iterator
- * @seq: seq_file handle
- *
- * Return zero on success, non-zero otherwise.
- */
-int gcov_iter_write(struct gcov_iterator *iter, struct seq_file *seq)
-{
-	int rc = -EINVAL;
-
-	switch (iter->record) {
-	case RECORD_FILE_MAGIC:
-		rc = seq_write_gcov_u32(seq, GCOV_DATA_MAGIC);
-		break;
-	case RECORD_GCOV_VERSION:
-		rc = seq_write_gcov_u32(seq, iter->info->version);
-		break;
-	case RECORD_TIME_STAMP:
-		rc = seq_write_gcov_u32(seq, iter->info->stamp);
-		break;
-	case RECORD_FUNCTION_TAG:
-		rc = seq_write_gcov_u32(seq, GCOV_TAG_FUNCTION);
-		break;
-	case RECORD_FUNCTON_TAG_LEN:
-		rc = seq_write_gcov_u32(seq, 2);
-		break;
-	case RECORD_FUNCTION_IDENT:
-		rc = seq_write_gcov_u32(seq, get_func(iter)->ident);
-		break;
-	case RECORD_FUNCTION_CHECK:
-		rc = seq_write_gcov_u32(seq, get_func(iter)->checksum);
-		break;
-	case RECORD_COUNT_TAG:
-		rc = seq_write_gcov_u32(seq,
-			GCOV_TAG_FOR_COUNTER(get_type(iter)->ctr_type));
-		break;
-	case RECORD_COUNT_LEN:
-		rc = seq_write_gcov_u32(seq,
-				get_func(iter)->n_ctrs[iter->type] * 2);
-		break;
-	case RECORD_COUNT:
-		rc = seq_write_gcov_u64(seq,
-			iter->info->counts[iter->type].
-				values[iter->count + get_type(iter)->offset]);
-		break;
-	}
-	return rc;
-}
-- 
2.26.1.301.g55bc3eb7cb9-goog


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 01/11] compiler/gcc: Raise minimum GCC version for kernel builds to 4.8
  2020-04-21 15:15 ` [PATCH v4 01/11] compiler/gcc: Raise minimum GCC version for kernel builds to 4.8 Will Deacon
@ 2020-04-21 17:15   ` Masahiro Yamada
  0 siblings, 0 replies; 35+ messages in thread
From: Masahiro Yamada @ 2020-04-21 17:15 UTC (permalink / raw)
  To: Will Deacon
  Cc: Linux Kernel Mailing List, linux-arch, Android Kernel Team,
	Mark Rutland, Michael Ellerman, Peter Zijlstra, Linus Torvalds,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Nick Desaulniers

On Wed, Apr 22, 2020 at 12:15 AM Will Deacon <will@kernel.org> wrote:
>
> It is very rare to see versions of GCC prior to 4.8 being used to build
> the mainline kernel. These old compilers are also know to have codegen

"know" -> "known"


> issues which can lead to silent miscompilation:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145
>
> Raise the minimum GCC version for kernel build to 4.8 and remove some
> tautological Kconfig dependencies as a consequence.
>
> Cc: Masahiro Yamada <masahiroy@kernel.org>


Maybe you can also clean up mm/migrate.c,
since its GCC_VERSION >= 40700 check is now always true.
It is up to you if you want to send v5.
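
For illustration, that cleanup amounts to deleting a now-always-true
compiler-version test; the guard below is a placeholder, not the
actual mm/migrate.c code:

-#if defined(CONFIG_SOME_OPTION) && GCC_VERSION >= 40700
+#if defined(CONFIG_SOME_OPTION)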

Please replace CC with
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>

Thank you!

> Acked-by: Arnd Bergmann <arnd@arndb.de>
> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---


-- 
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 11/11] gcov: Remove old GCC 3.4 support
  2020-04-21 15:15 ` [PATCH v4 11/11] gcov: Remove old GCC 3.4 support Will Deacon
@ 2020-04-21 17:19   ` Masahiro Yamada
  0 siblings, 0 replies; 35+ messages in thread
From: Masahiro Yamada @ 2020-04-21 17:19 UTC (permalink / raw)
  To: Will Deacon
  Cc: Linux Kernel Mailing List, linux-arch, Android Kernel Team,
	Mark Rutland, Michael Ellerman, Peter Zijlstra, Linus Torvalds,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Nick Desaulniers

On Wed, Apr 22, 2020 at 12:16 AM Will Deacon <will@kernel.org> wrote:
>
> The kernel requires at least GCC 4.8 in order to build, and so there is
> no need to cater for the pre-4.7 gcov format.
>
> Remove the obsolete code.
>
> Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>


Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>

Thanks!


-- 
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 00/11] Rework READ_ONCE() to improve codegen
  2020-04-21 15:15 [PATCH v4 00/11] Rework READ_ONCE() to improve codegen Will Deacon
                   ` (10 preceding siblings ...)
  2020-04-21 15:15 ` [PATCH v4 11/11] gcov: Remove old GCC 3.4 support Will Deacon
@ 2020-04-21 18:42 ` Linus Torvalds
  2020-04-22  8:18   ` Will Deacon
  11 siblings, 1 reply; 35+ messages in thread
From: Linus Torvalds @ 2020-04-21 18:42 UTC (permalink / raw)
  To: Will Deacon
  Cc: Linux Kernel Mailing List, linux-arch, Android Kernel Team,
	Mark Rutland, Michael Ellerman, Peter Zijlstra,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Masahiro Yamada,
	Nick Desaulniers

On Tue, Apr 21, 2020 at 8:15 AM Will Deacon <will@kernel.org> wrote:
>
> It's me again. This is version four of the READ_ONCE() codegen improvement
> patches [...]

Let's just plan on biting the bullet and do this for 5.8. I'm assuming
that I'll just get a pull request from you?

> (I'm interpreting the silence as monumental joy)

By now I think we can take that for granted.

Because "monumental joy" is certainly exactly what I felt re-reading
that "unqualified scalar type" macro.

Or maybe it was just my breakfast trying to say "Hi!".

                 Linus

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 00/11] Rework READ_ONCE() to improve codegen
  2020-04-21 18:42 ` [PATCH v4 00/11] Rework READ_ONCE() to improve codegen Linus Torvalds
@ 2020-04-22  8:18   ` Will Deacon
  2020-04-22 11:37     ` Peter Zijlstra
  0 siblings, 1 reply; 35+ messages in thread
From: Will Deacon @ 2020-04-22  8:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, linux-arch, Android Kernel Team,
	Mark Rutland, Michael Ellerman, Peter Zijlstra,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Masahiro Yamada,
	Nick Desaulniers

On Tue, Apr 21, 2020 at 11:42:56AM -0700, Linus Torvalds wrote:
> On Tue, Apr 21, 2020 at 8:15 AM Will Deacon <will@kernel.org> wrote:
> >
> > It's me again. This is version four of the READ_ONCE() codegen improvement
> > patches [...]
> 
> Let's just plan on biting the bullet and do this for 5.8. I'm assuming
> that I'll just get a pull request from you?

Sure thing, thanks. I'll get it into -next along with the arm64 bits for
5.8, but I'll send it as a separate pull when the time comes. I'll also
include the sparc32 changes because otherwise the build falls apart and
we'll get an army of angry robots yelling at us (they seem to form the
majority of the active sparc32 user base afaict).

> > (I'm interpreting the silence as monumental joy)
> 
> By now I think we can take that for granted.
> 
> Because "monumental joy" is certainly exactly what I felt re-reading
> that "unqualified scalar type" macro.
> 
> Or maybe it was just my breakfast trying to say "Hi!".

Haha! It's definitely best viewed on an empty stomach, but the comment
does give you ample warning.

Will

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 05/11] arm64: csum: Disable KASAN for do_csum()
  2020-04-21 15:15 ` [PATCH v4 05/11] arm64: csum: Disable KASAN for do_csum() Will Deacon
@ 2020-04-22  9:49   ` Mark Rutland
  2020-04-22 10:41     ` Will Deacon
  0 siblings, 1 reply; 35+ messages in thread
From: Mark Rutland @ 2020-04-22  9:49 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, linux-arch, kernel-team, Michael Ellerman,
	Peter Zijlstra, Linus Torvalds, Segher Boessenkool,
	Christian Borntraeger, Luc Van Oostenryck, Arnd Bergmann,
	Peter Oberparleiter, Masahiro Yamada, Nick Desaulniers,
	Robin Murphy

On Tue, Apr 21, 2020 at 04:15:31PM +0100, Will Deacon wrote:
> do_csum() over-reads the source buffer and therefore abuses
> READ_ONCE_NOCHECK() to avoid tripping up KASAN. In preparation for
> READ_ONCE_NOCHECK() becoming a macro, and therefore losing its
> '__no_sanitize_address' annotation, just annotate do_csum() explicitly
> and fall back to normal loads.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Will Deacon <will@kernel.org>

From a functional perspective:

Acked-by: Mark Rutland <mark.rutland@arm.com>

I know that Robin had a concern w.r.t. how this would affect the
codegen, but I think we can follow that up after the series as a whole
is merged.

Thanks,
Mark.


> ---
>  arch/arm64/lib/csum.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/lib/csum.c b/arch/arm64/lib/csum.c
> index 60eccae2abad..78b87a64ca0a 100644
> --- a/arch/arm64/lib/csum.c
> +++ b/arch/arm64/lib/csum.c
> @@ -14,7 +14,11 @@ static u64 accumulate(u64 sum, u64 data)
>  	return tmp + (tmp >> 64);
>  }
>  
> -unsigned int do_csum(const unsigned char *buff, int len)
> +/*
> + * We over-read the buffer and this makes KASAN unhappy. Instead, disable
> + * instrumentation and call kasan explicitly.
> + */
> +unsigned int __no_sanitize_address do_csum(const unsigned char *buff, int len)
>  {
>  	unsigned int offset, shift, sum;
>  	const u64 *ptr;
> @@ -42,7 +46,7 @@ unsigned int do_csum(const unsigned char *buff, int len)
>  	 * odd/even alignment, and means we can ignore it until the very end.
>  	 */
>  	shift = offset * 8;
> -	data = READ_ONCE_NOCHECK(*ptr++);
> +	data = *ptr++;
>  #ifdef __LITTLE_ENDIAN
>  	data = (data >> shift) << shift;
>  #else
> @@ -58,10 +62,10 @@ unsigned int do_csum(const unsigned char *buff, int len)
>  	while (unlikely(len > 64)) {
>  		__uint128_t tmp1, tmp2, tmp3, tmp4;
>  
> -		tmp1 = READ_ONCE_NOCHECK(*(__uint128_t *)ptr);
> -		tmp2 = READ_ONCE_NOCHECK(*(__uint128_t *)(ptr + 2));
> -		tmp3 = READ_ONCE_NOCHECK(*(__uint128_t *)(ptr + 4));
> -		tmp4 = READ_ONCE_NOCHECK(*(__uint128_t *)(ptr + 6));
> +		tmp1 = *(__uint128_t *)ptr;
> +		tmp2 = *(__uint128_t *)(ptr + 2);
> +		tmp3 = *(__uint128_t *)(ptr + 4);
> +		tmp4 = *(__uint128_t *)(ptr + 6);
>  
>  		len -= 64;
>  		ptr += 8;
> @@ -85,7 +89,7 @@ unsigned int do_csum(const unsigned char *buff, int len)
>  		__uint128_t tmp;
>  
>  		sum64 = accumulate(sum64, data);
> -		tmp = READ_ONCE_NOCHECK(*(__uint128_t *)ptr);
> +		tmp = *(__uint128_t *)ptr;
>  
>  		len -= 16;
>  		ptr += 2;
> @@ -100,7 +104,7 @@ unsigned int do_csum(const unsigned char *buff, int len)
>  	}
>  	if (len > 0) {
>  		sum64 = accumulate(sum64, data);
> -		data = READ_ONCE_NOCHECK(*ptr);
> +		data = *ptr;
>  		len -= 8;
>  	}
>  	/*
> -- 
> 2.26.1.301.g55bc3eb7cb9-goog
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 06/11] READ_ONCE: Simplify implementations of {READ,WRITE}_ONCE()
  2020-04-21 15:15 ` [PATCH v4 06/11] READ_ONCE: Simplify implementations of {READ,WRITE}_ONCE() Will Deacon
@ 2020-04-22  9:51   ` Mark Rutland
  0 siblings, 0 replies; 35+ messages in thread
From: Mark Rutland @ 2020-04-22  9:51 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, linux-arch, kernel-team, Michael Ellerman,
	Peter Zijlstra, Linus Torvalds, Segher Boessenkool,
	Christian Borntraeger, Luc Van Oostenryck, Arnd Bergmann,
	Peter Oberparleiter, Masahiro Yamada, Nick Desaulniers

On Tue, Apr 21, 2020 at 04:15:32PM +0100, Will Deacon wrote:
> The implementations of {READ,WRITE}_ONCE() suffer from a significant
> amount of indirection and complexity due to a historic GCC bug:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145
> 
> which was originally worked around by 230fa253df63 ("kernel: Provide
> READ_ONCE and ASSIGN_ONCE").
> 
> Since GCC 4.8 is fairly vintage at this point and we emit a warning if
> we detect it during the build, return {READ,WRITE}_ONCE() to their former
> glory with an implementation that is easier to understand and, crucially,
> more amenable to optimisation. A side effect of this simplification is
> that WRITE_ONCE() no longer returns a value, but nobody seems to be
> relying on that and the new behaviour is aligned with smp_store_release().
> 
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: Will Deacon <will@kernel.org>

The nocheck bits look fine to me now, so FWIW:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  include/linux/compiler.h | 118 +++++++++++++--------------------------
>  1 file changed, 39 insertions(+), 79 deletions(-)
> 
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 034b0a644efc..338111a448d0 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -177,60 +177,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  # define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __LINE__)
>  #endif
>  
> -#include <uapi/linux/types.h>
> -
> -#define __READ_ONCE_SIZE						\
> -({									\
> -	switch (size) {							\
> -	case 1: *(__u8 *)res = *(volatile __u8 *)p; break;		\
> -	case 2: *(__u16 *)res = *(volatile __u16 *)p; break;		\
> -	case 4: *(__u32 *)res = *(volatile __u32 *)p; break;		\
> -	case 8: *(__u64 *)res = *(volatile __u64 *)p; break;		\
> -	default:							\
> -		barrier();						\
> -		__builtin_memcpy((void *)res, (const void *)p, size);	\
> -		barrier();						\
> -	}								\
> -})
> -
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -	__READ_ONCE_SIZE;
> -}
> -
> -#ifdef CONFIG_KASAN
> -/*
> - * We can't declare function 'inline' because __no_sanitize_address confilcts
> - * with inlining. Attempt to inline it may cause a build failure.
> - * 	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67368
> - * '__maybe_unused' allows us to avoid defined-but-not-used warnings.
> - */
> -# define __no_kasan_or_inline __no_sanitize_address notrace __maybe_unused
> -#else
> -# define __no_kasan_or_inline __always_inline
> -#endif
> -
> -static __no_kasan_or_inline
> -void __read_once_size_nocheck(const volatile void *p, void *res, int size)
> -{
> -	__READ_ONCE_SIZE;
> -}
> -
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> -{
> -	switch (size) {
> -	case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
> -	case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> -	case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
> -	case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
> -	default:
> -		barrier();
> -		__builtin_memcpy((void *)p, (const void *)res, size);
> -		barrier();
> -	}
> -}
> -
>  /*
>   * Prevent the compiler from merging or refetching reads or writes. The
>   * compiler is also forbidden from reordering successive instances of
> @@ -240,11 +186,7 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
>   * statements.
>   *
>   * These two macros will also work on aggregate data types like structs or
> - * unions. If the size of the accessed data type exceeds the word size of
> - * the machine (e.g., 32 bits or 64 bits) READ_ONCE() and WRITE_ONCE() will
> - * fall back to memcpy(). There's at least two memcpy()s: one for the
> - * __builtin_memcpy() and then one for the macro doing the copy of variable
> - * - '__u' allocated on the stack.
> + * unions.
>   *
>   * Their two major use cases are: (1) Mediating communication between
>   * process-level code and irq/NMI handlers, all running on the same CPU,
> @@ -256,23 +198,49 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
>  #include <asm/barrier.h>
>  #include <linux/kasan-checks.h>
>  
> -#define __READ_ONCE(x, check)						\
> +#define __READ_ONCE(x)	(*(volatile typeof(x) *)&(x))
> +
> +#define READ_ONCE(x)							\
>  ({									\
> -	union { typeof(x) __val; char __c[1]; } __u;			\
> -	if (check)							\
> -		__read_once_size(&(x), __u.__c, sizeof(x));		\
> -	else								\
> -		__read_once_size_nocheck(&(x), __u.__c, sizeof(x));	\
> -	smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
> -	__u.__val;							\
> +	typeof(x) __x = __READ_ONCE(x);					\
> +	smp_read_barrier_depends();					\
> +	__x;								\
>  })
> -#define READ_ONCE(x) __READ_ONCE(x, 1)
> +
> +#define WRITE_ONCE(x, val)				\
> +do {							\
> +	*(volatile typeof(x) *)&(x) = (val);		\
> +} while (0)
> +
> +#ifdef CONFIG_KASAN
> +/*
> + * We can't declare function 'inline' because __no_sanitize_address conflicts
> + * with inlining. Attempt to inline it may cause a build failure.
> + *     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67368
> + * '__maybe_unused' allows us to avoid defined-but-not-used warnings.
> + */
> +# define __no_kasan_or_inline __no_sanitize_address notrace __maybe_unused
> +#else
> +# define __no_kasan_or_inline __always_inline
> +#endif
> +
> +static __no_kasan_or_inline
> +unsigned long __read_once_word_nocheck(const void *addr)
> +{
> +	return __READ_ONCE(*(unsigned long *)addr);
> +}
>  
>  /*
> - * Use READ_ONCE_NOCHECK() instead of READ_ONCE() if you need
> - * to hide memory access from KASAN.
> + * Use READ_ONCE_NOCHECK() instead of READ_ONCE() if you need to load a
> + * word from memory atomically but without telling KASAN. This is usually
> + * used by unwinding code when walking the stack of a running process.
>   */
> -#define READ_ONCE_NOCHECK(x) __READ_ONCE(x, 0)
> +#define READ_ONCE_NOCHECK(x)						\
> +({									\
> +	unsigned long __x = __read_once_word_nocheck(&(x));		\
> +	smp_read_barrier_depends();					\
> +	__x;								\
> +})
>  
>  static __no_kasan_or_inline
>  unsigned long read_word_at_a_time(const void *addr)
> @@ -281,14 +249,6 @@ unsigned long read_word_at_a_time(const void *addr)
>  	return *(unsigned long *)addr;
>  }
>  
> -#define WRITE_ONCE(x, val) \
> -({							\
> -	union { typeof(x) __val; char __c[1]; } __u =	\
> -		{ .__val = (__force typeof(x)) (val) }; \
> -	__write_once_size(&(x), __u.__c, sizeof(x));	\
> -	__u.__val;					\
> -})
> -
>  #endif /* __KERNEL__ */
>  
>  /*
> -- 
> 2.26.1.301.g55bc3eb7cb9-goog
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 08/11] READ_ONCE: Drop pointer qualifiers when reading from scalar types
  2020-04-21 15:15 ` [PATCH v4 08/11] READ_ONCE: Drop pointer qualifiers when reading from scalar types Will Deacon
@ 2020-04-22 10:25   ` Rasmus Villemoes
  2020-04-22 11:48     ` Segher Boessenkool
  2020-04-22 14:54   ` Will Deacon
  1 sibling, 1 reply; 35+ messages in thread
From: Rasmus Villemoes @ 2020-04-22 10:25 UTC (permalink / raw)
  To: Will Deacon, linux-kernel
  Cc: linux-arch, kernel-team, Mark Rutland, Michael Ellerman,
	Peter Zijlstra, Linus Torvalds, Segher Boessenkool,
	Christian Borntraeger, Luc Van Oostenryck, Arnd Bergmann,
	Peter Oberparleiter, Masahiro Yamada, Nick Desaulniers

On 21/04/2020 17.15, Will Deacon wrote:
> Passing a volatile-qualified pointer to READ_ONCE() is an absolute
> trainwreck for code generation: the use of 'typeof()' to define a
> temporary variable inside the macro means that the final evaluation in
> macro scope ends up forcing a read back from the stack. When stack
> protector is enabled (the default for arm64, at least), this causes
> the compiler to vomit up all sorts of junk.
> 
> Unfortunately, dropping pointer qualifiers inside the macro poses quite
> a challenge, especially since the pointed-to type is permitted to be an
> aggregate, and this is relied upon by mm/ code accessing things like
> 'pmd_t'. Based on numerous hacks and discussions on the mailing list,
> this is the best I've managed to come up with.

Hm, maybe this can be brought to work, only very lightly tested. It
basically abuses what -Wignored-qualifiers points out:

  warning: type qualifiers ignored on function return type

Example showing the idea:

const int c(void);
volatile int v(void);

int hack(int x, int y)
{
	typeof(c()) a = x;
	typeof(v()) b = y;

	a += b;
	b += a;
	a += b;
	return a;
}

Since that compiles, a cannot be const-qualified, and the generated code
certainly suggests that b is not volatile-qualified. So something like

#define unqual_type(x) _unqual_type(x, unique_id_dance)
#define _unqual_type(x, id) typeof( ({ \
  typeof(x) id(void); \
  id(); \
}) )

and perhaps some _Pragma("GCC diagnostic push")/_Pragma("GCC diagnostic
ignored \"-Wignored-qualifiers\"")/_Pragma("GCC diagnostic pop") could
prevent the warning (which is in -Wextra, so I don't think it would
appear in a normal build anyway).

No idea how well any of this would work across gcc versions or with clang.
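
As a usage sketch (untested), the effect is that the declared
temporary loses the qualifier:

static unsigned long load_once(volatile unsigned long *p)
{
	unqual_type(*p) tmp = *p;	/* tmp is plain unsigned long */

	return tmp;	/* no read-back through a volatile temporary */
}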

Rasmus

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 05/11] arm64: csum: Disable KASAN for do_csum()
  2020-04-22  9:49   ` Mark Rutland
@ 2020-04-22 10:41     ` Will Deacon
  2020-04-22 11:01       ` Robin Murphy
  0 siblings, 1 reply; 35+ messages in thread
From: Will Deacon @ 2020-04-22 10:41 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-kernel, linux-arch, kernel-team, Michael Ellerman,
	Peter Zijlstra, Linus Torvalds, Segher Boessenkool,
	Christian Borntraeger, Luc Van Oostenryck, Arnd Bergmann,
	Peter Oberparleiter, Masahiro Yamada, Nick Desaulniers,
	Robin Murphy

On Wed, Apr 22, 2020 at 10:49:52AM +0100, Mark Rutland wrote:
> On Tue, Apr 21, 2020 at 04:15:31PM +0100, Will Deacon wrote:
> > do_csum() over-reads the source buffer and therefore abuses
> > READ_ONCE_NOCHECK() to avoid tripping up KASAN. In preparation for
> > READ_ONCE_NOCHECK() becoming a macro, and therefore losing its
> > '__no_sanitize_address' annotation, just annotate do_csum() explicitly
> > and fall back to normal loads.
> > 
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Robin Murphy <robin.murphy@arm.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> 
> From a functional perspective:
> 
> Acked-by: Mark Rutland <mark.rutland@arm.com>

Thanks.

> I know that Robin had a concern w.r.t. how this would affect the
> codegen, but I think we can follow that up after the series as a whole
> is merged.

Makes sense. I did look at the codegen, fwiw, and it didn't seem especially
bad. One of the LDPs gets cracked in the unlikely() path, but it didn't
look like it would be a disaster (and sprinkling barrier() around to force
the LDP felt really fragile!).

Will

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 05/11] arm64: csum: Disable KASAN for do_csum()
  2020-04-22 10:41     ` Will Deacon
@ 2020-04-22 11:01       ` Robin Murphy
  2020-04-24  9:41         ` David Laight
  0 siblings, 1 reply; 35+ messages in thread
From: Robin Murphy @ 2020-04-22 11:01 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland
  Cc: linux-kernel, linux-arch, kernel-team, Michael Ellerman,
	Peter Zijlstra, Linus Torvalds, Segher Boessenkool,
	Christian Borntraeger, Luc Van Oostenryck, Arnd Bergmann,
	Peter Oberparleiter, Masahiro Yamada, Nick Desaulniers

On 2020-04-22 11:41 am, Will Deacon wrote:
> On Wed, Apr 22, 2020 at 10:49:52AM +0100, Mark Rutland wrote:
>> On Tue, Apr 21, 2020 at 04:15:31PM +0100, Will Deacon wrote:
>>> do_csum() over-reads the source buffer and therefore abuses
>>> READ_ONCE_NOCHECK() to avoid tripping up KASAN. In preparation for
>>> READ_ONCE_NOCHECK() becoming a macro, and therefore losing its
>>> '__no_sanitize_address' annotation, just annotate do_csum() explicitly
>>> and fall back to normal loads.
>>>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Cc: Robin Murphy <robin.murphy@arm.com>
>>> Signed-off-by: Will Deacon <will@kernel.org>
>>
>>  From a functional perspective:
>>
>> Acked-by: Mark Rutland <mark.rutland@arm.com>
> 
> Thanks.
> 
>> I know that Robin had a concern w.r.t. how this would affect the
>> codegen, but I think we can follow that up after the series as a whole
>> is merged.
> 
> Makes sense. I did look at the codegen, fwiw, and it didn't seem especially
> bad. One of the LDPs gets cracked in the unlikely() path, but it didn't
> look like it would be a disaster (and sprinkling barrier() around to force
> the LDP felt really fragile!).

Sure - I have a nagging feeling that it could still do better WRT 
pipelining the loads anyway, so I'm happy to come back and reconsider 
the local codegen later. It certainly doesn't deserve to stand in the 
way of cross-arch rework.

Other than dereferencing the ptr argument, this code has no cause to 
make any explicit memory accesses of its own, so I don't think we lose 
any practical KASAN coverage by moving the annotation to function level. 
Given all that,

Acked-by: Robin Murphy <robin.murphy@arm.com>

Cheers,
Robin.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 00/11] Rework READ_ONCE() to improve codegen
  2020-04-22  8:18   ` Will Deacon
@ 2020-04-22 11:37     ` Peter Zijlstra
  2020-04-22 12:26       ` Will Deacon
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2020-04-22 11:37 UTC (permalink / raw)
  To: Will Deacon
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-arch,
	Android Kernel Team, Mark Rutland, Michael Ellerman,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Masahiro Yamada,
	Nick Desaulniers

On Wed, Apr 22, 2020 at 09:18:39AM +0100, Will Deacon wrote:
> On Tue, Apr 21, 2020 at 11:42:56AM -0700, Linus Torvalds wrote:
> > On Tue, Apr 21, 2020 at 8:15 AM Will Deacon <will@kernel.org> wrote:
> > >
> > > It's me again. This is version four of the READ_ONCE() codegen improvement
> > > patches [...]
> > 
> > Let's just plan on biting the bullet and do this for 5.8. I'm assuming
> > that I'll just get a pull request from you?
> 
> Sure thing, thanks. I'll get it into -next along with the arm64 bits for
> 5.8, but I'll send it as a separate pull when the time comes. I'll also
> include the sparc32 changes because otherwise the build falls apart and
> we'll get an army of angry robots yelling at us (they seem to form the
> majority of the active sparc32 user base afaict).

So I'm obviously all for these patches; do note however that it collides
most mighty with the KCSAN stuff, which I believe is still pending.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 08/11] READ_ONCE: Drop pointer qualifiers when reading from scalar types
  2020-04-22 10:25   ` Rasmus Villemoes
@ 2020-04-22 11:48     ` Segher Boessenkool
  2020-04-22 13:11       ` Will Deacon
  0 siblings, 1 reply; 35+ messages in thread
From: Segher Boessenkool @ 2020-04-22 11:48 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Will Deacon, linux-kernel, linux-arch, kernel-team, Mark Rutland,
	Michael Ellerman, Peter Zijlstra, Linus Torvalds,
	Christian Borntraeger, Luc Van Oostenryck, Arnd Bergmann,
	Peter Oberparleiter, Masahiro Yamada, Nick Desaulniers

Hi!

On Wed, Apr 22, 2020 at 12:25:03PM +0200, Rasmus Villemoes wrote:
> On 21/04/2020 17.15, Will Deacon wrote:
> > Unfortunately, dropping pointer qualifiers inside the macro poses quite
> > a challenge, especially since the pointed-to type is permitted to be an
> > aggregate, and this is relied upon by mm/ code accessing things like
> > 'pmd_t'. Based on numerous hacks and discussions on the mailing list,
> > this is the best I've managed to come up with.
> 
> Hm, maybe this can be brought to work, only very lightly tested. It
> basically abuses what -Wignored-qualifiers points out:
> 
>   warning: type qualifiers ignored on function return type
> 
> Example showing the idea:
> 
> const int c(void);
> volatile int v(void);
> 
> int hack(int x, int y)
> {
> 	typeof(c()) a = x;
> 	typeof(v()) b = y;
> 
> 	a += b;
> 	b += a;
> 	a += b;
> 	return a;
> }

Nasty.  I like it :-)

> Since that compiles, a cannot be const-qualified, and the generated code
> certainly suggests that b is not volatile-qualified. So something like
> 
> #define unqual_type(x) _unqual_type(x, unique_id_dance)
> #define _unqual_type(x, id) typeof( ({ \
>   typeof(x) id(void); \
>   id(); \
> }) )
> 
> and perhaps some _Pragma("GCC diagnostic push")/_Pragma("GCC diagnostic
> ignored \"-Wignored-qualifiers\"")/_Pragma("GCC diagnostic pop") could
> prevent the warning (which is in -Wextra, so I don't think it would
> appear in a normal build anyway).
> 
> No idea how well any of this would work across gcc versions or with clang.

https://gcc.gnu.org/legacy-ml/gcc-patches/2016-05/msg01054.html

This is defined to work this way in ISO C since C11.

But, it doesn't work with GCC before GCC 7 :-(


Segher

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 00/11] Rework READ_ONCE() to improve codegen
  2020-04-22 11:37     ` Peter Zijlstra
@ 2020-04-22 12:26       ` Will Deacon
  2020-04-24 13:42         ` Will Deacon
  0 siblings, 1 reply; 35+ messages in thread
From: Will Deacon @ 2020-04-22 12:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-arch,
	Android Kernel Team, Mark Rutland, Michael Ellerman,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Masahiro Yamada,
	Nick Desaulniers

On Wed, Apr 22, 2020 at 01:37:21PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 22, 2020 at 09:18:39AM +0100, Will Deacon wrote:
> > On Tue, Apr 21, 2020 at 11:42:56AM -0700, Linus Torvalds wrote:
> > > On Tue, Apr 21, 2020 at 8:15 AM Will Deacon <will@kernel.org> wrote:
> > > >
> > > > It's me again. This is version four of the READ_ONCE() codegen improvement
> > > > patches [...]
> > > 
> > > Let's just plan on biting the bullet and do this for 5.8. I'm assuming
> > > that I'll just get a pull request from you?
> > 
> > Sure thing, thanks. I'll get it into -next along with the arm64 bits for
> > 5.8, but I'll send it as a separate pull when the time comes. I'll also
> > include the sparc32 changes because otherwise the build falls apart and
> > we'll get an army of angry robots yelling at us (they seem to form the
> > majority of the active sparc32 user base afaict).
> 
> So I'm obviously all for these patches; do note however that it collides
> most mighty with the KCSAN stuff, which I believe is still pending.

That stuff has been pending for the last two releases afaict :/

Anyway, I'm happy to either provide a branch with this series on, or do
the merge myself, or send this again based on something else. What works
best for you? The only thing I'd obviously like to avoid is tightly
coupling this to KCSAN if there's a chance of it missing the merge window
again.

Cheers,

Will

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 08/11] READ_ONCE: Drop pointer qualifiers when reading from scalar types
  2020-04-22 11:48     ` Segher Boessenkool
@ 2020-04-22 13:11       ` Will Deacon
  0 siblings, 0 replies; 35+ messages in thread
From: Will Deacon @ 2020-04-22 13:11 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Rasmus Villemoes, linux-kernel, linux-arch, kernel-team,
	Mark Rutland, Michael Ellerman, Peter Zijlstra, Linus Torvalds,
	Christian Borntraeger, Luc Van Oostenryck, Arnd Bergmann,
	Peter Oberparleiter, Masahiro Yamada, Nick Desaulniers

On Wed, Apr 22, 2020 at 06:48:07AM -0500, Segher Boessenkool wrote:
> On Wed, Apr 22, 2020 at 12:25:03PM +0200, Rasmus Villemoes wrote:
> > On 21/04/2020 17.15, Will Deacon wrote:
> > > Unfortunately, dropping pointer qualifiers inside the macro poses quite
> > > a challenge, especially since the pointed-to type is permitted to be an
> > > aggregate, and this is relied upon by mm/ code accessing things like
> > > 'pmd_t'. Based on numerous hacks and discussions on the mailing list,
> > > this is the best I've managed to come up with.
> > 
> > Hm, maybe this can be brought to work, only very lightly tested. It
> > basically abuses what -Wignored-qualifiers points out:
> > 
> >   warning: type qualifiers ignored on function return type
> > 
> > Example showing the idea:
> > 
> > const int c(void);
> > volatile int v(void);
> > 
> > int hack(int x, int y)
> > {
> > 	typeof(c()) a = x;
> > 	typeof(v()) b = y;
> > 
> > 	a += b;
> > 	b += a;
> > 	a += b;
> > 	return a;
> > }
> 
> Nasty.  I like it :-)
> 
> > Since that compiles, a cannot be const-qualified, and the generated code
> > certainly suggests that b is not volatile-qualified. So something like
> > 
> > #define unqual_type(x) _unqual_type(x, unique_id_dance)
> > #define _unqual_type(x, id) typeof( ({ \
> >   typeof(x) id(void); \
> >   id(); \
> > }) )
> > 
> > and perhaps some _Pragma("GCC diagnostic push")/_Pragma("GCC diagnostic
> > ignored \"-Wignored-qualifiers\"")/_Pragma("GCC diagnostic pop") could
> > prevent the warning (which is in -Wextra, so I don't think it would
> > appear in a normal build anyway).
> > 
> > No idea how well any of this would work across gcc versions or with clang.
> 
> https://gcc.gnu.org/legacy-ml/gcc-patches/2016-05/msg01054.html
> 
> This is defined to work this way in ISO C since C11.
> 
> But, it doesn't work with GCC before GCC 7 :-(

Damn, that's quite a cool hack! Maybe we'll be able to implement it in a
few years time ;)

Will

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 08/11] READ_ONCE: Drop pointer qualifiers when reading from scalar types
  2020-04-21 15:15 ` [PATCH v4 08/11] READ_ONCE: Drop pointer qualifiers when reading from scalar types Will Deacon
  2020-04-22 10:25   ` Rasmus Villemoes
@ 2020-04-22 14:54   ` Will Deacon
  1 sibling, 0 replies; 35+ messages in thread
From: Will Deacon @ 2020-04-22 14:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, kernel-team, Mark Rutland, Michael Ellerman,
	Peter Zijlstra, Linus Torvalds, Segher Boessenkool,
	Christian Borntraeger, Luc Van Oostenryck, Arnd Bergmann,
	Peter Oberparleiter, Masahiro Yamada, Nick Desaulniers, linux

On Tue, Apr 21, 2020 at 04:15:34PM +0100, Will Deacon wrote:
> diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
> index e970f97a7fcb..233066c92f6f 100644
> --- a/include/linux/compiler_types.h
> +++ b/include/linux/compiler_types.h
> @@ -210,6 +210,27 @@ struct ftrace_likely_data {
>  /* Are two types/vars the same type (ignoring qualifiers)? */
>  #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
>  
> +/*
> + * __unqual_scalar_typeof(x) - Declare an unqualified scalar type, leaving
> + *			       non-scalar types unchanged.
> + *
> + * We build this out of a couple of helper macros in a vain attempt to
> + * help you keep your lunch down while reading it.
> + */
> +#define __pick_scalar_type(x, type, otherwise)					\
> +	__builtin_choose_expr(__same_type(x, type), (type)0, otherwise)
> +
> +#define __pick_integer_type(x, type, otherwise)					\
> +	__pick_scalar_type(x, unsigned type,					\
> +		__pick_scalar_type(x, signed type, otherwise))
> +
> +#define __unqual_scalar_typeof(x) typeof(					\
> +	__pick_integer_type(x, char,						\

Rasmus pointed out to me that 'char' is not __builtin_types_compatible_p()
with either 'signed char' or 'unsigned char', so I'll roll in the diff below
to handle this case.

Will

--->8

diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 233066c92f6f..6ed0612bc143 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -220,9 +220,14 @@ struct ftrace_likely_data {
 #define __pick_scalar_type(x, type, otherwise)					\
 	__builtin_choose_expr(__same_type(x, type), (type)0, otherwise)
 
+/*
+ * 'char' is not type-compatible with either 'signed char' or 'unsigned char',
+ * so we include the naked type here as well as the signed/unsigned variants.
+ */
 #define __pick_integer_type(x, type, otherwise)					\
-	__pick_scalar_type(x, unsigned type,					\
-		__pick_scalar_type(x, signed type, otherwise))
+	__pick_scalar_type(x, type,						\
+		__pick_scalar_type(x, unsigned type,				\
+			__pick_scalar_type(x, signed type, otherwise)))
 
 #define __unqual_scalar_typeof(x) typeof(					\
 	__pick_integer_type(x, char,						\

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* RE: [PATCH v4 05/11] arm64: csum: Disable KASAN for do_csum()
  2020-04-22 11:01       ` Robin Murphy
@ 2020-04-24  9:41         ` David Laight
  2020-04-24 11:00           ` Robin Murphy
  0 siblings, 1 reply; 35+ messages in thread
From: David Laight @ 2020-04-24  9:41 UTC (permalink / raw)
  To: 'Robin Murphy', Will Deacon, Mark Rutland
  Cc: linux-kernel, linux-arch, kernel-team, Michael Ellerman,
	Peter Zijlstra, Linus Torvalds, Segher Boessenkool,
	Christian Borntraeger, Luc Van Oostenryck, Arnd Bergmann,
	Peter Oberparleiter, Masahiro Yamada, Nick Desaulniers

From: Robin Murphy
> Sent: 22 April 2020 12:02
..
> Sure - I have a nagging feeling that it could still do better WRT
> pipelining the loads anyway, so I'm happy to come back and reconsider
> the local codegen later. It certainly doesn't deserve to stand in the
> way of cross-arch rework.

How fast does that loop actually run?
To my mind it seems to do a lot of operations on each 64bit value.
I'd have thought that a loop based on:
	sum64 = *ptr;
	sum64_high = *ptr++ >> 32;
and then fixing up the result would be faster.

The x86-64 code is also bad!
On intel cpu prior to haswell a simple:
	sum_64 += *ptr32++;
is faster than the current code.
(Although you can do a lot better even on ivy bridge.)
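
Fleshed out, one way to write that idea down (a hypothetical sketch
that splits the halves up front so no carry fix-up is needed):

	u64 sum64 = 0, sum64_high = 0;

	while (len >= 8) {
		u64 v = *ptr++;

		sum64 += (u32)v;	/* low halves */
		sum64_high += v >> 32;	/* high halves - no carry flag */
		len -= 8;
	}
	/* combine sum64 + sum64_high, then fold to 16 bits as usual */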

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 05/11] arm64: csum: Disable KASAN for do_csum()
  2020-04-24  9:41         ` David Laight
@ 2020-04-24 11:00           ` Robin Murphy
  2020-04-24 13:04             ` David Laight
  0 siblings, 1 reply; 35+ messages in thread
From: Robin Murphy @ 2020-04-24 11:00 UTC (permalink / raw)
  To: David Laight, Will Deacon, Mark Rutland
  Cc: linux-kernel, linux-arch, kernel-team, Michael Ellerman,
	Peter Zijlstra, Linus Torvalds, Segher Boessenkool,
	Christian Borntraeger, Luc Van Oostenryck, Arnd Bergmann,
	Peter Oberparleiter, Masahiro Yamada, Nick Desaulniers

On 2020-04-24 10:41 am, David Laight wrote:
> From: Robin Murphy
>> Sent: 22 April 2020 12:02
> ..
>> Sure - I have a nagging feeling that it could still do better WRT
>> pipelining the loads anyway, so I'm happy to come back and reconsider
>> the local codegen later. It certainly doesn't deserve to stand in the
>> way of cross-arch rework.
> 
> How fast does that loop actually run?

I've not characterised it in detail, but faster than any of the other 
attempts so far ;)

> To my mind it seems to do a lot of operations on each 64bit value.
> I'd have thought that a loop based on:
> 	sum64 = *ptr;
> 	sum64_high = *ptr++ >> 32;
> and then fixing up the result would be faster.
> 
> The x86-64 code is also bad!
> On intel cpu prior to haswell a simple:
> 	sum_64 += *ptr32++;
> is faster than the current code.
> (Although you can do a lot better even on ivy bridge.)

The aim here is to minimise load bandwidth - most Arm cores can slurp 16 
bytes from L1 in a single load as quickly as any smaller amount, so 
nibbling away in little 32-bit chunks would result in up to 4x more load 
cycles. Yes, the C code looks ridiculous, but the other trick is that 
most of those operations don't actually exist. Since a __uint128_t is 
really backed by any two 64-bit GPRs - or if you're careful, one 64-bit 
GPR and the carry flag - all those shifts and rotations are in fact 
resolved by register allocation, so what we end up with is a very neat 
loop of essentially just loads and 64-bit accumulation:

...
  138:   a94030c3        ldp     x3, x12, [x6]
  13c:   a9412cc8        ldp     x8, x11, [x6, #16]
  140:   a94228c4        ldp     x4, x10, [x6, #32]
  144:   a94324c7        ldp     x7, x9, [x6, #48]
  148:   ab03018d        adds    x13, x12, x3
  14c:   510100a5        sub     w5, w5, #0x40
  150:   9a0c0063        adc     x3, x3, x12
  154:   ab08016c        adds    x12, x11, x8
  158:   9a0b0108        adc     x8, x8, x11
  15c:   ab04014b        adds    x11, x10, x4
  160:   9a0a0084        adc     x4, x4, x10
  164:   ab07012a        adds    x10, x9, x7
  168:   9a0900e7        adc     x7, x7, x9
  16c:   ab080069        adds    x9, x3, x8
  170:   9a080063        adc     x3, x3, x8
  174:   ab070088        adds    x8, x4, x7
  178:   9a070084        adc     x4, x4, x7
  17c:   910100c6        add     x6, x6, #0x40
  180:   ab040067        adds    x7, x3, x4
  184:   9a040063        adc     x3, x3, x4
  188:   ab010064        adds    x4, x3, x1
  18c:   9a030023        adc     x3, x1, x3
  190:   710100bf        cmp     w5, #0x40
  194:   aa0303e1        mov     x1, x3
  198:   54fffd0c        b.gt    138 <do_csum+0xd8>
...

Instruction-wise, that's about as good as it can get short of 
maintaining multiple accumulators and moving the pairwise folding out of 
the loop. The main thing that I think is still left on the table is that 
the load-to-use distances are pretty short and there's clearly scope to 
spread out and amortise the load cycles better, which stands to benefit 
both big and little cores.
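
For reference, each adds/adc pair in that listing is one step of the
128-bit accumulation idiom - essentially what accumulate() in csum.c
boils down to. A minimal standalone sketch of the pattern:

	static u64 accumulate64(u64 sum, u64 data)
	{
		__uint128_t tmp = (__uint128_t)sum + data;

		/* the high 64 bits of tmp hold the carry out of the
		 * add; fold it back in and truncate: adds + adc.
		 */
		return tmp + (tmp >> 64);
	}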

Robin.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH v4 05/11] arm64: csum: Disable KASAN for do_csum()
  2020-04-24 11:00           ` Robin Murphy
@ 2020-04-24 13:04             ` David Laight
  0 siblings, 0 replies; 35+ messages in thread
From: David Laight @ 2020-04-24 13:04 UTC (permalink / raw)
  To: 'Robin Murphy', Will Deacon, Mark Rutland
  Cc: linux-kernel, linux-arch, kernel-team, Michael Ellerman,
	Peter Zijlstra, Linus Torvalds, Segher Boessenkool,
	Christian Borntraeger, Luc Van Oostenryck, Arnd Bergmann,
	Peter Oberparleiter, Masahiro Yamada, Nick Desaulniers

From: Robin Murphy
> Sent: 24 April 2020 12:01
> On 2020-04-24 10:41 am, David Laight wrote:
> > From: Robin Murphy
> >> Sent: 22 April 2020 12:02
> > ..
> >> Sure - I have a nagging feeling that it could still do better WRT
> >> pipelining the loads anyway, so I'm happy to come back and reconsider
> >> the local codegen later. It certainly doesn't deserve to stand in the
> >> way of cross-arch rework.
> >
> > How fast does that loop actually run?
> 
> I've not characterised it in detail, but faster than any of the other
> attempts so far ;)
...
> The aim here is to minimise load bandwidth - most Arm cores can slurp 16
> bytes from L1 in a single load as quickly as any smaller amount, so
> nibbling away in little 32-bit chunks would result in up to 4x more load
> cycles.

The x86 'problem' is that 'adc' takes two clocks and the carry
flag 'register chain' means you can only sum 4 bytes/clock regardless
of the memory accesses.
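
In loop form, the pre-Haswell-friendly pattern is simply (a sketch
reusing the names from the fragment above, with do_csum()-style
buff/len arguments):

	u64 sum_64 = 0;
	const u32 *ptr32 = (const u32 *)buff;	/* assumes alignment */

	while (len >= 4) {
		sum_64 += *ptr32++;	/* plain add: no adc dependency */
		len -= 4;
	}
	/* a u64 accumulator absorbs ~2^32 such adds before overflow */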

> Yes, the C code looks ridiculous, but the other trick is that
> most of those operations don't actually exist. Since a __uint128_t is
> really backed by any two 64-bit GPRs - or if you're careful, one 64-bit
> GPR and the carry flag - all those shifts and rotations are in fact
> resolved by register allocation, so what we end up with is a very neat
> loop of essentially just loads and 64-bit accumulation:
> 
> ...
>   138:   a94030c3        ldp     x3, x12, [x6]
>   13c:   a9412cc8        ldp     x8, x11, [x6, #16]
>   140:   a94228c4        ldp     x4, x10, [x6, #32]
>   144:   a94324c7        ldp     x7, x9, [x6, #48]
>   148:   ab03018d        adds    x13, x12, x3
>   14c:   510100a5        sub     w5, w5, #0x40
>   150:   9a0c0063        adc     x3, x3, x12
>   154:   ab08016c        adds    x12, x11, x8
>   158:   9a0b0108        adc     x8, x8, x11
>   15c:   ab04014b        adds    x11, x10, x4
>   160:   9a0a0084        adc     x4, x4, x10
>   164:   ab07012a        adds    x10, x9, x7
>   168:   9a0900e7        adc     x7, x7, x9
>   16c:   ab080069        adds    x9, x3, x8
>   170:   9a080063        adc     x3, x3, x8
>   174:   ab070088        adds    x8, x4, x7
>   178:   9a070084        adc     x4, x4, x7
>   17c:   910100c6        add     x6, x6, #0x40
>   180:   ab040067        adds    x7, x3, x4
>   184:   9a040063        adc     x3, x3, x4
>   188:   ab010064        adds    x4, x3, x1
>   18c:   9a030023        adc     x3, x1, x3
>   190:   710100bf        cmp     w5, #0x40
>   194:   aa0303e1        mov     x1, x3
>   198:   54fffd0c        b.gt    138 <do_csum+0xd8>
> ...
> 
> Instruction-wise, that's about as good as it can get short of
> maintaining multiple accumulators and moving the pairwise folding out of
> the loop. The main thing that I think is still left on the table is that
> the load-to-use distances are pretty short and there's clearly scope to
> spread out and amortise the load cycles better, which stands to benefit
> both big and little cores.

I realised most of the C would disappear - just hard to see what
the result would be.
Looking at the above there are 8 (64bit) loads and 16 adds.
(Plus 2 adds for the loop control, should only need one!
and a spare register move.)
Without multiple carry flags the best you are going to get
is one add instruction and one 'save the carry flag' instruction
for each 'word'.
The thing then is to arrange the code to avoid register dependency
chains so that the instructions can run in parallel.
I think you lose at the bottom of the above when you add to
the global sum - might be faster with 2 sums.
Actually trying to do one load and 4 adds every clock might
be possible - if the cpu can execute them.
But that would require a horrid interleaved loop.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4 00/11] Rework READ_ONCE() to improve codegen
  2020-04-22 12:26       ` Will Deacon
@ 2020-04-24 13:42         ` Will Deacon
  2020-04-24 15:54           ` Marco Elver
  0 siblings, 1 reply; 35+ messages in thread
From: Will Deacon @ 2020-04-24 13:42 UTC (permalink / raw)
  To: Peter Zijlstra, elver, dvyukov
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-arch,
	Android Kernel Team, Mark Rutland, Michael Ellerman,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Masahiro Yamada,
	Nick Desaulniers

Hi Peter,

[+KCSAN folks]

On Wed, Apr 22, 2020 at 01:26:27PM +0100, Will Deacon wrote:
> On Wed, Apr 22, 2020 at 01:37:21PM +0200, Peter Zijlstra wrote:
> > On Wed, Apr 22, 2020 at 09:18:39AM +0100, Will Deacon wrote:
> > > On Tue, Apr 21, 2020 at 11:42:56AM -0700, Linus Torvalds wrote:
> > > > On Tue, Apr 21, 2020 at 8:15 AM Will Deacon <will@kernel.org> wrote:
> > > > >
> > > > > It's me again. This is version four of the READ_ONCE() codegen improvement
> > > > > patches [...]
> > > > 
> > > > Let's just plan on biting the bullet and do this for 5.8. I'm assuming
> > > > that I'll just get a pull request from you?
> > > 
> > > Sure thing, thanks. I'll get it into -next along with the arm64 bits for
> > > 5.8, but I'll send it as a separate pull when the time comes. I'll also
> > > include the sparc32 changes because otherwise the build falls apart and
> > > we'll get an army of angry robots yelling at us (they seem to form the
> > > majority of the active sparc32 user base afaict).
> > 
> > So I'm obviously all for these patches; do note however that it collides
> > most mighty with the KCSAN stuff, which I believe is still pending.
> 
> That stuff has been pending for the last two releases afaict :/
> 
> Anyway, I'm happy to either provide a branch with this series on, or do
> the merge myself, or send this again based on something else. What works
> best for you? The only thing I'd obviously like to avoid is tightly
> coupling this to KCSAN if there's a chance of it missing the merge window
> again.

FWIW, I had a go at rebasing onto linux-next, just to get an idea for how
bad it is. It's fairly bad, and I don't think it's fair to inflict it on
sfr. I've included the interesting part of the resulting compiler.h below
for you and the KCSAN crowd to take a look at (yes, there's room for
subsequent cleanup, but I was focussing on the conflict resolution for now).

So, I think the best bet is either for my changes to go into -tip on top
of the KCSAN stuff, or for the KCSAN stuff to be dropped from -next (it's
been there since at least January). Do you know if they are definitely
supposed to be going in for 5.8?

Any other ideas?

Cheers,

Will

--->8

/*
 * Prevent the compiler from merging or refetching reads or writes. The
 * compiler is also forbidden from reordering successive instances of
 * READ_ONCE and WRITE_ONCE, but only when the compiler is aware of some
 * particular ordering. One way to make the compiler aware of ordering is to
 * put the two invocations of READ_ONCE or WRITE_ONCE in different C
 * statements.
 *
 * These two macros will also work on aggregate data types like structs or
 * unions.
 *
 * Their two major use cases are: (1) Mediating communication between
 * process-level code and irq/NMI handlers, all running on the same CPU,
 * and (2) Ensuring that the compiler does not fold, spindle, or otherwise
 * mutilate accesses that either do not require ordering or that interact
 * with an explicit memory barrier or atomic instruction that provides the
 * required ordering.
 */
#include <asm/barrier.h>
#include <linux/kasan-checks.h>
#include <linux/kcsan-checks.h>

/*
 * Use __READ_ONCE() instead of READ_ONCE() if you do not require any
 * atomicity or dependency ordering guarantees. Note that this may result
 * in tears!
 */
#define __READ_ONCE(x)	(*(const volatile __unqual_scalar_typeof(x) *)&(x))

#define __READ_ONCE_SCALAR(x)						\
({									\
	typeof(x) *__xp = &(x);						\
	kcsan_check_atomic_read(__xp, sizeof(*__xp));			\
	kcsan_disable_current();					\
	({								\
		__unqual_scalar_typeof(x) __x = __READ_ONCE(*__xp);	\
		kcsan_enable_current();					\
		smp_read_barrier_depends();				\
		(typeof(x))__x;						\
	});								\
})

#define READ_ONCE(x)							\
({									\
	compiletime_assert_rwonce_type(x);				\
	__READ_ONCE_SCALAR(x);						\
})

#define __WRITE_ONCE(x, val)						\
do {									\
	*(volatile typeof(x) *)&(x) = (val);				\
} while (0)

#define __WRITE_ONCE_SCALAR(x, val)					\
do {									\
	typeof(x) *__xp = &(x);						\
	kcsan_check_atomic_write(__xp, sizeof(*__xp));			\
	kcsan_disable_current();					\
	__WRITE_ONCE(*__xp, val);					\
	kcsan_enable_current();						\
} while (0)

#define WRITE_ONCE(x, val)						\
do {									\
	compiletime_assert_rwonce_type(x);				\
	__WRITE_ONCE_SCALAR(x, val);					\
} while (0)

#ifdef CONFIG_KASAN
/*
 * We can't declare function 'inline' because __no_sanitize_address conflicts
 * with inlining. Attempt to inline it may cause a build failure.
 *     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67368
 * '__maybe_unused' allows us to avoid defined-but-not-used warnings.
 */
# define __no_kasan_or_inline __no_sanitize_address notrace __maybe_unused
# define __no_sanitize_or_inline __no_kasan_or_inline
#else
# define __no_kasan_or_inline __always_inline
#endif

#define __no_kcsan __no_sanitize_thread
#ifdef __SANITIZE_THREAD__
/*
 * Rely on __SANITIZE_THREAD__ instead of CONFIG_KCSAN, to avoid not inlining in
 * compilation units where instrumentation is disabled. The attribute 'noinline'
 * is required for older compilers, where implicit inlining of very small
 * functions renders __no_sanitize_thread ineffective.
 */
# define __no_kcsan_or_inline __no_kcsan noinline notrace __maybe_unused
# define __no_sanitize_or_inline __no_kcsan_or_inline
#else
# define __no_kcsan_or_inline __always_inline
#endif

#ifndef __no_sanitize_or_inline
#define __no_sanitize_or_inline __always_inline
#endif

static __no_sanitize_or_inline
unsigned long __read_once_word_nocheck(const void *addr)
{
	return __READ_ONCE(*(unsigned long *)addr);
}

/*
 * Use READ_ONCE_NOCHECK() instead of READ_ONCE() if you need to load a
 * word from memory atomically but without telling KASAN/KCSAN. This is
 * usually used by unwinding code when walking the stack of a running process.
 */
#define READ_ONCE_NOCHECK(x)						\
({									\
	unsigned long __x = __read_once_word_nocheck(&(x));		\
	smp_read_barrier_depends();					\
	__x;								\
})

static __no_kasan_or_inline
unsigned long read_word_at_a_time(const void *addr)
{
	kasan_check_read(addr, 1);
	return *(unsigned long *)addr;
}

/**
 * data_race - mark an expression as containing intentional data races
 *
 * This data_race() macro is useful for situations in which data races
 * should be forgiven.  One example is diagnostic code that accesses
 * shared variables but is not a part of the core synchronization design.
 *
 * This macro *does not* affect normal code generation, but is a hint
 * to tooling that data races here are to be ignored.
 */
#define data_race(expr)                                                        \
	({                                                                     \
		typeof(({ expr; })) __val;                                     \
		kcsan_disable_current();                                       \
		__val = ({ expr; });                                           \
		kcsan_enable_current();                                        \
		__val;                                                         \
	})
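
As a minimal usage sketch of the two use cases described in the header
comment above (the function and variable names below are made up for
illustration, not taken from the series):

#include <linux/interrupt.h>
#include <linux/printk.h>

static unsigned long nr_events;		/* shared with the IRQ handler */

static irqreturn_t demo_irq(int irq, void *dev)
{
	/* Sole writer: WRITE_ONCE() stops the compiler tearing or
	 * deferring the store. */
	WRITE_ONCE(nr_events, nr_events + 1);
	return IRQ_HANDLED;
}

static void demo_report(void)
{
	/* READ_ONCE() forbids the compiler from caching or refetching
	 * the load across this statement. */
	unsigned long snap = READ_ONCE(nr_events);

	/* Diagnostic-only access: data_race() tells KCSAN the race is
	 * intentional without affecting code generation. */
	pr_info("events: %lu (racy peek: %lu)\n", snap, data_race(nr_events));
}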


* Re: [PATCH v4 00/11] Rework READ_ONCE() to improve codegen
  2020-04-24 13:42         ` Will Deacon
@ 2020-04-24 15:54           ` Marco Elver
  2020-04-24 16:52             ` Will Deacon
  0 siblings, 1 reply; 35+ messages in thread
From: Marco Elver @ 2020-04-24 15:54 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Dmitry Vyukov, Linus Torvalds,
	Linux Kernel Mailing List, linux-arch, Android Kernel Team,
	Mark Rutland, Michael Ellerman, Segher Boessenkool,
	Christian Borntraeger, Luc Van Oostenryck, Arnd Bergmann,
	Peter Oberparleiter, Masahiro Yamada, Nick Desaulniers

On Fri, 24 Apr 2020 at 15:42, Will Deacon <will@kernel.org> wrote:
>
> Hi Peter,
>
> [+KCSAN folks]
>
> On Wed, Apr 22, 2020 at 01:26:27PM +0100, Will Deacon wrote:
> > On Wed, Apr 22, 2020 at 01:37:21PM +0200, Peter Zijlstra wrote:
> > > On Wed, Apr 22, 2020 at 09:18:39AM +0100, Will Deacon wrote:
> > > > On Tue, Apr 21, 2020 at 11:42:56AM -0700, Linus Torvalds wrote:
> > > > > On Tue, Apr 21, 2020 at 8:15 AM Will Deacon <will@kernel.org> wrote:
> > > > > >
> > > > > > It's me again. This is version four of the READ_ONCE() codegen improvement
> > > > > > patches [...]
> > > > >
> > > > > Let's just plan on biting the bullet and do this for 5.8. I'm assuming
> > > > > that I'll just get a pull request from you?
> > > >
> > > > Sure thing, thanks. I'll get it into -next along with the arm64 bits for
> > > > 5.8, but I'll send it as a separate pull when the time comes. I'll also
> > > > include the sparc32 changes because otherwise the build falls apart and
> > > > we'll get an army of angry robots yelling at us (they seem to form the
> > > > majority of the active sparc32 user base afaict).
> > >
> > > So I'm obviously all for these patches; do note however that it collides
> > > most mighty with the KCSAN stuff, which I believe is still pending.
> >
> > That stuff has been pending for the last two releases afaict :/
> >
> > Anyway, I'm happy to either provide a branch with this series on, or do
> > the merge myself, or send this again based on something else. What works
> > best for you? The only thing I'd obviously like to avoid is tightly
> > coupling this to KCSAN if there's a chance of it missing the merge window
> > again.
>
> FWIW, I had a go at rebasing onto linux-next, just to get an idea for how
> bad it is. It's fairly bad, and I don't think it's fair to inflict it on
> sfr. I've included the interesting part of the resulting compiler.h below
> for you and the KCSAN crowd to take a look at (yes, there's room for
> subsequent cleanup, but I was focussing on the conflict resolution for now).

Thanks for the heads up. From what I can tell, your proposed change
may work fine for KCSAN. However, I've had trouble compiling this:

1. kcsan_disable_current() / kcsan_enable_current() do not work as-is,
because READ_ONCE/WRITE_ONCE are used from compilation units where the
KCSAN runtime is not available (e.g. arch/x86/entry/vdso/, whose Makefile
had to set KCSAN_SANITIZE := n for that reason).
2. Some new uaccess whitelist entries were needed.

I think this is what's needed:
https://lkml.kernel.org/r/20200424154730.190041-1-elver@google.com

With that, the calls in READ_ONCE() and WRITE_ONCE() can be changed to
__kcsan_disable_current() / __kcsan_enable_current(). After that, I was
able to compile, and my test suite passed.
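
A sketch of that substitution, assuming the __kcsan_*_current() interfaces
from the series linked above (illustrative; the final resolution may
differ):

#define __READ_ONCE_SCALAR(x)						\
({									\
	typeof(x) *__xp = &(x);						\
	kcsan_check_atomic_read(__xp, sizeof(*__xp));			\
	__kcsan_disable_current();	/* per the series above */	\
	({								\
		__unqual_scalar_typeof(x) __x = __READ_ONCE(*__xp);	\
		__kcsan_enable_current();				\
		smp_read_barrier_depends();				\
		(typeof(x))__x;						\
	});								\
})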

Thanks,
-- Marco

> [...]
>
> static __no_sanitize_or_inline
> unsigned long __read_once_word_nocheck(const void *addr)
> {
>         return __READ_ONCE(*(unsigned long *)addr);
> }
>
> /*
>  * Use READ_ONCE_NOCHECK() instead of READ_ONCE() if you need to load a
>  * word from memory atomically but without telling KASAN/KCSAN. This is
>  * usually used by unwinding code when walking the stack of a running process.
>  */
> #define READ_ONCE_NOCHECK(x)                                            \
> ({                                                                      \
>         unsigned long __x = __read_once_word_nocheck(&(x));             \
>         smp_read_barrier_depends();                                     \
>         __x;                                                            \
> })

Unconditionally loading an unsigned long doesn't seem right, and might
also result in OOB reads.
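
To make the concern concrete, consider a hypothetical caller passing a
sub-word object ('struct foo' and 'demo' below are invented for the
example):

struct foo {
	char flag;	/* only one byte of backing storage */
};

static bool demo(struct foo *f)
{
	/* __read_once_word_nocheck(&f->flag) loads a full unsigned long,
	 * i.e. 3 or 7 bytes past the end of 'flag' -- an out-of-bounds
	 * read that the _NOCHECK suppression also hides from KASAN. */
	return READ_ONCE_NOCHECK(f->flag) != 0;
}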


> [...]


* Re: [PATCH v4 07/11] READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses
  2020-04-21 15:15 ` [PATCH v4 07/11] READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses Will Deacon
@ 2020-04-24 16:31   ` Jann Horn
  2020-04-24 17:11     ` Will Deacon
  0 siblings, 1 reply; 35+ messages in thread
From: Jann Horn @ 2020-04-24 16:31 UTC (permalink / raw)
  To: Will Deacon
  Cc: kernel list, linux-arch, kernel-team, Mark Rutland,
	Michael Ellerman, Peter Zijlstra, Linus Torvalds,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Masahiro Yamada,
	Nick Desaulniers

On Tue, Apr 21, 2020 at 5:15 PM Will Deacon <will@kernel.org> wrote:
> {READ,WRITE}_ONCE() cannot guarantee atomicity for arbitrary data sizes.
> This can be surprising to callers that might incorrectly be expecting
> atomicity for accesses to aggregate structures, although there are other
> callers where tearing is actually permissible (e.g. if they are using
> something akin to sequence locking to protect the access).
[...]
> The slight snag is that we also have to support 64-bit accesses on 32-bit
> architectures, as these appear to be widespread and tend to work out ok
> if either the architecture supports atomic 64-bit accesses (x86, armv7)
> or if the variable being accessed represents a virtual address and
> therefore only requires 32-bit atomicity in practice.
>
> Take a step in that direction by introducing a variant of
> 'compiletime_assert_atomic_type()' and use it to check the pointer
> argument to {READ,WRITE}_ONCE(). Expose __{READ,WRITE}_ONCE() variants
> which are allowed to tear and convert the one broken caller over to the
> new macros.
[...]
> +/*
> + * Yes, this permits 64-bit accesses on 32-bit architectures. These will
> + * actually be atomic in many cases (namely x86), but for others we rely on

I don't think that's correct?


$ cat 32bit_volatile_qword_read_write.c
#include <pthread.h>
#include <err.h>
#include <stdio.h>
#include <stdint.h>

/* make sure it really has proper alignment */
__attribute__((aligned(8)))
volatile uint64_t shared_u64;

static void *thread_fn(void *dummy) {
  while (1) {
    uint64_t value = shared_u64;
    if (value == 0xaaaaaaaaaaaaaaaaUL || value == 0xbbbbbbbbbbbbbbbbUL)
      continue;
    printf("got 0x%llx\n", (unsigned long long)value);
  }
  return NULL;
}

int main(void) {
  printf("shared_u64 at %p\n", &shared_u64);
  pthread_t thread;
  if (pthread_create(&thread, NULL, thread_fn, NULL))
    err(1, "pthread_create");

  while (1) {
    shared_u64 = 0xaaaaaaaaaaaaaaaaUL;
    shared_u64 = 0xbbbbbbbbbbbbbbbbUL;
  }
}
$ gcc -o 32bit_volatile_qword_read_write 32bit_volatile_qword_read_write.c -pthread -Wall
$ ./32bit_volatile_qword_read_write | head # as 64-bit binary
^C
$ gcc -m32 -o 32bit_volatile_qword_read_write 32bit_volatile_qword_read_write.c -pthread -Wall
$ ./32bit_volatile_qword_read_write | head # as 32-bit binary
shared_u64 at 0x56567030
got 0xaaaaaaaabbbbbbbb
got 0xbbbbbbbbaaaaaaaa
got 0xaaaaaaaabbbbbbbb
got 0xbbbbbbbbaaaaaaaa
got 0xbbbbbbbbaaaaaaaa
got 0xbbbbbbbbaaaaaaaa
got 0xbbbbbbbbaaaaaaaa
got 0xbbbbbbbbaaaaaaaa
got 0xbbbbbbbbaaaaaaaa
$


AFAIK 32-bit X86 code that wants to atomically load 8 bytes of memory
has to use CMPXCHG8B; and gcc won't generate such code just based on a
volatile load/store.
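
For comparison, a sketch (not from Jann's mail) of what the test would need
for an atomic 64-bit access at -m32: the GCC __atomic builtins, which the
compiler lowers to a CMPXCHG8B-based sequence or a libatomic call (so
linking may need -latomic):

#include <stdint.h>

__attribute__((aligned(8)))
uint64_t shared_u64_atomic;

static uint64_t load_u64(void)
{
	/* Atomic even on 32-bit x86, unlike a plain volatile load. */
	return __atomic_load_n(&shared_u64_atomic, __ATOMIC_RELAXED);
}

static void store_u64(uint64_t v)
{
	__atomic_store_n(&shared_u64_atomic, v, __ATOMIC_RELAXED);
}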


* Re: [PATCH v4 00/11] Rework READ_ONCE() to improve codegen
  2020-04-24 15:54           ` Marco Elver
@ 2020-04-24 16:52             ` Will Deacon
  0 siblings, 0 replies; 35+ messages in thread
From: Will Deacon @ 2020-04-24 16:52 UTC (permalink / raw)
  To: Marco Elver
  Cc: Peter Zijlstra, Dmitry Vyukov, Linus Torvalds,
	Linux Kernel Mailing List, linux-arch, Android Kernel Team,
	Mark Rutland, Michael Ellerman, Segher Boessenkool,
	Christian Borntraeger, Luc Van Oostenryck, Arnd Bergmann,
	Peter Oberparleiter, Masahiro Yamada, Nick Desaulniers

On Fri, Apr 24, 2020 at 05:54:10PM +0200, Marco Elver wrote:
> On Fri, 24 Apr 2020 at 15:42, Will Deacon <will@kernel.org> wrote:
> > [...]
> 
> Thanks for the heads up. From what I can tell, your proposed change
> may work fine for KCSAN. However, I've had trouble compiling this:
> 
> 1. kcsan_disable_current() / kcsan_enable_current() do not work as-is,
> because READ_ONCE/WRITE_ONCE are used from compilation units where the
> KCSAN runtime is not available (e.g. arch/x86/entry/vdso/, whose Makefile
> had to set KCSAN_SANITIZE := n for that reason).
> 2. Some new uaccess whitelist entries were needed.
> 
> I think this is what's needed:
> https://lkml.kernel.org/r/20200424154730.190041-1-elver@google.com
> 
> With that, the calls in READ_ONCE() and WRITE_ONCE() can be changed to
> __kcsan_disable_current() / __kcsan_enable_current(). After that, I was
> able to compile, and my test suite passed.

Brill, thanks Marco! I'll take a look at your other patches, but I'm pretty
stuck until we've figured out the merging plan for all of this.

Will


* Re: [PATCH v4 07/11] READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses
  2020-04-24 16:31   ` Jann Horn
@ 2020-04-24 17:11     ` Will Deacon
  2020-04-24 17:43       ` Peter Zijlstra
  0 siblings, 1 reply; 35+ messages in thread
From: Will Deacon @ 2020-04-24 17:11 UTC (permalink / raw)
  To: Jann Horn
  Cc: kernel list, linux-arch, kernel-team, Mark Rutland,
	Michael Ellerman, Peter Zijlstra, Linus Torvalds,
	Segher Boessenkool, Christian Borntraeger, Luc Van Oostenryck,
	Arnd Bergmann, Peter Oberparleiter, Masahiro Yamada,
	Nick Desaulniers

On Fri, Apr 24, 2020 at 06:31:35PM +0200, Jann Horn wrote:
> On Tue, Apr 21, 2020 at 5:15 PM Will Deacon <will@kernel.org> wrote:
> > [...]
> > +/*
> > + * Yes, this permits 64-bit accesses on 32-bit architectures. These will
> > + * actually be atomic in many cases (namely x86), but for others we rely on
> 
> I don't think that's correct?

[...]

> AFAIK 32-bit X86 code that wants to atomically load 8 bytes of memory
> has to use CMPXCHG8B; and gcc won't generate such code just based on a
> volatile load/store.

My apologies, you're completely right. I thought that PAE mandated 64-bit
atomicity, like it does on 32-bit ARM, but that's apparently not the case
and looking at the 32-bit x86 pgtable code they have to be really careful
there.

I'll update the comment.

Will


* Re: [PATCH v4 07/11] READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses
  2020-04-24 17:11     ` Will Deacon
@ 2020-04-24 17:43       ` Peter Zijlstra
  0 siblings, 0 replies; 35+ messages in thread
From: Peter Zijlstra @ 2020-04-24 17:43 UTC (permalink / raw)
  To: Will Deacon
  Cc: Jann Horn, kernel list, linux-arch, kernel-team, Mark Rutland,
	Michael Ellerman, Linus Torvalds, Segher Boessenkool,
	Christian Borntraeger, Luc Van Oostenryck, Arnd Bergmann,
	Peter Oberparleiter, Masahiro Yamada, Nick Desaulniers

On Fri, Apr 24, 2020 at 06:11:35PM +0100, Will Deacon wrote:
> My apologies, you're completely right. I thought that PAE mandated 64-bit
> atomicity, like it does on 32-bit ARM, but that's apparently not the case
> and looking at the 32-bit x86 pgtable code they have to be really careful
> there.

They added CMPXCHG8B for PAE, but then we never used it for that because
it's dead slow (MS did use it, I think). Instead we play horrible split-word
games; grep for CONFIG_GUP_GET_PTE_LOW_HIGH.
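
For the curious, the split-word game looks roughly like the lockless PTE
read on x86 PAE; a sketch along the lines of the CONFIG_GUP_GET_PTE_LOW_HIGH
code, with the types simplified and renamed for illustration:

/* PAE-style two-word PTE; the real pte_t lives in the x86 headers. */
typedef struct { u32 pte_low, pte_high; } demo_pte_t;

static demo_pte_t demo_get_pte(demo_pte_t *ptep)
{
	demo_pte_t pte;

	/*
	 * Re-read the low word around the high-word read and retry if it
	 * changed; the writer updates the halves in the opposite order,
	 * so a torn PTE is detected instead of being returned.
	 */
	do {
		pte.pte_low = READ_ONCE(ptep->pte_low);
		smp_rmb();
		pte.pte_high = READ_ONCE(ptep->pte_high);
		smp_rmb();
	} while (unlikely(pte.pte_low != READ_ONCE(ptep->pte_low)));

	return pte;
}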

