* [PATCH v2 00/13] Unify asm/unaligned.h around struct helper
@ 2021-05-14 10:00 ` Arnd Bergmann
  0 siblings, 0 replies; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-14 10:00 UTC (permalink / raw)
  To: linux-arch
  Cc: Linus Torvalds, Vineet Gupta, Arnd Bergmann, Amitkumar Karwar,
	Benjamin Herrenschmidt, Borislav Petkov, Eric Dumazet,
	Florian Fainelli, Ganapathi Bhat, Geert Uytterhoeven,
	H. Peter Anvin, Ingo Molnar, Jakub Kicinski, James Morris,
	Jens Axboe, John Johansen, Jonas Bonn, Kalle Valo,
	Michael Ellerman, Paul Mackerras, Rich Felker,
	Richard Russon (FlatCap),
	Russell King, Serge E. Hallyn, Sharvari Harisangam,
	Stafford Horne, Stefan Kristiansson, Thomas Gleixner,
	Vladimir Oltean, Xinming Hu, Yoshinori Sato, x86, linux-kernel,
	linux-arm-kernel, linux-m68k, linux-crypto, openrisc,
	linuxppc-dev, linux-sh, sparclinux, linux-ntfs-dev, linux-block,
	linux-wireless, netdev, linux-security-module

From: Arnd Bergmann <arnd@arndb.de>

The get_unaligned()/put_unaligned() helpers are traditionally
architecture-specific, with the two main variants being the "access-ok.h"
version that assumes unaligned pointer accesses always work on a particular
architecture, and the "le-struct.h" version that casts the data to a
byte-aligned type before dereferencing, for architectures that cannot
always do unaligned accesses in hardware.

Based on the discussion linked below, it appears that the access-ok
version is not reliable on any architecture, but the struct version
probably has no downsides. This series changes the code to use the
same implementation on all architectures, addressing the few exceptions
separately.
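
For readers unfamiliar with the struct approach, here is a minimal
user-space sketch of the idea, assuming a GCC- or Clang-compatible
compiler. The names una_u32, load_u32_unaligned() and
store_u32_unaligned() are hypothetical and are not the kernel's actual
macros; the sketch only illustrates the technique of accessing the data
through a packed (byte-aligned) struct.

```c
#include <stdint.h>

/*
 * Hypothetical illustration only, not the kernel's implementation.
 * The packed attribute tells the compiler that the member may sit at
 * any byte address, so it must emit code that is safe for unaligned
 * addresses on the target.
 */
struct una_u32 {
	uint32_t x;
} __attribute__((packed));

/* Read a 32-bit value in native byte order from any address. */
static inline uint32_t load_u32_unaligned(const void *p)
{
	const struct una_u32 *ptr = p;

	return ptr->x;
}

/* Write a 32-bit value in native byte order to any address. */
static inline void store_u32_unaligned(uint32_t val, void *p)
{
	struct una_u32 *ptr = p;

	ptr->x = val;
}
```

By contrast, an access-ok style helper would simply dereference a plain
uint32_t pointer, which leaves the compiler free to assume natural
alignment for that access.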

I've included this version in the asm-generic tree for 5.14 already,
addressing the few issues that were pointed out in the RFC. If there
are any remaining problems, I hope those can be addressed as follow-up
patches.

        Arnd

Link: https://lore.kernel.org/lkml/75d07691-1e4f-741f-9852-38c0b4f520bc@synopsys.com/
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
Link: https://lore.kernel.org/lkml/20210507220813.365382-14-arnd@kernel.org/
Link: git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git unaligned-rework-v2


Arnd Bergmann (13):
  asm-generic: use asm-generic/unaligned.h for most architectures
  openrisc: always use unaligned-struct header
  sh: remove unaligned access for sh4a
  m68k: select CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
  powerpc: use linux/unaligned/le_struct.h on LE power7
  asm-generic: unaligned: remove byteshift helpers
  asm-generic: unaligned always use struct helpers
  partitions: msdos: fix one-byte get_unaligned()
  apparmor: use get_unaligned() only for multi-byte words
  mwifiex: re-fix for unaligned accesses
  netpoll: avoid put_unaligned() on single character
  asm-generic: uaccess: 1-byte access is always aligned
  asm-generic: simplify asm/unaligned.h

 arch/alpha/include/asm/unaligned.h          |  12 --
 arch/arm/include/asm/unaligned.h            |  27 ---
 arch/ia64/include/asm/unaligned.h           |  12 --
 arch/m68k/Kconfig                           |   1 +
 arch/m68k/include/asm/unaligned.h           |  26 ---
 arch/microblaze/include/asm/unaligned.h     |  27 ---
 arch/mips/crypto/crc32-mips.c               |   2 +-
 arch/openrisc/include/asm/unaligned.h       |  47 -----
 arch/parisc/include/asm/unaligned.h         |   6 +-
 arch/powerpc/include/asm/unaligned.h        |  22 ---
 arch/sh/include/asm/unaligned-sh4a.h        | 199 --------------------
 arch/sh/include/asm/unaligned.h             |  13 --
 arch/sparc/include/asm/unaligned.h          |  11 --
 arch/x86/include/asm/unaligned.h            |  15 --
 arch/xtensa/include/asm/unaligned.h         |  29 ---
 block/partitions/ldm.h                      |   2 +-
 block/partitions/msdos.c                    |   2 +-
 drivers/net/wireless/marvell/mwifiex/pcie.c |  10 +-
 include/asm-generic/uaccess.h               |   4 +-
 include/asm-generic/unaligned.h             | 141 +++++++++++---
 include/linux/unaligned/access_ok.h         |  68 -------
 include/linux/unaligned/be_byteshift.h      |  71 -------
 include/linux/unaligned/be_memmove.h        |  37 ----
 include/linux/unaligned/be_struct.h         |  37 ----
 include/linux/unaligned/generic.h           | 115 -----------
 include/linux/unaligned/le_byteshift.h      |  71 -------
 include/linux/unaligned/le_memmove.h        |  37 ----
 include/linux/unaligned/le_struct.h         |  37 ----
 include/linux/unaligned/memmove.h           |  46 -----
 net/core/netpoll.c                          |   4 +-
 security/apparmor/policy_unpack.c           |   2 +-
 31 files changed, 131 insertions(+), 1002 deletions(-)
 delete mode 100644 arch/alpha/include/asm/unaligned.h
 delete mode 100644 arch/arm/include/asm/unaligned.h
 delete mode 100644 arch/ia64/include/asm/unaligned.h
 delete mode 100644 arch/m68k/include/asm/unaligned.h
 delete mode 100644 arch/microblaze/include/asm/unaligned.h
 delete mode 100644 arch/openrisc/include/asm/unaligned.h
 delete mode 100644 arch/powerpc/include/asm/unaligned.h
 delete mode 100644 arch/sh/include/asm/unaligned-sh4a.h
 delete mode 100644 arch/sh/include/asm/unaligned.h
 delete mode 100644 arch/sparc/include/asm/unaligned.h
 delete mode 100644 arch/x86/include/asm/unaligned.h
 delete mode 100644 arch/xtensa/include/asm/unaligned.h
 delete mode 100644 include/linux/unaligned/access_ok.h
 delete mode 100644 include/linux/unaligned/be_byteshift.h
 delete mode 100644 include/linux/unaligned/be_memmove.h
 delete mode 100644 include/linux/unaligned/be_struct.h
 delete mode 100644 include/linux/unaligned/generic.h
 delete mode 100644 include/linux/unaligned/le_byteshift.h
 delete mode 100644 include/linux/unaligned/le_memmove.h
 delete mode 100644 include/linux/unaligned/le_struct.h
 delete mode 100644 include/linux/unaligned/memmove.h

-- 
2.29.2

Cc: Amitkumar Karwar <amitkarwar@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ganapathi Bhat <ganapathi017@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: "Richard Russon (FlatCap)" <ldm@flatcap.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Sharvari Harisangam <sharvari.harisangam@nxp.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Cc: Xinming Hu <huxinming820@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-crypto@vger.kernel.org
Cc: openrisc@lists.librecores.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-ntfs-dev@lists.sourceforge.net
Cc: linux-block@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-security-module@vger.kernel.org



* [PATCH v2 01/13] asm-generic: use asm-generic/unaligned.h for most architectures
  2021-05-14 10:00 ` Arnd Bergmann
                   ` (2 preceding siblings ...)
  (?)
@ 2021-05-14 10:00 ` Arnd Bergmann
  -1 siblings, 0 replies; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-14 10:00 UTC (permalink / raw)
  To: linux-arch
  Cc: Linus Torvalds, Vineet Gupta, Arnd Bergmann, Thomas Gleixner,
	Geert Uytterhoeven, Richard Henderson, Ivan Kokshaysky,
	Matt Turner, Michal Simek, James E.J. Bottomley, Helge Deller,
	David S. Miller, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, Chris Zankel, Max Filippov, linux-alpha,
	linux-kernel, linux-ia64, linux-m68k, linux-parisc, sparclinux,
	linux-xtensa

From: Arnd Bergmann <arnd@arndb.de>

There are several architectures that just duplicate the contents
of asm-generic/unaligned.h, so change those over to use the
file directly, to make future modifications easier.

The exceptions are:

- arm32 sets HAVE_EFFICIENT_UNALIGNED_ACCESS, but wants the
  unaligned-struct version

- ppc64le disables HAVE_EFFICIENT_UNALIGNED_ACCESS but includes
  the access-ok version

- most m68k configurations also use the access-ok version without
  setting HAVE_EFFICIENT_UNALIGNED_ACCESS

- sh4a has a custom inline asm version

- openrisc is the only one using the memmove version that
  generally leads to worse code.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
---
 arch/alpha/include/asm/unaligned.h      | 12 ----------
 arch/ia64/include/asm/unaligned.h       | 12 ----------
 arch/m68k/include/asm/unaligned.h       |  9 +-------
 arch/microblaze/include/asm/unaligned.h | 27 -----------------------
 arch/parisc/include/asm/unaligned.h     |  6 +----
 arch/sparc/include/asm/unaligned.h      | 11 ----------
 arch/x86/include/asm/unaligned.h        | 15 -------------
 arch/xtensa/include/asm/unaligned.h     | 29 -------------------------
 8 files changed, 2 insertions(+), 119 deletions(-)
 delete mode 100644 arch/alpha/include/asm/unaligned.h
 delete mode 100644 arch/ia64/include/asm/unaligned.h
 delete mode 100644 arch/microblaze/include/asm/unaligned.h
 delete mode 100644 arch/sparc/include/asm/unaligned.h
 delete mode 100644 arch/x86/include/asm/unaligned.h
 delete mode 100644 arch/xtensa/include/asm/unaligned.h

diff --git a/arch/alpha/include/asm/unaligned.h b/arch/alpha/include/asm/unaligned.h
deleted file mode 100644
index 863c807b66f8..000000000000
--- a/arch/alpha/include/asm/unaligned.h
+++ /dev/null
@@ -1,12 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_ALPHA_UNALIGNED_H
-#define _ASM_ALPHA_UNALIGNED_H
-
-#include <linux/unaligned/le_struct.h>
-#include <linux/unaligned/be_byteshift.h>
-#include <linux/unaligned/generic.h>
-
-#define get_unaligned __get_unaligned_le
-#define put_unaligned __put_unaligned_le
-
-#endif /* _ASM_ALPHA_UNALIGNED_H */
diff --git a/arch/ia64/include/asm/unaligned.h b/arch/ia64/include/asm/unaligned.h
deleted file mode 100644
index 328942e3cbce..000000000000
--- a/arch/ia64/include/asm/unaligned.h
+++ /dev/null
@@ -1,12 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_IA64_UNALIGNED_H
-#define _ASM_IA64_UNALIGNED_H
-
-#include <linux/unaligned/le_struct.h>
-#include <linux/unaligned/be_byteshift.h>
-#include <linux/unaligned/generic.h>
-
-#define get_unaligned	__get_unaligned_le
-#define put_unaligned	__put_unaligned_le
-
-#endif /* _ASM_IA64_UNALIGNED_H */
diff --git a/arch/m68k/include/asm/unaligned.h b/arch/m68k/include/asm/unaligned.h
index 98c8930d3d35..84e437337344 100644
--- a/arch/m68k/include/asm/unaligned.h
+++ b/arch/m68k/include/asm/unaligned.h
@@ -2,15 +2,8 @@
 #ifndef _ASM_M68K_UNALIGNED_H
 #define _ASM_M68K_UNALIGNED_H
 
-
 #ifdef CONFIG_CPU_HAS_NO_UNALIGNED
-#include <linux/unaligned/be_struct.h>
-#include <linux/unaligned/le_byteshift.h>
-#include <linux/unaligned/generic.h>
-
-#define get_unaligned	__get_unaligned_be
-#define put_unaligned	__put_unaligned_be
-
+#include <asm-generic/unaligned.h>
 #else
 /*
  * The m68k can do unaligned accesses itself.
diff --git a/arch/microblaze/include/asm/unaligned.h b/arch/microblaze/include/asm/unaligned.h
deleted file mode 100644
index 448299beab69..000000000000
--- a/arch/microblaze/include/asm/unaligned.h
+++ /dev/null
@@ -1,27 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Copyright (C) 2008 Michal Simek <monstr@monstr.eu>
- * Copyright (C) 2006 Atmark Techno, Inc.
- */
-
-#ifndef _ASM_MICROBLAZE_UNALIGNED_H
-#define _ASM_MICROBLAZE_UNALIGNED_H
-
-# ifdef __KERNEL__
-
-#  ifdef __MICROBLAZEEL__
-#   include <linux/unaligned/le_struct.h>
-#   include <linux/unaligned/be_byteshift.h>
-#   define get_unaligned	__get_unaligned_le
-#   define put_unaligned	__put_unaligned_le
-#  else
-#   include <linux/unaligned/be_struct.h>
-#   include <linux/unaligned/le_byteshift.h>
-#   define get_unaligned	__get_unaligned_be
-#   define put_unaligned	__put_unaligned_be
-#  endif
-
-# include <linux/unaligned/generic.h>
-
-# endif	/* __KERNEL__ */
-#endif /* _ASM_MICROBLAZE_UNALIGNED_H */
diff --git a/arch/parisc/include/asm/unaligned.h b/arch/parisc/include/asm/unaligned.h
index e9029c7c2a69..3bda16773ba6 100644
--- a/arch/parisc/include/asm/unaligned.h
+++ b/arch/parisc/include/asm/unaligned.h
@@ -2,11 +2,7 @@
 #ifndef _ASM_PARISC_UNALIGNED_H
 #define _ASM_PARISC_UNALIGNED_H
 
-#include <linux/unaligned/be_struct.h>
-#include <linux/unaligned/le_byteshift.h>
-#include <linux/unaligned/generic.h>
-#define get_unaligned	__get_unaligned_be
-#define put_unaligned	__put_unaligned_be
+#include <asm-generic/unaligned.h>
 
 #ifdef __KERNEL__
 struct pt_regs;
diff --git a/arch/sparc/include/asm/unaligned.h b/arch/sparc/include/asm/unaligned.h
deleted file mode 100644
index 7971d89d2f54..000000000000
--- a/arch/sparc/include/asm/unaligned.h
+++ /dev/null
@@ -1,11 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_SPARC_UNALIGNED_H
-#define _ASM_SPARC_UNALIGNED_H
-
-#include <linux/unaligned/be_struct.h>
-#include <linux/unaligned/le_byteshift.h>
-#include <linux/unaligned/generic.h>
-#define get_unaligned	__get_unaligned_be
-#define put_unaligned	__put_unaligned_be
-
-#endif /* _ASM_SPARC_UNALIGNED_H */
diff --git a/arch/x86/include/asm/unaligned.h b/arch/x86/include/asm/unaligned.h
deleted file mode 100644
index 9c754a7447aa..000000000000
--- a/arch/x86/include/asm/unaligned.h
+++ /dev/null
@@ -1,15 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_X86_UNALIGNED_H
-#define _ASM_X86_UNALIGNED_H
-
-/*
- * The x86 can do unaligned accesses itself.
- */
-
-#include <linux/unaligned/access_ok.h>
-#include <linux/unaligned/generic.h>
-
-#define get_unaligned __get_unaligned_le
-#define put_unaligned __put_unaligned_le
-
-#endif /* _ASM_X86_UNALIGNED_H */
diff --git a/arch/xtensa/include/asm/unaligned.h b/arch/xtensa/include/asm/unaligned.h
deleted file mode 100644
index 8e7ed046bfed..000000000000
--- a/arch/xtensa/include/asm/unaligned.h
+++ /dev/null
@@ -1,29 +0,0 @@
-/*
- * Xtensa doesn't handle unaligned accesses efficiently.
- *
- * This file is subject to the terms and conditions of the GNU General Public
- * License.  See the file "COPYING" in the main directory of this archive
- * for more details.
- *
- * Copyright (C) 2001 - 2005 Tensilica Inc.
- */
-#ifndef _ASM_XTENSA_UNALIGNED_H
-#define _ASM_XTENSA_UNALIGNED_H
-
-#include <asm/byteorder.h>
-
-#ifdef __LITTLE_ENDIAN
-# include <linux/unaligned/le_struct.h>
-# include <linux/unaligned/be_byteshift.h>
-# include <linux/unaligned/generic.h>
-# define get_unaligned	__get_unaligned_le
-# define put_unaligned	__put_unaligned_le
-#else
-# include <linux/unaligned/be_struct.h>
-# include <linux/unaligned/le_byteshift.h>
-# include <linux/unaligned/generic.h>
-# define get_unaligned	__get_unaligned_be
-# define put_unaligned	__put_unaligned_be
-#endif
-
-#endif	/* _ASM_XTENSA_UNALIGNED_H */
-- 
2.29.2



* [PATCH v2 02/13] openrisc: always use unaligned-struct header
  2021-05-14 10:00 ` Arnd Bergmann
@ 2021-05-14 10:00   ` Arnd Bergmann
  -1 siblings, 0 replies; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-14 10:00 UTC (permalink / raw)
  To: linux-arch
  Cc: Linus Torvalds, Vineet Gupta, Arnd Bergmann, Stafford Horne,
	Jonas Bonn, Stefan Kristiansson, openrisc, linux-kernel

From: Arnd Bergmann <arnd@arndb.de>

openrisc is the only architecture using the linux/unaligned/*memmove
infrastructure. There is a comment saying that this version is more
efficient, but this was added in 2011 before the openrisc gcc port
was merged upstream.

I checked a couple of files to see what the actual difference is with
the mainline gcc (9.4 and 11.1), and found that the generic header
seems to produce better code now, regardless of the gcc version.

Specifically, the be_memmove leads to allocating a stack slot and
copying the data one byte at a time, then reading the whole word
from the stack:

00000000 <test_get_unaligned_memmove>:
   0:	9c 21 ff f4 	l.addi r1,r1,-12
   4:	d4 01 10 04 	l.sw 4(r1),r2
   8:	8e 63 00 00 	l.lbz r19,0(r3)
   c:	9c 41 00 0c 	l.addi r2,r1,12
  10:	8e 23 00 01 	l.lbz r17,1(r3)
  14:	db e2 9f f4 	l.sb -12(r2),r19
  18:	db e2 8f f5 	l.sb -11(r2),r17
  1c:	8e 63 00 02 	l.lbz r19,2(r3)
  20:	8e 23 00 03 	l.lbz r17,3(r3)
  24:	d4 01 48 08 	l.sw 8(r1),r9
  28:	db e2 9f f6 	l.sb -10(r2),r19
  2c:	db e2 8f f7 	l.sb -9(r2),r17
  30:	85 62 ff f4 	l.lwz r11,-12(r2)
  34:	85 21 00 08 	l.lwz r9,8(r1)
  38:	84 41 00 04 	l.lwz r2,4(r1)
  3c:	44 00 48 00 	l.jr r9
  40:	9c 21 00 0c 	l.addi r1,r1,12

while the be_struct version reads each byte into a register
and does a shift to the right position:

00000000 <test_get_unaligned_struct>:
   0:	9c 21 ff f8 	l.addi r1,r1,-8
   4:	8e 63 00 00 	l.lbz r19,0(r3)
   8:	aa 20 00 18 	l.ori r17,r0,0x18
   c:	e2 73 88 08 	l.sll r19,r19,r17
  10:	8d 63 00 01 	l.lbz r11,1(r3)
  14:	aa 20 00 10 	l.ori r17,r0,0x10
  18:	e1 6b 88 08 	l.sll r11,r11,r17
  1c:	e1 6b 98 04 	l.or r11,r11,r19
  20:	8e 23 00 02 	l.lbz r17,2(r3)
  24:	aa 60 00 08 	l.ori r19,r0,0x8
  28:	e2 31 98 08 	l.sll r17,r17,r19
  2c:	d4 01 10 00 	l.sw 0(r1),r2
  30:	d4 01 48 04 	l.sw 4(r1),r9
  34:	9c 41 00 08 	l.addi r2,r1,8
  38:	e2 31 58 04 	l.or r17,r17,r11
  3c:	8d 63 00 03 	l.lbz r11,3(r3)
  40:	e1 6b 88 04 	l.or r11,r11,r17
  44:	84 41 00 00 	l.lwz r2,0(r1)
  48:	85 21 00 04 	l.lwz r9,4(r1)
  4c:	44 00 48 00 	l.jr r9
  50:	9c 21 00 08 	l.addi r1,r1,8

According to Stafford Horne, the new version should in fact perform
better.
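
The C source behind the two listings is not part of this posting. A
rough sketch of what such test functions might look like is given
below, purely as an assumption: the sketch_* names are hypothetical,
and may_alias is added only so the example stays well-defined in a
userspace build. The third function spells out the load/shift/or
sequence visible in the second listing for a big-endian 32-bit read.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* memmove flavour: copy into an aligned temporary, then return it.
 * The temporary is what shows up as a stack slot in the first listing. */
static uint32_t sketch_get_memmove32(const void *p)
{
	uint32_t tmp;

	memmove(&tmp, p, sizeof(tmp));
	return tmp;
}

/* struct flavour: let the compiler pick the access sequence. */
struct sketch_una32 {
	uint32_t x;
} __attribute__((__packed__, __may_alias__));

static uint32_t sketch_get_struct32(const void *p)
{
	const struct sketch_una32 *ptr = p;

	return ptr->x;
}

/* Explicit form of the second listing on a big-endian target:
 * four byte loads, shifted by 24, 16, 8 and 0 bits, then or-ed. */
static uint32_t sketch_get_be32_byteshift(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Usage: both native-order readers must agree with each other. */
static void sketch_compare_demo(void)
{
	uint8_t buf[5] = { 0xff, 0x01, 0x02, 0x03, 0x04 };

	assert(sketch_get_memmove32(buf + 1) == sketch_get_struct32(buf + 1));
	assert(sketch_get_be32_byteshift(buf + 1) == 0x01020304u);
}
```

On a big-endian target such as openrisc all three return the same
value. On a little-endian host the first two return the byte-swapped
value, which is why the demo only compares them with each other.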

In the trivial example, the struct version is a few instructions longer,
but building a whole kernel shows an overall reduction in code size,
presumably because it now has to manage fewer stack slots:

   text	   data	    bss	    dec	    hex	filename
4792010	 181480	  82324	5055814	 4d2546	vmlinux-unaligned-memmove
4790642	 181480	  82324	5054446	 4d1fee	vmlinux-unaligned-struct

Remove the memmove version completely and let openrisc use the same
code as everyone else, as a simplification.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Stafford Horne <shorne@gmail.com>
---
 arch/openrisc/include/asm/unaligned.h | 47 ---------------------------
 include/linux/unaligned/be_memmove.h  | 37 ---------------------
 include/linux/unaligned/le_memmove.h  | 37 ---------------------
 include/linux/unaligned/memmove.h     | 46 --------------------------
 4 files changed, 167 deletions(-)
 delete mode 100644 arch/openrisc/include/asm/unaligned.h
 delete mode 100644 include/linux/unaligned/be_memmove.h
 delete mode 100644 include/linux/unaligned/le_memmove.h
 delete mode 100644 include/linux/unaligned/memmove.h

diff --git a/arch/openrisc/include/asm/unaligned.h b/arch/openrisc/include/asm/unaligned.h
deleted file mode 100644
index 14353f2101f2..000000000000
--- a/arch/openrisc/include/asm/unaligned.h
+++ /dev/null
@@ -1,47 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * OpenRISC Linux
- *
- * Linux architectural port borrowing liberally from similar works of
- * others.  All original copyrights apply as per the original source
- * declaration.
- *
- * OpenRISC implementation:
- * Copyright (C) 2003 Matjaz Breskvar <phoenix@bsemi.com>
- * Copyright (C) 2010-2011 Jonas Bonn <jonas@southpole.se>
- * et al.
- */
-
-#ifndef __ASM_OPENRISC_UNALIGNED_H
-#define __ASM_OPENRISC_UNALIGNED_H
-
-/*
- * This is copied from the generic implementation and the C-struct
- * variant replaced with the memmove variant.  The GCC compiler
- * for the OR32 arch optimizes too aggressively for the C-struct
- * variant to work, so use the memmove variant instead.
- *
- * It may be worth considering implementing the unaligned access
- * exception handler and allowing unaligned accesses (access_ok.h)...
- * not sure if it would be much of a performance win without further
- * investigation.
- */
-#include <asm/byteorder.h>
-
-#if defined(__LITTLE_ENDIAN)
-# include <linux/unaligned/le_memmove.h>
-# include <linux/unaligned/be_byteshift.h>
-# include <linux/unaligned/generic.h>
-# define get_unaligned	__get_unaligned_le
-# define put_unaligned	__put_unaligned_le
-#elif defined(__BIG_ENDIAN)
-# include <linux/unaligned/be_memmove.h>
-# include <linux/unaligned/le_byteshift.h>
-# include <linux/unaligned/generic.h>
-# define get_unaligned	__get_unaligned_be
-# define put_unaligned	__put_unaligned_be
-#else
-# error need to define endianess
-#endif
-
-#endif /* __ASM_OPENRISC_UNALIGNED_H */
diff --git a/include/linux/unaligned/be_memmove.h b/include/linux/unaligned/be_memmove.h
deleted file mode 100644
index 7164214a4ba1..000000000000
--- a/include/linux/unaligned/be_memmove.h
+++ /dev/null
@@ -1,37 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_UNALIGNED_BE_MEMMOVE_H
-#define _LINUX_UNALIGNED_BE_MEMMOVE_H
-
-#include <linux/unaligned/memmove.h>
-
-static inline u16 get_unaligned_be16(const void *p)
-{
-	return __get_unaligned_memmove16((const u8 *)p);
-}
-
-static inline u32 get_unaligned_be32(const void *p)
-{
-	return __get_unaligned_memmove32((const u8 *)p);
-}
-
-static inline u64 get_unaligned_be64(const void *p)
-{
-	return __get_unaligned_memmove64((const u8 *)p);
-}
-
-static inline void put_unaligned_be16(u16 val, void *p)
-{
-	__put_unaligned_memmove16(val, p);
-}
-
-static inline void put_unaligned_be32(u32 val, void *p)
-{
-	__put_unaligned_memmove32(val, p);
-}
-
-static inline void put_unaligned_be64(u64 val, void *p)
-{
-	__put_unaligned_memmove64(val, p);
-}
-
-#endif /* _LINUX_UNALIGNED_LE_MEMMOVE_H */
diff --git a/include/linux/unaligned/le_memmove.h b/include/linux/unaligned/le_memmove.h
deleted file mode 100644
index 9202e864d026..000000000000
--- a/include/linux/unaligned/le_memmove.h
+++ /dev/null
@@ -1,37 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_UNALIGNED_LE_MEMMOVE_H
-#define _LINUX_UNALIGNED_LE_MEMMOVE_H
-
-#include <linux/unaligned/memmove.h>
-
-static inline u16 get_unaligned_le16(const void *p)
-{
-	return __get_unaligned_memmove16((const u8 *)p);
-}
-
-static inline u32 get_unaligned_le32(const void *p)
-{
-	return __get_unaligned_memmove32((const u8 *)p);
-}
-
-static inline u64 get_unaligned_le64(const void *p)
-{
-	return __get_unaligned_memmove64((const u8 *)p);
-}
-
-static inline void put_unaligned_le16(u16 val, void *p)
-{
-	__put_unaligned_memmove16(val, p);
-}
-
-static inline void put_unaligned_le32(u32 val, void *p)
-{
-	__put_unaligned_memmove32(val, p);
-}
-
-static inline void put_unaligned_le64(u64 val, void *p)
-{
-	__put_unaligned_memmove64(val, p);
-}
-
-#endif /* _LINUX_UNALIGNED_LE_MEMMOVE_H */
diff --git a/include/linux/unaligned/memmove.h b/include/linux/unaligned/memmove.h
deleted file mode 100644
index ac71b53bc6dc..000000000000
--- a/include/linux/unaligned/memmove.h
+++ /dev/null
@@ -1,46 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_UNALIGNED_MEMMOVE_H
-#define _LINUX_UNALIGNED_MEMMOVE_H
-
-#include <linux/kernel.h>
-#include <linux/string.h>
-
-/* Use memmove here, so gcc does not insert a __builtin_memcpy. */
-
-static inline u16 __get_unaligned_memmove16(const void *p)
-{
-	u16 tmp;
-	memmove(&tmp, p, 2);
-	return tmp;
-}
-
-static inline u32 __get_unaligned_memmove32(const void *p)
-{
-	u32 tmp;
-	memmove(&tmp, p, 4);
-	return tmp;
-}
-
-static inline u64 __get_unaligned_memmove64(const void *p)
-{
-	u64 tmp;
-	memmove(&tmp, p, 8);
-	return tmp;
-}
-
-static inline void __put_unaligned_memmove16(u16 val, void *p)
-{
-	memmove(p, &val, 2);
-}
-
-static inline void __put_unaligned_memmove32(u32 val, void *p)
-{
-	memmove(p, &val, 4);
-}
-
-static inline void __put_unaligned_memmove64(u64 val, void *p)
-{
-	memmove(p, &val, 8);
-}
-
-#endif /* _LINUX_UNALIGNED_MEMMOVE_H */
-- 
2.29.2



* [PATCH v2 03/13] sh: remove unaligned access for sh4a
  2021-05-14 10:00 ` Arnd Bergmann
                   ` (4 preceding siblings ...)
  (?)
@ 2021-05-14 10:00 ` Arnd Bergmann
  2021-05-14 10:34   ` John Paul Adrian Glaubitz
  -1 siblings, 1 reply; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-14 10:00 UTC (permalink / raw)
  To: linux-arch
  Cc: Linus Torvalds, Vineet Gupta, Arnd Bergmann, Yoshinori Sato,
	Rich Felker, linux-sh, linux-kernel

From: Arnd Bergmann <arnd@arndb.de>

Unlike every other architecture, sh4a uses an inline asm implementation
for get_unaligned(). I have shown that this produces better object
code than the asm-generic version. However, there are very few users of
arch/sh/ overall, and most of those seem to use sh4 rather than sh4a CPU
cores, so it does not seem worth keeping the complexity in the
architecture-independent code.

Change over to the generic version to allow simplifying that in a
follow-up patch.

If there are sh4a users that want the best performance, it would probably
be best to add support for the movua instruction in gcc itself, as this
would not just help get_unaligned() callers but any code that accesses
a __packed variable in user space or kernel.
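
To illustrate the last point, here is a minimal sketch of code that
never calls get_unaligned() but still depends on the compiler's
unaligned load sequence. The struct layout and the sketch_* names are
made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout: the packed attribute places "length"
 * directly after the one-byte tag, at offset 1. */
struct sketch_record {
	uint8_t  tag;
	uint32_t length;
} __attribute__((__packed__));

/* A plain member access. The compiler knows the member may be
 * misaligned and chooses the load sequence itself, so any
 * improvement in the compiler's unaligned loads would apply here
 * without source changes. */
static uint32_t sketch_record_length(const struct sketch_record *r)
{
	return r->length;
}

/* Usage: the field really does sit at an odd offset. */
static void sketch_packed_demo(void)
{
	struct sketch_record r = { .tag = 7, .length = 0xdeadbeefu };

	assert(offsetof(struct sketch_record, length) == 1);
	assert(sizeof(struct sketch_record) == 5);
	assert(sketch_record_length(&r) == 0xdeadbeefu);
}
```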

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/sh/include/asm/unaligned-sh4a.h | 199 ---------------------------
 arch/sh/include/asm/unaligned.h      |  13 --
 2 files changed, 212 deletions(-)
 delete mode 100644 arch/sh/include/asm/unaligned-sh4a.h
 delete mode 100644 arch/sh/include/asm/unaligned.h

diff --git a/arch/sh/include/asm/unaligned-sh4a.h b/arch/sh/include/asm/unaligned-sh4a.h
deleted file mode 100644
index d311f00ed530..000000000000
--- a/arch/sh/include/asm/unaligned-sh4a.h
+++ /dev/null
@@ -1,199 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __ASM_SH_UNALIGNED_SH4A_H
-#define __ASM_SH_UNALIGNED_SH4A_H
-
-/*
- * SH-4A has support for unaligned 32-bit loads, and 32-bit loads only.
- * Support for 64-bit accesses are done through shifting and masking
- * relative to the endianness. Unaligned stores are not supported by the
- * instruction encoding, so these continue to use the packed
- * struct.
- *
- * The same note as with the movli.l/movco.l pair applies here, as long
- * as the load is guaranteed to be inlined, nothing else will hook in to
- * r0 and we get the return value for free.
- *
- * NOTE: Due to the fact we require r0 encoding, care should be taken to
- * avoid mixing these heavily with other r0 consumers, such as the atomic
- * ops. Failure to adhere to this can result in the compiler running out
- * of spill registers and blowing up when building at low optimization
- * levels. See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34777.
- */
-#include <linux/unaligned/packed_struct.h>
-#include <linux/types.h>
-#include <asm/byteorder.h>
-
-static inline u16 sh4a_get_unaligned_cpu16(const u8 *p)
-{
-#ifdef __LITTLE_ENDIAN
-	return p[0] | p[1] << 8;
-#else
-	return p[0] << 8 | p[1];
-#endif
-}
-
-static __always_inline u32 sh4a_get_unaligned_cpu32(const u8 *p)
-{
-	unsigned long unaligned;
-
-	__asm__ __volatile__ (
-		"movua.l	@%1, %0\n\t"
-		 : "=z" (unaligned)
-		 : "r" (p)
-	);
-
-	return unaligned;
-}
-
-/*
- * Even though movua.l supports auto-increment on the read side, it can
- * only store to r0 due to instruction encoding constraints, so just let
- * the compiler sort it out on its own.
- */
-static inline u64 sh4a_get_unaligned_cpu64(const u8 *p)
-{
-#ifdef __LITTLE_ENDIAN
-	return (u64)sh4a_get_unaligned_cpu32(p + 4) << 32 |
-		    sh4a_get_unaligned_cpu32(p);
-#else
-	return (u64)sh4a_get_unaligned_cpu32(p) << 32 |
-		    sh4a_get_unaligned_cpu32(p + 4);
-#endif
-}
-
-static inline u16 get_unaligned_le16(const void *p)
-{
-	return le16_to_cpu(sh4a_get_unaligned_cpu16(p));
-}
-
-static inline u32 get_unaligned_le32(const void *p)
-{
-	return le32_to_cpu(sh4a_get_unaligned_cpu32(p));
-}
-
-static inline u64 get_unaligned_le64(const void *p)
-{
-	return le64_to_cpu(sh4a_get_unaligned_cpu64(p));
-}
-
-static inline u16 get_unaligned_be16(const void *p)
-{
-	return be16_to_cpu(sh4a_get_unaligned_cpu16(p));
-}
-
-static inline u32 get_unaligned_be32(const void *p)
-{
-	return be32_to_cpu(sh4a_get_unaligned_cpu32(p));
-}
-
-static inline u64 get_unaligned_be64(const void *p)
-{
-	return be64_to_cpu(sh4a_get_unaligned_cpu64(p));
-}
-
-static inline void nonnative_put_le16(u16 val, u8 *p)
-{
-	*p++ = val;
-	*p++ = val >> 8;
-}
-
-static inline void nonnative_put_le32(u32 val, u8 *p)
-{
-	nonnative_put_le16(val, p);
-	nonnative_put_le16(val >> 16, p + 2);
-}
-
-static inline void nonnative_put_le64(u64 val, u8 *p)
-{
-	nonnative_put_le32(val, p);
-	nonnative_put_le32(val >> 32, p + 4);
-}
-
-static inline void nonnative_put_be16(u16 val, u8 *p)
-{
-	*p++ = val >> 8;
-	*p++ = val;
-}
-
-static inline void nonnative_put_be32(u32 val, u8 *p)
-{
-	nonnative_put_be16(val >> 16, p);
-	nonnative_put_be16(val, p + 2);
-}
-
-static inline void nonnative_put_be64(u64 val, u8 *p)
-{
-	nonnative_put_be32(val >> 32, p);
-	nonnative_put_be32(val, p + 4);
-}
-
-static inline void put_unaligned_le16(u16 val, void *p)
-{
-#ifdef __LITTLE_ENDIAN
-	__put_unaligned_cpu16(val, p);
-#else
-	nonnative_put_le16(val, p);
-#endif
-}
-
-static inline void put_unaligned_le32(u32 val, void *p)
-{
-#ifdef __LITTLE_ENDIAN
-	__put_unaligned_cpu32(val, p);
-#else
-	nonnative_put_le32(val, p);
-#endif
-}
-
-static inline void put_unaligned_le64(u64 val, void *p)
-{
-#ifdef __LITTLE_ENDIAN
-	__put_unaligned_cpu64(val, p);
-#else
-	nonnative_put_le64(val, p);
-#endif
-}
-
-static inline void put_unaligned_be16(u16 val, void *p)
-{
-#ifdef __BIG_ENDIAN
-	__put_unaligned_cpu16(val, p);
-#else
-	nonnative_put_be16(val, p);
-#endif
-}
-
-static inline void put_unaligned_be32(u32 val, void *p)
-{
-#ifdef __BIG_ENDIAN
-	__put_unaligned_cpu32(val, p);
-#else
-	nonnative_put_be32(val, p);
-#endif
-}
-
-static inline void put_unaligned_be64(u64 val, void *p)
-{
-#ifdef __BIG_ENDIAN
-	__put_unaligned_cpu64(val, p);
-#else
-	nonnative_put_be64(val, p);
-#endif
-}
-
-/*
- * While it's a bit non-obvious, even though the generic le/be wrappers
- * use the __get/put_xxx prefixing, they actually wrap in to the
- * non-prefixed get/put_xxx variants as provided above.
- */
-#include <linux/unaligned/generic.h>
-
-#ifdef __LITTLE_ENDIAN
-# define get_unaligned __get_unaligned_le
-# define put_unaligned __put_unaligned_le
-#else
-# define get_unaligned __get_unaligned_be
-# define put_unaligned __put_unaligned_be
-#endif
-
-#endif /* __ASM_SH_UNALIGNED_SH4A_H */
diff --git a/arch/sh/include/asm/unaligned.h b/arch/sh/include/asm/unaligned.h
deleted file mode 100644
index 0c92e2c73af4..000000000000
--- a/arch/sh/include/asm/unaligned.h
+++ /dev/null
@@ -1,13 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_SH_UNALIGNED_H
-#define _ASM_SH_UNALIGNED_H
-
-#ifdef CONFIG_CPU_SH4A
-/* SH-4A can handle unaligned loads in a relatively neutered fashion. */
-#include <asm/unaligned-sh4a.h>
-#else
-/* Otherwise, SH can't handle unaligned accesses. */
-#include <asm-generic/unaligned.h>
-#endif
-
-#endif /* _ASM_SH_UNALIGNED_H */
-- 
2.29.2



* [PATCH v2 04/13] m68k: select CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
  2021-05-14 10:00 ` Arnd Bergmann
                   ` (5 preceding siblings ...)
  (?)
@ 2021-05-14 10:00 ` Arnd Bergmann
  -1 siblings, 0 replies; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-14 10:00 UTC (permalink / raw)
  To: linux-arch
  Cc: Linus Torvalds, Vineet Gupta, Arnd Bergmann, Geert Uytterhoeven,
	Thomas Gleixner, linux-m68k, linux-kernel

From: Arnd Bergmann <arnd@arndb.de>

All supported CPUs other than the old Dragonball (and, in theory, other
68000 derivatives) use the include/linux/unaligned/access_ok.h
implementation for accessing unaligned variables, so presumably this
works everywhere.

However, m68k never selects CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS,
so none of the other conditionals in the kernel get the optimized
implementation.

Select this based on CPU_HAS_NO_UNALIGNED to make the two settings
always match, and then use the generic version of the header.
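
As an illustration of the kind of conditional meant above, here is a
minimal userspace sketch of a helper that takes a wider-access fast
path when the option is set. It is loosely modelled on the usual
pattern for comparing short byte strings and is not copied from any
kernel file. The SKETCH_ macro is a stand-in for the Kconfig symbol,
and memcpy() is used for the loads so the example stays well-defined
outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS. In the kernel
 * the value comes from the architecture's Kconfig, not from a
 * #define in the source file. */
#define SKETCH_HAVE_EFFICIENT_UNALIGNED_ACCESS 1

/* Compare two 6-byte addresses (hypothetical helper). */
static bool sketch_addr6_equal(const uint8_t *a, const uint8_t *b)
{
#if SKETCH_HAVE_EFFICIENT_UNALIGNED_ACCESS
	/* Fast path: one 32-bit and one 16-bit load per address. */
	uint32_t a32, b32;
	uint16_t a16, b16;

	memcpy(&a32, a, sizeof(a32));
	memcpy(&b32, b, sizeof(b32));
	memcpy(&a16, a + 4, sizeof(a16));
	memcpy(&b16, b + 4, sizeof(b16));

	return ((a32 ^ b32) | (uint32_t)(a16 ^ b16)) == 0;
#else
	/* Fallback: byte-wise compare, safe at any alignment. */
	for (int i = 0; i < 6; i++)
		if (a[i] != b[i])
			return false;
	return true;
#endif
}

/* Usage: identical addresses match, a one-byte difference does not. */
static void sketch_addr_demo(void)
{
	uint8_t x[6] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };
	uint8_t y[6] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };
	uint8_t z[6] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x56 };

	assert(sketch_addr6_equal(x, y));
	assert(!sketch_addr6_equal(x, z));
}
```

Without the select, an architecture always ends up on the fallback
branch of such code, even if its hardware could handle the fast path.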

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
---
 arch/m68k/Kconfig                 |  1 +
 arch/m68k/include/asm/unaligned.h | 19 -------------------
 2 files changed, 1 insertion(+), 19 deletions(-)
 delete mode 100644 arch/m68k/include/asm/unaligned.h

diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 372e4e69c43a..46089f3b9603 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -21,6 +21,7 @@ config M68K
 	select HAVE_AOUT if MMU
 	select HAVE_ASM_MODVERSIONS
 	select HAVE_DEBUG_BUGVERBOSE
+	select HAVE_EFFICIENT_UNALIGNED_ACCESS if !CPU_HAS_NO_UNALIGNED
 	select HAVE_FUTEX_CMPXCHG if MMU && FUTEX
 	select HAVE_IDE
 	select HAVE_MOD_ARCH_SPECIFIC
diff --git a/arch/m68k/include/asm/unaligned.h b/arch/m68k/include/asm/unaligned.h
deleted file mode 100644
index 84e437337344..000000000000
--- a/arch/m68k/include/asm/unaligned.h
+++ /dev/null
@@ -1,19 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_M68K_UNALIGNED_H
-#define _ASM_M68K_UNALIGNED_H
-
-#ifdef CONFIG_CPU_HAS_NO_UNALIGNED
-#include <asm-generic/unaligned.h>
-#else
-/*
- * The m68k can do unaligned accesses itself.
- */
-#include <linux/unaligned/access_ok.h>
-#include <linux/unaligned/generic.h>
-
-#define get_unaligned	__get_unaligned_be
-#define put_unaligned	__put_unaligned_be
-
-#endif
-
-#endif /* _ASM_M68K_UNALIGNED_H */
-- 
2.29.2


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v2 05/13] powerpc: use linux/unaligned/le_struct.h on LE power7
  2021-05-14 10:00 ` Arnd Bergmann
@ 2021-05-14 10:00   ` Arnd Bergmann
  -1 siblings, 0 replies; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-14 10:00 UTC (permalink / raw)
  To: linux-arch
  Cc: Linus Torvalds, Vineet Gupta, Arnd Bergmann, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	linux-kernel

From: Arnd Bergmann <arnd@arndb.de>

Little-endian POWER7 kernels disable
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS because that is not supported on
the hardware, but the kernel still uses direct load/store for explicit
get_unaligned()/put_unaligned().

I assume this is a mistake that leads to power7 having to trap and fix
up all these unaligned accesses at a noticeable performance cost.

The fix is trivial: remove the file and use the generic version that
gets it right.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/powerpc/include/asm/unaligned.h | 22 ----------------------
 1 file changed, 22 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/unaligned.h

diff --git a/arch/powerpc/include/asm/unaligned.h b/arch/powerpc/include/asm/unaligned.h
deleted file mode 100644
index ce69c5eff95e..000000000000
--- a/arch/powerpc/include/asm/unaligned.h
+++ /dev/null
@@ -1,22 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_POWERPC_UNALIGNED_H
-#define _ASM_POWERPC_UNALIGNED_H
-
-#ifdef __KERNEL__
-
-/*
- * The PowerPC can do unaligned accesses itself based on its endian mode.
- */
-#include <linux/unaligned/access_ok.h>
-#include <linux/unaligned/generic.h>
-
-#ifdef __LITTLE_ENDIAN__
-#define get_unaligned	__get_unaligned_le
-#define put_unaligned	__put_unaligned_le
-#else
-#define get_unaligned	__get_unaligned_be
-#define put_unaligned	__put_unaligned_be
-#endif
-
-#endif	/* __KERNEL__ */
-#endif	/* _ASM_POWERPC_UNALIGNED_H */
-- 
2.29.2


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v2 06/13] asm-generic: unaligned: remove byteshift helpers
  2021-05-14 10:00 ` Arnd Bergmann
@ 2021-05-14 10:00   ` Arnd Bergmann
  -1 siblings, 0 replies; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-14 10:00 UTC (permalink / raw)
  To: linux-arch
  Cc: Linus Torvalds, Vineet Gupta, Arnd Bergmann, Russell King,
	Nathan Chancellor, Nick Desaulniers, linux-arm-kernel,
	linux-kernel, clang-built-linux

From: Arnd Bergmann <arnd@arndb.de>

In theory, compilers should be able to work this out themselves so we
can use a simpler version based on the swab() helpers.

I have verified that this works on all supported compiler versions
(gcc-4.9 and up, clang-10 and up). Looking at the object code produced by
gcc-11, I found that the impact is mostly a change in inlining decisions
that lead to slightly larger code.

In other cases, this version produces explicit byte swaps in place of
separate byte access, or comparing against pre-swapped constants.

While the source code is clearly simpler, I have not seen an indication
of the new version actually producing better code on Arm, so maybe
we want to skip this after all. From what I can tell, gcc recognizes
the byteswap pattern in the byteshift.h header and can turn it into
explicit instructions, but it does not turn a __builtin_bswap32() back
into individual bytes when that would result in better output, e.g.
when storing a byte-reversed constant.
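
To illustrate the two approaches being compared, here is a minimal
userspace sketch, not the kernel code itself. The sketch_* names are
hypothetical, memcpy() stands in for the kernel's packed-struct
__get_unaligned_cpu32(), and __builtin_bswap32() (gcc/clang) stands in
for swab32(). Both readers should return the same value for any input;
the open question above is only which one leads to better object code.

```c
#include <stdint.h>
#include <string.h>

/* Byte-shift read, in the style of the removed be_byteshift.h. */
static inline uint32_t sketch_get_be32_byteshift(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | p[3];
}

/* Native-endian load from a possibly unaligned address. */
static inline uint32_t sketch_get_cpu32(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Swab-based read, in the style of the helpers added by this patch. */
static inline uint32_t sketch_get_be32_swab(const void *p)
{
	uint32_t v = sketch_get_cpu32(p);

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	/* Little-endian host: reverse the bytes to get the BE value. */
	v = __builtin_bswap32(v);
#endif
	return v;
}
```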

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm/include/asm/unaligned.h       |  2 -
 include/asm-generic/unaligned.h        |  2 -
 include/linux/unaligned/be_byteshift.h | 71 --------------------------
 include/linux/unaligned/be_struct.h    | 30 +++++++++++
 include/linux/unaligned/le_byteshift.h | 71 --------------------------
 include/linux/unaligned/le_struct.h    | 30 +++++++++++
 6 files changed, 60 insertions(+), 146 deletions(-)
 delete mode 100644 include/linux/unaligned/be_byteshift.h
 delete mode 100644 include/linux/unaligned/le_byteshift.h

diff --git a/arch/arm/include/asm/unaligned.h b/arch/arm/include/asm/unaligned.h
index ab905ffcf193..3c5248fb4cdc 100644
--- a/arch/arm/include/asm/unaligned.h
+++ b/arch/arm/include/asm/unaligned.h
@@ -10,13 +10,11 @@
 
 #if defined(__LITTLE_ENDIAN)
 # include <linux/unaligned/le_struct.h>
-# include <linux/unaligned/be_byteshift.h>
 # include <linux/unaligned/generic.h>
 # define get_unaligned	__get_unaligned_le
 # define put_unaligned	__put_unaligned_le
 #elif defined(__BIG_ENDIAN)
 # include <linux/unaligned/be_struct.h>
-# include <linux/unaligned/le_byteshift.h>
 # include <linux/unaligned/generic.h>
 # define get_unaligned	__get_unaligned_be
 # define put_unaligned	__put_unaligned_be
diff --git a/include/asm-generic/unaligned.h b/include/asm-generic/unaligned.h
index 374c940e9be1..d79df721ae60 100644
--- a/include/asm-generic/unaligned.h
+++ b/include/asm-generic/unaligned.h
@@ -16,7 +16,6 @@
 #if defined(__LITTLE_ENDIAN)
 # ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
 #  include <linux/unaligned/le_struct.h>
-#  include <linux/unaligned/be_byteshift.h>
 # endif
 # include <linux/unaligned/generic.h>
 # define get_unaligned	__get_unaligned_le
@@ -24,7 +23,6 @@
 #elif defined(__BIG_ENDIAN)
 # ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
 #  include <linux/unaligned/be_struct.h>
-#  include <linux/unaligned/le_byteshift.h>
 # endif
 # include <linux/unaligned/generic.h>
 # define get_unaligned	__get_unaligned_be
diff --git a/include/linux/unaligned/be_byteshift.h b/include/linux/unaligned/be_byteshift.h
deleted file mode 100644
index c43ff5918c8a..000000000000
--- a/include/linux/unaligned/be_byteshift.h
+++ /dev/null
@@ -1,71 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_UNALIGNED_BE_BYTESHIFT_H
-#define _LINUX_UNALIGNED_BE_BYTESHIFT_H
-
-#include <linux/types.h>
-
-static inline u16 __get_unaligned_be16(const u8 *p)
-{
-	return p[0] << 8 | p[1];
-}
-
-static inline u32 __get_unaligned_be32(const u8 *p)
-{
-	return p[0] << 24 | p[1] << 16 | p[2] << 8 | p[3];
-}
-
-static inline u64 __get_unaligned_be64(const u8 *p)
-{
-	return (u64)__get_unaligned_be32(p) << 32 |
-	       __get_unaligned_be32(p + 4);
-}
-
-static inline void __put_unaligned_be16(u16 val, u8 *p)
-{
-	*p++ = val >> 8;
-	*p++ = val;
-}
-
-static inline void __put_unaligned_be32(u32 val, u8 *p)
-{
-	__put_unaligned_be16(val >> 16, p);
-	__put_unaligned_be16(val, p + 2);
-}
-
-static inline void __put_unaligned_be64(u64 val, u8 *p)
-{
-	__put_unaligned_be32(val >> 32, p);
-	__put_unaligned_be32(val, p + 4);
-}
-
-static inline u16 get_unaligned_be16(const void *p)
-{
-	return __get_unaligned_be16(p);
-}
-
-static inline u32 get_unaligned_be32(const void *p)
-{
-	return __get_unaligned_be32(p);
-}
-
-static inline u64 get_unaligned_be64(const void *p)
-{
-	return __get_unaligned_be64(p);
-}
-
-static inline void put_unaligned_be16(u16 val, void *p)
-{
-	__put_unaligned_be16(val, p);
-}
-
-static inline void put_unaligned_be32(u32 val, void *p)
-{
-	__put_unaligned_be32(val, p);
-}
-
-static inline void put_unaligned_be64(u64 val, void *p)
-{
-	__put_unaligned_be64(val, p);
-}
-
-#endif /* _LINUX_UNALIGNED_BE_BYTESHIFT_H */
diff --git a/include/linux/unaligned/be_struct.h b/include/linux/unaligned/be_struct.h
index 15ea503a13fc..76d9fe297c33 100644
--- a/include/linux/unaligned/be_struct.h
+++ b/include/linux/unaligned/be_struct.h
@@ -34,4 +34,34 @@ static inline void put_unaligned_be64(u64 val, void *p)
 	__put_unaligned_cpu64(val, p);
 }
 
+static inline u16 get_unaligned_le16(const void *p)
+{
+	return swab16(__get_unaligned_cpu16((const u8 *)p));
+}
+
+static inline u32 get_unaligned_le32(const void *p)
+{
+	return swab32(__get_unaligned_cpu32((const u8 *)p));
+}
+
+static inline u64 get_unaligned_le64(const void *p)
+{
+	return swab64(__get_unaligned_cpu64((const u8 *)p));
+}
+
+static inline void put_unaligned_le16(u16 val, void *p)
+{
+	__put_unaligned_cpu16(swab16(val), p);
+}
+
+static inline void put_unaligned_le32(u32 val, void *p)
+{
+	__put_unaligned_cpu32(swab32(val), p);
+}
+
+static inline void put_unaligned_le64(u64 val, void *p)
+{
+	__put_unaligned_cpu64(swab64(val), p);
+}
+
 #endif /* _LINUX_UNALIGNED_BE_STRUCT_H */
diff --git a/include/linux/unaligned/le_byteshift.h b/include/linux/unaligned/le_byteshift.h
deleted file mode 100644
index 2248dcb0df76..000000000000
--- a/include/linux/unaligned/le_byteshift.h
+++ /dev/null
@@ -1,71 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_UNALIGNED_LE_BYTESHIFT_H
-#define _LINUX_UNALIGNED_LE_BYTESHIFT_H
-
-#include <linux/types.h>
-
-static inline u16 __get_unaligned_le16(const u8 *p)
-{
-	return p[0] | p[1] << 8;
-}
-
-static inline u32 __get_unaligned_le32(const u8 *p)
-{
-	return p[0] | p[1] << 8 | p[2] << 16 | p[3] << 24;
-}
-
-static inline u64 __get_unaligned_le64(const u8 *p)
-{
-	return (u64)__get_unaligned_le32(p + 4) << 32 |
-	       __get_unaligned_le32(p);
-}
-
-static inline void __put_unaligned_le16(u16 val, u8 *p)
-{
-	*p++ = val;
-	*p++ = val >> 8;
-}
-
-static inline void __put_unaligned_le32(u32 val, u8 *p)
-{
-	__put_unaligned_le16(val >> 16, p + 2);
-	__put_unaligned_le16(val, p);
-}
-
-static inline void __put_unaligned_le64(u64 val, u8 *p)
-{
-	__put_unaligned_le32(val >> 32, p + 4);
-	__put_unaligned_le32(val, p);
-}
-
-static inline u16 get_unaligned_le16(const void *p)
-{
-	return __get_unaligned_le16(p);
-}
-
-static inline u32 get_unaligned_le32(const void *p)
-{
-	return __get_unaligned_le32(p);
-}
-
-static inline u64 get_unaligned_le64(const void *p)
-{
-	return __get_unaligned_le64(p);
-}
-
-static inline void put_unaligned_le16(u16 val, void *p)
-{
-	__put_unaligned_le16(val, p);
-}
-
-static inline void put_unaligned_le32(u32 val, void *p)
-{
-	__put_unaligned_le32(val, p);
-}
-
-static inline void put_unaligned_le64(u64 val, void *p)
-{
-	__put_unaligned_le64(val, p);
-}
-
-#endif /* _LINUX_UNALIGNED_LE_BYTESHIFT_H */
diff --git a/include/linux/unaligned/le_struct.h b/include/linux/unaligned/le_struct.h
index 9977987883a6..22f90a4afaa5 100644
--- a/include/linux/unaligned/le_struct.h
+++ b/include/linux/unaligned/le_struct.h
@@ -34,4 +34,34 @@ static inline void put_unaligned_le64(u64 val, void *p)
 	__put_unaligned_cpu64(val, p);
 }
 
+static inline u16 get_unaligned_be16(const void *p)
+{
+	return swab16(__get_unaligned_cpu16((const u8 *)p));
+}
+
+static inline u32 get_unaligned_be32(const void *p)
+{
+	return swab32(__get_unaligned_cpu32((const u8 *)p));
+}
+
+static inline u64 get_unaligned_be64(const void *p)
+{
+	return swab64(__get_unaligned_cpu64((const u8 *)p));
+}
+
+static inline void put_unaligned_be16(u16 val, void *p)
+{
+	__put_unaligned_cpu16(swab16(val), p);
+}
+
+static inline void put_unaligned_be32(u32 val, void *p)
+{
+	__put_unaligned_cpu32(swab32(val), p);
+}
+
+static inline void put_unaligned_be64(u64 val, void *p)
+{
+	__put_unaligned_cpu64(swab64(val), p);
+}
+
 #endif /* _LINUX_UNALIGNED_LE_STRUCT_H */
-- 
2.29.2


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v2 07/13] asm-generic: unaligned always use struct helpers
  2021-05-14 10:00 ` Arnd Bergmann
@ 2021-05-14 10:00   ` Arnd Bergmann
  -1 siblings, 0 replies; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-14 10:00 UTC (permalink / raw)
  To: linux-arch
  Cc: Linus Torvalds, Vineet Gupta, Arnd Bergmann, Russell King,
	Herbert Xu, David S. Miller, Thomas Bogendoerfer,
	linux-arm-kernel, linux-kernel, linux-crypto, linux-mips

From: Arnd Bergmann <arnd@arndb.de>

As found by Vineet Gupta and Linus Torvalds, gcc has somewhat unexpected
behavior when faced with overlapping unaligned pointers. The kernel's
unaligned/access_ok.h header technically invokes undefined behavior
that happens to usually work on the architectures using it, but if the
compiler optimizes code based on the assumption that undefined behavior
doesn't happen, it can create output that actually causes data corruption.

A related problem was previously found on 32-bit ARMv7, where most
instructions can be used on unaligned data, but 64-bit ldrd/strd causes
an exception. The workaround was to always use the unaligned/le_struct.h
helper instead of unaligned/access_ok.h, in commit 1cce91dfc8f7 ("ARM:
8715/1: add a private asm/unaligned.h").

The same solution should work on all other architectures as well, so
remove the access_ok.h variant and use the other one unconditionally on
all architectures, picking either the big-endian or little-endian version.

With this, the arm specific header can be removed as well, and the
only file including linux/unaligned/access_ok.h gets moved to including
the normal file.
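
As a rough illustration of the struct approach that remains after this
change, the sketch below wraps the value in a packed struct so that the
compiler knows the member may live at any address. The sketch_* names are
hypothetical and the code is a userspace approximation, assuming a
compiler that supports __attribute__((packed)) (gcc and clang do). By
contrast, the removed access_ok.h helpers cast the pointer directly, e.g.
*((__le32 *)p), which promises the compiler an alignment that the caller
does not actually guarantee.

```c
#include <stdint.h>

/* Packed wrapper: the member is assumed to have alignment 1. */
struct sketch_una_u32 {
	uint32_t x;
} __attribute__((packed));

/* Read a native-endian 32-bit value from any address. */
static inline uint32_t sketch_get_unaligned_u32(const void *p)
{
	const struct sketch_una_u32 *ptr = p;

	return ptr->x;
}

/* Write a native-endian 32-bit value to any address. */
static inline void sketch_put_unaligned_u32(uint32_t val, void *p)
{
	struct sketch_una_u32 *ptr = p;

	ptr->x = val;
}
```

On hardware that handles unaligned loads natively, the compiler is still
free to emit a single load or store for these helpers, which would be
consistent with the mostly unchanged object code described below.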

Fortunately, this made almost no difference to the object code produced
by gcc-11. On x86, s390, powerpc, and arc, the resulting binary appears
to be identical to the previous version, while on arm64 and m68k there
are minimal differences that look like an optimization pass went in
a different direction, usually using fewer stack spills on the new
version.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
---
 arch/arm/include/asm/unaligned.h    | 25 -----------
 arch/mips/crypto/crc32-mips.c       |  2 +-
 include/asm-generic/unaligned.h     | 13 +-----
 include/linux/unaligned/access_ok.h | 68 -----------------------------
 4 files changed, 3 insertions(+), 105 deletions(-)
 delete mode 100644 arch/arm/include/asm/unaligned.h
 delete mode 100644 include/linux/unaligned/access_ok.h

diff --git a/arch/arm/include/asm/unaligned.h b/arch/arm/include/asm/unaligned.h
deleted file mode 100644
index 3c5248fb4cdc..000000000000
--- a/arch/arm/include/asm/unaligned.h
+++ /dev/null
@@ -1,25 +0,0 @@
-#ifndef __ASM_ARM_UNALIGNED_H
-#define __ASM_ARM_UNALIGNED_H
-
-/*
- * We generally want to set CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS on ARMv6+,
- * but we don't want to use linux/unaligned/access_ok.h since that can lead
- * to traps on unaligned stm/ldm or strd/ldrd.
- */
-#include <asm/byteorder.h>
-
-#if defined(__LITTLE_ENDIAN)
-# include <linux/unaligned/le_struct.h>
-# include <linux/unaligned/generic.h>
-# define get_unaligned	__get_unaligned_le
-# define put_unaligned	__put_unaligned_le
-#elif defined(__BIG_ENDIAN)
-# include <linux/unaligned/be_struct.h>
-# include <linux/unaligned/generic.h>
-# define get_unaligned	__get_unaligned_be
-# define put_unaligned	__put_unaligned_be
-#else
-# error need to define endianess
-#endif
-
-#endif /* __ASM_ARM_UNALIGNED_H */
diff --git a/arch/mips/crypto/crc32-mips.c b/arch/mips/crypto/crc32-mips.c
index faa88a6a74c0..0a03529cf317 100644
--- a/arch/mips/crypto/crc32-mips.c
+++ b/arch/mips/crypto/crc32-mips.c
@@ -8,13 +8,13 @@
  * Copyright (C) 2018 MIPS Tech, LLC
  */
 
-#include <linux/unaligned/access_ok.h>
 #include <linux/cpufeature.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/string.h>
 #include <asm/mipsregs.h>
+#include <asm/unaligned.h>
 
 #include <crypto/internal/hash.h>
 
diff --git a/include/asm-generic/unaligned.h b/include/asm-generic/unaligned.h
index d79df721ae60..36bf03aaa674 100644
--- a/include/asm-generic/unaligned.h
+++ b/include/asm-generic/unaligned.h
@@ -8,22 +8,13 @@
  */
 #include <asm/byteorder.h>
 
-/* Set by the arch if it can handle unaligned accesses in hardware. */
-#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
-# include <linux/unaligned/access_ok.h>
-#endif
-
 #if defined(__LITTLE_ENDIAN)
-# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
-#  include <linux/unaligned/le_struct.h>
-# endif
+# include <linux/unaligned/le_struct.h>
 # include <linux/unaligned/generic.h>
 # define get_unaligned	__get_unaligned_le
 # define put_unaligned	__put_unaligned_le
 #elif defined(__BIG_ENDIAN)
-# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
-#  include <linux/unaligned/be_struct.h>
-# endif
+# include <linux/unaligned/be_struct.h>
 # include <linux/unaligned/generic.h>
 # define get_unaligned	__get_unaligned_be
 # define put_unaligned	__put_unaligned_be
diff --git a/include/linux/unaligned/access_ok.h b/include/linux/unaligned/access_ok.h
deleted file mode 100644
index 167aa849c0ce..000000000000
--- a/include/linux/unaligned/access_ok.h
+++ /dev/null
@@ -1,68 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_UNALIGNED_ACCESS_OK_H
-#define _LINUX_UNALIGNED_ACCESS_OK_H
-
-#include <linux/kernel.h>
-#include <asm/byteorder.h>
-
-static __always_inline u16 get_unaligned_le16(const void *p)
-{
-	return le16_to_cpup((__le16 *)p);
-}
-
-static __always_inline u32 get_unaligned_le32(const void *p)
-{
-	return le32_to_cpup((__le32 *)p);
-}
-
-static __always_inline u64 get_unaligned_le64(const void *p)
-{
-	return le64_to_cpup((__le64 *)p);
-}
-
-static __always_inline u16 get_unaligned_be16(const void *p)
-{
-	return be16_to_cpup((__be16 *)p);
-}
-
-static __always_inline u32 get_unaligned_be32(const void *p)
-{
-	return be32_to_cpup((__be32 *)p);
-}
-
-static __always_inline u64 get_unaligned_be64(const void *p)
-{
-	return be64_to_cpup((__be64 *)p);
-}
-
-static __always_inline void put_unaligned_le16(u16 val, void *p)
-{
-	*((__le16 *)p) = cpu_to_le16(val);
-}
-
-static __always_inline void put_unaligned_le32(u32 val, void *p)
-{
-	*((__le32 *)p) = cpu_to_le32(val);
-}
-
-static __always_inline void put_unaligned_le64(u64 val, void *p)
-{
-	*((__le64 *)p) = cpu_to_le64(val);
-}
-
-static __always_inline void put_unaligned_be16(u16 val, void *p)
-{
-	*((__be16 *)p) = cpu_to_be16(val);
-}
-
-static __always_inline void put_unaligned_be32(u32 val, void *p)
-{
-	*((__be32 *)p) = cpu_to_be32(val);
-}
-
-static __always_inline void put_unaligned_be64(u64 val, void *p)
-{
-	*((__be64 *)p) = cpu_to_be64(val);
-}
-
-#endif /* _LINUX_UNALIGNED_ACCESS_OK_H */
-- 
2.29.2


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v2 07/13] asm-generic: unaligned always use struct helpers
@ 2021-05-14 10:00   ` Arnd Bergmann
  0 siblings, 0 replies; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-14 10:00 UTC (permalink / raw)
  To: linux-arch
  Cc: Linus Torvalds, Vineet Gupta, Arnd Bergmann, Russell King,
	Herbert Xu, David S. Miller, Thomas Bogendoerfer,
	linux-arm-kernel, linux-kernel, linux-crypto, linux-mips

From: Arnd Bergmann <arnd@arndb.de>

As found by Vineet Gupta and Linus Torvalds, gcc has somewhat unexpected
behavior when faced with overlapping unaligned pointers. The kernel's
unaligned/access_ok.h header technically invokes undefined behavior
that happens to usually work on the architectures using it, but if the
compiler optimizes code based on the assumption that undefined behavior
doesn't happen, it can create output that actually causes data corruption.

A related problem was previously found on 32-bit ARMv7, where most
instructions can be used on unaligned data, but 64-bit ldrd/strd causes
an exception. The workaround was to always use the unaligned/le_struct.h
helper instead of unaligned/access_ok.h, in commit 1cce91dfc8f7 ("ARM:
8715/1: add a private asm/unaligned.h").

The same solution should work on all other architectures as well, so
remove the access_ok.h variant and use the other one unconditionally on
all architectures, picking either the big-endian or little-endian version.

With this, the arm specific header can be removed as well, and the
only file including linux/unaligned/access_ok.h gets moved to including
the normal file.

Fortunately, this made almost no difference to the object code produced
by gcc-11. On x86, s390, powerpc, and arc, the resulting binary appears
to be identical to the previous version, while on arm64 and m68k there
are minimal differences that look like an optimization pass took a
different direction, usually with fewer stack spills in the new
version.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
---
 arch/arm/include/asm/unaligned.h    | 25 -----------
 arch/mips/crypto/crc32-mips.c       |  2 +-
 include/asm-generic/unaligned.h     | 13 +-----
 include/linux/unaligned/access_ok.h | 68 -----------------------------
 4 files changed, 3 insertions(+), 105 deletions(-)
 delete mode 100644 arch/arm/include/asm/unaligned.h
 delete mode 100644 include/linux/unaligned/access_ok.h

diff --git a/arch/arm/include/asm/unaligned.h b/arch/arm/include/asm/unaligned.h
deleted file mode 100644
index 3c5248fb4cdc..000000000000
--- a/arch/arm/include/asm/unaligned.h
+++ /dev/null
@@ -1,25 +0,0 @@
-#ifndef __ASM_ARM_UNALIGNED_H
-#define __ASM_ARM_UNALIGNED_H
-
-/*
- * We generally want to set CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS on ARMv6+,
- * but we don't want to use linux/unaligned/access_ok.h since that can lead
- * to traps on unaligned stm/ldm or strd/ldrd.
- */
-#include <asm/byteorder.h>
-
-#if defined(__LITTLE_ENDIAN)
-# include <linux/unaligned/le_struct.h>
-# include <linux/unaligned/generic.h>
-# define get_unaligned	__get_unaligned_le
-# define put_unaligned	__put_unaligned_le
-#elif defined(__BIG_ENDIAN)
-# include <linux/unaligned/be_struct.h>
-# include <linux/unaligned/generic.h>
-# define get_unaligned	__get_unaligned_be
-# define put_unaligned	__put_unaligned_be
-#else
-# error need to define endianess
-#endif
-
-#endif /* __ASM_ARM_UNALIGNED_H */
diff --git a/arch/mips/crypto/crc32-mips.c b/arch/mips/crypto/crc32-mips.c
index faa88a6a74c0..0a03529cf317 100644
--- a/arch/mips/crypto/crc32-mips.c
+++ b/arch/mips/crypto/crc32-mips.c
@@ -8,13 +8,13 @@
  * Copyright (C) 2018 MIPS Tech, LLC
  */
 
-#include <linux/unaligned/access_ok.h>
 #include <linux/cpufeature.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/string.h>
 #include <asm/mipsregs.h>
+#include <asm/unaligned.h>
 
 #include <crypto/internal/hash.h>
 
diff --git a/include/asm-generic/unaligned.h b/include/asm-generic/unaligned.h
index d79df721ae60..36bf03aaa674 100644
--- a/include/asm-generic/unaligned.h
+++ b/include/asm-generic/unaligned.h
@@ -8,22 +8,13 @@
  */
 #include <asm/byteorder.h>
 
-/* Set by the arch if it can handle unaligned accesses in hardware. */
-#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
-# include <linux/unaligned/access_ok.h>
-#endif
-
 #if defined(__LITTLE_ENDIAN)
-# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
-#  include <linux/unaligned/le_struct.h>
-# endif
+# include <linux/unaligned/le_struct.h>
 # include <linux/unaligned/generic.h>
 # define get_unaligned	__get_unaligned_le
 # define put_unaligned	__put_unaligned_le
 #elif defined(__BIG_ENDIAN)
-# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
-#  include <linux/unaligned/be_struct.h>
-# endif
+# include <linux/unaligned/be_struct.h>
 # include <linux/unaligned/generic.h>
 # define get_unaligned	__get_unaligned_be
 # define put_unaligned	__put_unaligned_be
diff --git a/include/linux/unaligned/access_ok.h b/include/linux/unaligned/access_ok.h
deleted file mode 100644
index 167aa849c0ce..000000000000
--- a/include/linux/unaligned/access_ok.h
+++ /dev/null
@@ -1,68 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_UNALIGNED_ACCESS_OK_H
-#define _LINUX_UNALIGNED_ACCESS_OK_H
-
-#include <linux/kernel.h>
-#include <asm/byteorder.h>
-
-static __always_inline u16 get_unaligned_le16(const void *p)
-{
-	return le16_to_cpup((__le16 *)p);
-}
-
-static __always_inline u32 get_unaligned_le32(const void *p)
-{
-	return le32_to_cpup((__le32 *)p);
-}
-
-static __always_inline u64 get_unaligned_le64(const void *p)
-{
-	return le64_to_cpup((__le64 *)p);
-}
-
-static __always_inline u16 get_unaligned_be16(const void *p)
-{
-	return be16_to_cpup((__be16 *)p);
-}
-
-static __always_inline u32 get_unaligned_be32(const void *p)
-{
-	return be32_to_cpup((__be32 *)p);
-}
-
-static __always_inline u64 get_unaligned_be64(const void *p)
-{
-	return be64_to_cpup((__be64 *)p);
-}
-
-static __always_inline void put_unaligned_le16(u16 val, void *p)
-{
-	*((__le16 *)p) = cpu_to_le16(val);
-}
-
-static __always_inline void put_unaligned_le32(u32 val, void *p)
-{
-	*((__le32 *)p) = cpu_to_le32(val);
-}
-
-static __always_inline void put_unaligned_le64(u64 val, void *p)
-{
-	*((__le64 *)p) = cpu_to_le64(val);
-}
-
-static __always_inline void put_unaligned_be16(u16 val, void *p)
-{
-	*((__be16 *)p) = cpu_to_be16(val);
-}
-
-static __always_inline void put_unaligned_be32(u32 val, void *p)
-{
-	*((__be32 *)p) = cpu_to_be32(val);
-}
-
-static __always_inline void put_unaligned_be64(u64 val, void *p)
-{
-	*((__be64 *)p) = cpu_to_be64(val);
-}
-
-#endif /* _LINUX_UNALIGNED_ACCESS_OK_H */
-- 
2.29.2


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v2 08/13] partitions: msdos: fix one-byte get_unaligned()
  2021-05-14 10:00 ` Arnd Bergmann
                   ` (9 preceding siblings ...)
  (?)
@ 2021-05-14 10:00 ` Arnd Bergmann
  2021-05-17 10:28   ` Christoph Hellwig
  -1 siblings, 1 reply; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-14 10:00 UTC (permalink / raw)
  To: linux-arch
  Cc: Linus Torvalds, Vineet Gupta, Arnd Bergmann,
	Richard Russon (FlatCap),
	Jens Axboe, linux-ntfs-dev, linux-block, linux-kernel

From: Arnd Bergmann <arnd@arndb.de>

A simplification of get_unaligned() clashes with callers that pass
in a character pointer, causing a harmless warning like:

block/partitions/msdos.c: In function 'msdos_partition':
include/asm-generic/unaligned.h:13:22: warning: 'packed' attribute ignored for field of type 'u8' {aka 'unsigned char'} [-Wattributes]

Remove the get_unaligned() call and just use the byte directly.
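
For illustration only (not part of this patch), here is a small
user-space sketch of why the direct access is safe: a single byte has
no alignment requirement. struct demo_partition is a hypothetical,
trimmed-down stand-in for a partition table entry, and the demo_* and
DEMO_* names are made up. GCC or Clang is assumed for the packed
attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for a partition table entry. */
struct demo_partition {
	uint8_t boot_ind;
	uint8_t head;
	uint8_t sector;
	uint8_t cyl;
	uint8_t sys_ind;
} __attribute__((packed));

/* A single byte, and a packed struct, can live at any address. */
static_assert(_Alignof(uint8_t) == 1, "a byte is always aligned");
static_assert(_Alignof(struct demo_partition) == 1, "packed struct is byte aligned");

#define DEMO_SYS_IND(p)	((p)->sys_ind)

/* Read the type byte of an entry that starts at an arbitrary offset. */
static inline uint8_t demo_sys_ind_at(const unsigned char *raw, size_t off)
{
	const struct demo_partition *p = (const void *)(raw + off);

	return DEMO_SYS_IND(p);
}
```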

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 block/partitions/ldm.h   | 2 +-
 block/partitions/msdos.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/partitions/ldm.h b/block/partitions/ldm.h
index d8d6beaa72c4..1a77ff09cc5f 100644
--- a/block/partitions/ldm.h
+++ b/block/partitions/ldm.h
@@ -85,7 +85,7 @@ struct parsed_partitions;
 #define TOC_BITMAP2		"log"		/* bitmaps in the TOCBLOCK. */
 
 /* Borrowed from msdos.c */
-#define SYS_IND(p)		(get_unaligned(&(p)->sys_ind))
+#define SYS_IND(p)		((p)->sys_ind)
 
 struct frag {				/* VBLK Fragment handling */
 	struct list_head list;
diff --git a/block/partitions/msdos.c b/block/partitions/msdos.c
index 8f2fcc080264..d78549d7619d 100644
--- a/block/partitions/msdos.c
+++ b/block/partitions/msdos.c
@@ -38,7 +38,7 @@
  */
 #include <asm/unaligned.h>
 
-#define SYS_IND(p)	get_unaligned(&p->sys_ind)
+#define SYS_IND(p)	(p->sys_ind)
 
 static inline sector_t nr_sects(struct msdos_partition *p)
 {
-- 
2.29.2


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v2 09/13] apparmor: use get_unaligned() only for multi-byte words
  2021-05-14 10:00 ` Arnd Bergmann
                   ` (10 preceding siblings ...)
  (?)
@ 2021-05-14 10:00 ` Arnd Bergmann
  -1 siblings, 0 replies; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-14 10:00 UTC (permalink / raw)
  To: linux-arch
  Cc: Linus Torvalds, Vineet Gupta, Arnd Bergmann, John Johansen,
	James Morris, Serge E. Hallyn, linux-security-module,
	linux-kernel

From: Arnd Bergmann <arnd@arndb.de>

Using get_unaligned() on a u8 pointer is pointless, and will
result in a compiler warning after a planned cleanup:

In file included from arch/x86/include/generated/asm/unaligned.h:1,
                 from security/apparmor/policy_unpack.c:16:
security/apparmor/policy_unpack.c: In function 'unpack_u8':
include/asm-generic/unaligned.h:13:15: error: 'packed' attribute ignored for field of type 'u8' {aka 'unsigned char'} [-Werror=attributes]
   13 |  const struct { type x __packed; } *__pptr = (typeof(__pptr))(ptr); \
      |               ^

Simply dereference this pointer directly.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: John Johansen <john.johansen@canonical.com>
---
 security/apparmor/policy_unpack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/apparmor/policy_unpack.c b/security/apparmor/policy_unpack.c
index dc345ac93205..4e1f96b216a8 100644
--- a/security/apparmor/policy_unpack.c
+++ b/security/apparmor/policy_unpack.c
@@ -304,7 +304,7 @@ static bool unpack_u8(struct aa_ext *e, u8 *data, const char *name)
 		if (!inbounds(e, sizeof(u8)))
 			goto fail;
 		if (data)
-			*data = get_unaligned((u8 *)e->pos);
+			*data = *((u8 *)e->pos);
 		e->pos += sizeof(u8);
 		return true;
 	}
-- 
2.29.2


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v2 10/13] mwifiex: re-fix for unaligned accesses
  2021-05-14 10:00 ` Arnd Bergmann
                   ` (11 preceding siblings ...)
  (?)
@ 2021-05-14 10:00 ` Arnd Bergmann
  2021-05-15  6:22   ` Kalle Valo
  -1 siblings, 1 reply; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-14 10:00 UTC (permalink / raw)
  To: linux-arch
  Cc: Linus Torvalds, Vineet Gupta, Arnd Bergmann, Amitkumar Karwar,
	Ganapathi Bhat, Sharvari Harisangam, Xinming Hu, Kalle Valo,
	David S. Miller, Jakub Kicinski, Devidas Puranik, linux-wireless,
	netdev, linux-kernel

From: Arnd Bergmann <arnd@arndb.de>

A patch from 2017 changed some accesses to DMA memory to use
get_unaligned_le32() and similar interfaces, to avoid problems
with doing unaligned accesses on uncached memory.

However, the change in the mwifiex_pcie_alloc_sleep_cookie_buf()
function ended up changing the size of the access instead,
as it operates on a pointer to u8.

Change this function back to actually access the entire 32 bits.
Note that the pointer is aligned by definition because it came
from dma_alloc_coherent().
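
To show the effect in isolation, here is a user-space sketch (not
driver code). demo_put_unaligned() is a simplified stand-in for the
kernel macro, and DEMO_COOKIE is a made-up value, not the driver's
FW_AWAKE_COOKIE. It assumes GNU C extensions (__typeof__ and the packed
attribute).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical value; the real FW_AWAKE_COOKIE is defined by the driver. */
#define DEMO_COOKIE	0xa5a5a5a5u

static_assert(sizeof(uint8_t) < sizeof(uint32_t), "a u8 store cannot cover 32 bits");

/* Simplified stand-in for put_unaligned(): the width follows the pointer type. */
#define demo_put_unaligned(val, ptr) do {				\
	struct { __typeof__(*(ptr)) x; } __attribute__((packed)) *pp_ =	\
		(void *)(ptr);						\
	pp_->x = (val);							\
} while (0)

static inline int demo_count_nonzero(const uint8_t *buf, size_t n)
{
	int changed = 0;
	size_t i;

	for (i = 0; i < n; i++)
		changed += buf[i] != 0;
	return changed;
}

/* Store through a u8 pointer: the value is truncated to a single byte. */
static inline int demo_store_through_u8(void)
{
	uint8_t buf[sizeof(uint32_t)] = { 0 };
	uint8_t *vbase = buf;
	uint32_t tmp = DEMO_COOKIE;

	demo_put_unaligned(tmp, vbase);
	return demo_count_nonzero(buf, sizeof(buf));
}

/* Store through a u32 pointer: all four bytes are written. */
static inline int demo_store_through_u32(void)
{
	uint32_t word = 0;
	uint8_t buf[sizeof(word)];

	demo_put_unaligned(DEMO_COOKIE, &word);
	memcpy(buf, &word, sizeof(buf));
	return demo_count_nonzero(buf, sizeof(buf));
}
```

The first helper changes one byte of the buffer and the second changes
four, which is the difference this patch restores.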

Fixes: 92c70a958b0b ("mwifiex: fix for unaligned reads")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/net/wireless/marvell/mwifiex/pcie.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index 94228b316df1..46517515ba72 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -1231,7 +1231,7 @@ static int mwifiex_pcie_delete_cmdrsp_buf(struct mwifiex_adapter *adapter)
 static int mwifiex_pcie_alloc_sleep_cookie_buf(struct mwifiex_adapter *adapter)
 {
 	struct pcie_service_card *card = adapter->card;
-	u32 tmp;
+	u32 *cookie;
 
 	card->sleep_cookie_vbase = dma_alloc_coherent(&card->dev->dev,
 						      sizeof(u32),
@@ -1242,13 +1242,11 @@ static int mwifiex_pcie_alloc_sleep_cookie_buf(struct mwifiex_adapter *adapter)
 			    "dma_alloc_coherent failed!\n");
 		return -ENOMEM;
 	}
+	cookie = (u32 *)card->sleep_cookie_vbase;
 	/* Init val of Sleep Cookie */
-	tmp = FW_AWAKE_COOKIE;
-	put_unaligned(tmp, card->sleep_cookie_vbase);
+	*cookie = FW_AWAKE_COOKIE;
 
-	mwifiex_dbg(adapter, INFO,
-		    "alloc_scook: sleep cookie=0x%x\n",
-		    get_unaligned(card->sleep_cookie_vbase));
+	mwifiex_dbg(adapter, INFO, "alloc_scook: sleep cookie=0x%x\n", *cookie);
 
 	return 0;
 }
-- 
2.29.2


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v2 11/13] netpoll: avoid put_unaligned() on single character
  2021-05-14 10:00 ` Arnd Bergmann
                   ` (12 preceding siblings ...)
  (?)
@ 2021-05-14 10:00 ` Arnd Bergmann
  -1 siblings, 0 replies; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-14 10:00 UTC (permalink / raw)
  To: linux-arch
  Cc: Linus Torvalds, Vineet Gupta, Arnd Bergmann, David S. Miller,
	Jakub Kicinski, Florian Fainelli, Andrew Lunn, Vladimir Oltean,
	netdev, linux-kernel

From: Arnd Bergmann <arnd@arndb.de>

With a planned cleanup, using put_unaligned() on a single character
results in a harmless warning:

In file included from ./arch/x86/include/generated/asm/unaligned.h:1,
                 from include/linux/etherdevice.h:24,
                 from net/core/netpoll.c:18:
net/core/netpoll.c: In function 'netpoll_send_udp':
include/asm-generic/unaligned.h:23:9: error: 'packed' attribute ignored for field of type 'unsigned char' [-Werror=attributes]
net/core/netpoll.c:431:3: note: in expansion of macro 'put_unaligned'
  431 |   put_unaligned(0x60, (unsigned char *)ip6h);
      |   ^~~~~~~~~~~~~
include/asm-generic/unaligned.h:23:9: error: 'packed' attribute ignored for field of type 'unsigned char' [-Werror=attributes]
net/core/netpoll.c:459:3: note: in expansion of macro 'put_unaligned'
  459 |   put_unaligned(0x45, (unsigned char *)iph);
      |   ^~~~~~~~~~~~~

Replace this with an open-coded pointer dereference.
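
As a side note, a small user-space sketch (not part of this patch) of
what the two constants encode in the first header byte. The demo_*
helpers are made-up names. The nibble layout follows the comments in
the code being changed.

```c
#include <assert.h>

/* 0x45: version 4 in the high nibble, header length 5 (32-bit words) below. */
static_assert((0x45 >> 4) == 4 && (0x45 & 0x0f) == 5, "IPv4 version/ihl byte");
/* 0x60: version 6 in the high nibble, top four priority bits zero. */
static_assert((0x60 >> 4) == 6 && (0x60 & 0x0f) == 0, "IPv6 version/priority byte");

/* A one-byte store is always aligned, so a plain dereference is enough. */
static inline void demo_set_first_byte(void *hdr, unsigned char val)
{
	*(unsigned char *)hdr = val;
}

static inline unsigned int demo_ip_version(const void *hdr)
{
	return *(const unsigned char *)hdr >> 4;
}
```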

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 net/core/netpoll.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index c310c7c1cef7..9c49a38fa315 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -428,7 +428,7 @@ void netpoll_send_udp(struct netpoll *np, const char *msg, int len)
 		ip6h = ipv6_hdr(skb);
 
 		/* ip6h->version = 6; ip6h->priority = 0; */
-		put_unaligned(0x60, (unsigned char *)ip6h);
+		*(unsigned char *)ip6h = 0x60;
 		ip6h->flow_lbl[0] = 0;
 		ip6h->flow_lbl[1] = 0;
 		ip6h->flow_lbl[2] = 0;
@@ -456,7 +456,7 @@ void netpoll_send_udp(struct netpoll *np, const char *msg, int len)
 		iph = ip_hdr(skb);
 
 		/* iph->version = 4; iph->ihl = 5; */
-		put_unaligned(0x45, (unsigned char *)iph);
+		*(unsigned char *)iph = 0x45;
 		iph->tos      = 0;
 		put_unaligned(htons(ip_len), &(iph->tot_len));
 		iph->id       = htons(atomic_inc_return(&ip_ident));
-- 
2.29.2


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v2 12/13] asm-generic: uaccess: 1-byte access is always aligned
  2021-05-14 10:00 ` Arnd Bergmann
                   ` (13 preceding siblings ...)
  (?)
@ 2021-05-14 10:01 ` Arnd Bergmann
  2021-05-15 18:41   ` Randy Dunlap
  -1 siblings, 1 reply; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-14 10:01 UTC (permalink / raw)
  To: linux-arch; +Cc: Linus Torvalds, Vineet Gupta, Arnd Bergmann, linux-kernel

From: Arnd Bergmann <arnd@arndb.de>

With the cleaned up version of asm-generic/unaligned.h,
there is a warning about the get_user/put_user helpers using
unaligned access for single-byte variables:

include/asm-generic/uaccess.h: In function ‘__get_user_fn’:
include/asm-generic/unaligned.h:13:15: warning: ‘packed’ attribute ignored for field of type ‘u8’ {aka ‘unsigned char’} [-Wattributes]
  const struct { type x __packed; } *__pptr = (typeof(__pptr))(ptr); \

Change these to use a direct pointer dereference to avoid the
warnings.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 include/asm-generic/uaccess.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/uaccess.h b/include/asm-generic/uaccess.h
index 4973328f3c6e..7e903e450659 100644
--- a/include/asm-generic/uaccess.h
+++ b/include/asm-generic/uaccess.h
@@ -19,7 +19,7 @@ __get_user_fn(size_t size, const void __user *from, void *to)
 
 	switch (size) {
 	case 1:
-		*(u8 *)to = get_unaligned((u8 __force *)from);
+		*(u8 *)to = *((u8 __force *)from);
 		return 0;
 	case 2:
 		*(u16 *)to = get_unaligned((u16 __force *)from);
@@ -45,7 +45,7 @@ __put_user_fn(size_t size, void __user *to, void *from)
 
 	switch (size) {
 	case 1:
-		put_unaligned(*(u8 *)from, (u8 __force *)to);
+		*(u8 __force *)to = *(u8 *)from;
 		return 0;
 	case 2:
 		put_unaligned(*(u16 *)from, (u16 __force *)to);
-- 
2.29.2


^ permalink raw reply	[flat|nested] 96+ messages in thread

* [PATCH v2 13/13] asm-generic: simplify asm/unaligned.h
  2021-05-14 10:00 ` Arnd Bergmann
                   ` (14 preceding siblings ...)
  (?)
@ 2021-05-14 10:01 ` Arnd Bergmann
  2021-05-14 10:35   ` David Laight
  -1 siblings, 1 reply; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-14 10:01 UTC (permalink / raw)
  To: linux-arch; +Cc: Linus Torvalds, Vineet Gupta, Arnd Bergmann, linux-kernel

From: Arnd Bergmann <arnd@arndb.de>

The get_unaligned()/put_unaligned() implementations are much more complex
than necessary, now that all architectures use the same code.

Move everything into one file and use a much more compact way to express
the same logic.

I've compared the binary output using gcc-11 across defconfig builds for
all architectures and found this patch to make no difference, except for
a single function on powerpc that needs two additional register moves
because of random differences in register allocation.

There are a handful of callers of the low-level __get_unaligned_cpu32,
so leave that in place for the time being even though the common code
no longer uses it.

This adds a warning for any caller of get_unaligned()/put_unaligned()
that passes in a single-byte pointer, but I've sent patches for all
instances that show up in x86 and randconfig builds. It would be nice
to change the arguments of the endian-specific accessors to take the
matching __be16/__be32/__be64/__le16/__le32/__le64 arguments instead of
a void pointer, but that requires more changes to the rest of the kernel.

This new version does allow aggregate types into get_unaligned(), which
was not the original goal but might come in handy.
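
For readers who want to try the pattern outside the kernel, here is a
user-space sketch modelled on the new macros. The demo_* names and
struct demo_pair are made up for the example, and it relies on GNU C
extensions (statement expressions and the packed attribute), so it
assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical aggregate, used only to show that struct types are accepted. */
struct demo_pair {
	uint16_t a;
	uint32_t b;
};

/* On its own, the aggregate has an alignment requirement above one byte. */
static_assert(_Alignof(struct demo_pair) > 1, "demo_pair is not byte aligned by itself");

/* Read a value of the given type from an arbitrarily aligned address. */
#define demo_get_unaligned_t(type, ptr) ({				\
	const struct { type x; } __attribute__((packed)) *pp_ =	\
		(const void *)(ptr);					\
	pp_->x;								\
})

/* Write a value of the given type to an arbitrarily aligned address. */
#define demo_put_unaligned_t(type, val, ptr) do {			\
	struct { type x; } __attribute__((packed)) *pp_ = (void *)(ptr);	\
	pp_->x = (val);							\
} while (0)

static inline uint32_t demo_load_u32(const void *p)
{
	return demo_get_unaligned_t(uint32_t, p);
}

static inline void demo_store_u32(void *p, uint32_t val)
{
	demo_put_unaligned_t(uint32_t, val, p);
}

/* Aggregates work too, since the wrapper struct is copied as a whole. */
static inline struct demo_pair demo_load_pair(const void *p)
{
	return demo_get_unaligned_t(struct demo_pair, p);
}
```

The wrapper struct is what carries the "alignment 1" information to the
compiler, so the same two macros cover every scalar width and, as a
side effect, aggregates.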

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 include/asm-generic/unaligned.h     | 130 +++++++++++++++++++++++++---
 include/linux/unaligned/be_struct.h |  67 --------------
 include/linux/unaligned/generic.h   | 115 ------------------------
 include/linux/unaligned/le_struct.h |  67 --------------
 4 files changed, 117 insertions(+), 262 deletions(-)
 delete mode 100644 include/linux/unaligned/be_struct.h
 delete mode 100644 include/linux/unaligned/generic.h
 delete mode 100644 include/linux/unaligned/le_struct.h

diff --git a/include/asm-generic/unaligned.h b/include/asm-generic/unaligned.h
index 36bf03aaa674..1c4242416c9f 100644
--- a/include/asm-generic/unaligned.h
+++ b/include/asm-generic/unaligned.h
@@ -6,20 +6,124 @@
  * This is the most generic implementation of unaligned accesses
  * and should work almost anywhere.
  */
+#include <linux/unaligned/packed_struct.h>
 #include <asm/byteorder.h>
 
-#if defined(__LITTLE_ENDIAN)
-# include <linux/unaligned/le_struct.h>
-# include <linux/unaligned/generic.h>
-# define get_unaligned	__get_unaligned_le
-# define put_unaligned	__put_unaligned_le
-#elif defined(__BIG_ENDIAN)
-# include <linux/unaligned/be_struct.h>
-# include <linux/unaligned/generic.h>
-# define get_unaligned	__get_unaligned_be
-# define put_unaligned	__put_unaligned_be
-#else
-# error need to define endianess
-#endif
+#define __get_unaligned_t(type, ptr) ({						\
+	const struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr);	\
+	__pptr->x;								\
+})
+
+#define __put_unaligned_t(type, val, ptr) do {					\
+	struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr);		\
+	__pptr->x = (val);							\
+} while (0)
+
+#define get_unaligned(ptr)	__get_unaligned_t(typeof(*(ptr)), (ptr))
+#define put_unaligned(val, ptr) __put_unaligned_t(typeof(*(ptr)), (val), (ptr))
+
+static inline u16 get_unaligned_le16(const void *p)
+{
+	return le16_to_cpu(__get_unaligned_t(__le16, p));
+}
+
+static inline u32 get_unaligned_le32(const void *p)
+{
+	return le32_to_cpu(__get_unaligned_t(__le32, p));
+}
+
+static inline u64 get_unaligned_le64(const void *p)
+{
+	return le64_to_cpu(__get_unaligned_t(__le64, p));
+}
+
+static inline void put_unaligned_le16(u16 val, void *p)
+{
+	__put_unaligned_t(__le16, cpu_to_le16(val), p);
+}
+
+static inline void put_unaligned_le32(u32 val, void *p)
+{
+	__put_unaligned_t(__le32, cpu_to_le32(val), p);
+}
+
+static inline void put_unaligned_le64(u64 val, void *p)
+{
+	__put_unaligned_t(__le64, cpu_to_le64(val), p);
+}
+
+static inline u16 get_unaligned_be16(const void *p)
+{
+	return be16_to_cpu(__get_unaligned_t(__be16, p));
+}
+
+static inline u32 get_unaligned_be32(const void *p)
+{
+	return be32_to_cpu(__get_unaligned_t(__be32, p));
+}
+
+static inline u64 get_unaligned_be64(const void *p)
+{
+	return be64_to_cpu(__get_unaligned_t(__be64, p));
+}
+
+static inline void put_unaligned_be16(u16 val, void *p)
+{
+	__put_unaligned_t(__be16, cpu_to_be16(val), p);
+}
+
+static inline void put_unaligned_be32(u32 val, void *p)
+{
+	__put_unaligned_t(__be32, cpu_to_be32(val), p);
+}
+
+static inline void put_unaligned_be64(u64 val, void *p)
+{
+	__put_unaligned_t(__be64, cpu_to_be64(val), p);
+}
+
+static inline u32 __get_unaligned_be24(const u8 *p)
+{
+	return p[0] << 16 | p[1] << 8 | p[2];
+}
+
+static inline u32 get_unaligned_be24(const void *p)
+{
+	return __get_unaligned_be24(p);
+}
+
+static inline u32 __get_unaligned_le24(const u8 *p)
+{
+	return p[0] | p[1] << 8 | p[2] << 16;
+}
+
+static inline u32 get_unaligned_le24(const void *p)
+{
+	return __get_unaligned_le24(p);
+}
+
+static inline void __put_unaligned_be24(const u32 val, u8 *p)
+{
+	*p++ = val >> 16;
+	*p++ = val >> 8;
+	*p++ = val;
+}
+
+static inline void put_unaligned_be24(const u32 val, void *p)
+{
+	__put_unaligned_be24(val, p);
+}
+
+static inline void __put_unaligned_le24(const u32 val, u8 *p)
+{
+	*p++ = val;
+	*p++ = val >> 8;
+	*p++ = val >> 16;
+}
+
+static inline void put_unaligned_le24(const u32 val, void *p)
+{
+	__put_unaligned_le24(val, p);
+}
 
 #endif /* __ASM_GENERIC_UNALIGNED_H */
diff --git a/include/linux/unaligned/be_struct.h b/include/linux/unaligned/be_struct.h
deleted file mode 100644
index 76d9fe297c33..000000000000
--- a/include/linux/unaligned/be_struct.h
+++ /dev/null
@@ -1,67 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_UNALIGNED_BE_STRUCT_H
-#define _LINUX_UNALIGNED_BE_STRUCT_H
-
-#include <linux/unaligned/packed_struct.h>
-
-static inline u16 get_unaligned_be16(const void *p)
-{
-	return __get_unaligned_cpu16((const u8 *)p);
-}
-
-static inline u32 get_unaligned_be32(const void *p)
-{
-	return __get_unaligned_cpu32((const u8 *)p);
-}
-
-static inline u64 get_unaligned_be64(const void *p)
-{
-	return __get_unaligned_cpu64((const u8 *)p);
-}
-
-static inline void put_unaligned_be16(u16 val, void *p)
-{
-	__put_unaligned_cpu16(val, p);
-}
-
-static inline void put_unaligned_be32(u32 val, void *p)
-{
-	__put_unaligned_cpu32(val, p);
-}
-
-static inline void put_unaligned_be64(u64 val, void *p)
-{
-	__put_unaligned_cpu64(val, p);
-}
-
-static inline u16 get_unaligned_le16(const void *p)
-{
-	return swab16(__get_unaligned_cpu16((const u8 *)p));
-}
-
-static inline u32 get_unaligned_le32(const void *p)
-{
-	return swab32(__get_unaligned_cpu32((const u8 *)p));
-}
-
-static inline u64 get_unaligned_le64(const void *p)
-{
-	return swab64(__get_unaligned_cpu64((const u8 *)p));
-}
-
-static inline void put_unaligned_le16(u16 val, void *p)
-{
-	__put_unaligned_cpu16(swab16(val), p);
-}
-
-static inline void put_unaligned_le32(u32 val, void *p)
-{
-	__put_unaligned_cpu32(swab32(val), p);
-}
-
-static inline void put_unaligned_le64(u64 val, void *p)
-{
-	__put_unaligned_cpu64(swab64(val), p);
-}
-
-#endif /* _LINUX_UNALIGNED_BE_STRUCT_H */
diff --git a/include/linux/unaligned/generic.h b/include/linux/unaligned/generic.h
deleted file mode 100644
index 303289492859..000000000000
--- a/include/linux/unaligned/generic.h
+++ /dev/null
@@ -1,115 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_UNALIGNED_GENERIC_H
-#define _LINUX_UNALIGNED_GENERIC_H
-
-#include <linux/types.h>
-
-/*
- * Cause a link-time error if we try an unaligned access other than
- * 1,2,4 or 8 bytes long
- */
-extern void __bad_unaligned_access_size(void);
-
-#define __get_unaligned_le(ptr) ((__force typeof(*(ptr)))({			\
-	__builtin_choose_expr(sizeof(*(ptr)) == 1, *(ptr),			\
-	__builtin_choose_expr(sizeof(*(ptr)) == 2, get_unaligned_le16((ptr)),	\
-	__builtin_choose_expr(sizeof(*(ptr)) == 4, get_unaligned_le32((ptr)),	\
-	__builtin_choose_expr(sizeof(*(ptr)) == 8, get_unaligned_le64((ptr)),	\
-	__bad_unaligned_access_size()))));					\
-	}))
-
-#define __get_unaligned_be(ptr) ((__force typeof(*(ptr)))({			\
-	__builtin_choose_expr(sizeof(*(ptr)) == 1, *(ptr),			\
-	__builtin_choose_expr(sizeof(*(ptr)) == 2, get_unaligned_be16((ptr)),	\
-	__builtin_choose_expr(sizeof(*(ptr)) == 4, get_unaligned_be32((ptr)),	\
-	__builtin_choose_expr(sizeof(*(ptr)) == 8, get_unaligned_be64((ptr)),	\
-	__bad_unaligned_access_size()))));					\
-	}))
-
-#define __put_unaligned_le(val, ptr) ({					\
-	void *__gu_p = (ptr);						\
-	switch (sizeof(*(ptr))) {					\
-	case 1:								\
-		*(u8 *)__gu_p = (__force u8)(val);			\
-		break;							\
-	case 2:								\
-		put_unaligned_le16((__force u16)(val), __gu_p);		\
-		break;							\
-	case 4:								\
-		put_unaligned_le32((__force u32)(val), __gu_p);		\
-		break;							\
-	case 8:								\
-		put_unaligned_le64((__force u64)(val), __gu_p);		\
-		break;							\
-	default:							\
-		__bad_unaligned_access_size();				\
-		break;							\
-	}								\
-	(void)0; })
-
-#define __put_unaligned_be(val, ptr) ({					\
-	void *__gu_p = (ptr);						\
-	switch (sizeof(*(ptr))) {					\
-	case 1:								\
-		*(u8 *)__gu_p = (__force u8)(val);			\
-		break;							\
-	case 2:								\
-		put_unaligned_be16((__force u16)(val), __gu_p);		\
-		break;							\
-	case 4:								\
-		put_unaligned_be32((__force u32)(val), __gu_p);		\
-		break;							\
-	case 8:								\
-		put_unaligned_be64((__force u64)(val), __gu_p);		\
-		break;							\
-	default:							\
-		__bad_unaligned_access_size();				\
-		break;							\
-	}								\
-	(void)0; })
-
-static inline u32 __get_unaligned_be24(const u8 *p)
-{
-	return p[0] << 16 | p[1] << 8 | p[2];
-}
-
-static inline u32 get_unaligned_be24(const void *p)
-{
-	return __get_unaligned_be24(p);
-}
-
-static inline u32 __get_unaligned_le24(const u8 *p)
-{
-	return p[0] | p[1] << 8 | p[2] << 16;
-}
-
-static inline u32 get_unaligned_le24(const void *p)
-{
-	return __get_unaligned_le24(p);
-}
-
-static inline void __put_unaligned_be24(const u32 val, u8 *p)
-{
-	*p++ = val >> 16;
-	*p++ = val >> 8;
-	*p++ = val;
-}
-
-static inline void put_unaligned_be24(const u32 val, void *p)
-{
-	__put_unaligned_be24(val, p);
-}
-
-static inline void __put_unaligned_le24(const u32 val, u8 *p)
-{
-	*p++ = val;
-	*p++ = val >> 8;
-	*p++ = val >> 16;
-}
-
-static inline void put_unaligned_le24(const u32 val, void *p)
-{
-	__put_unaligned_le24(val, p);
-}
-
-#endif /* _LINUX_UNALIGNED_GENERIC_H */
diff --git a/include/linux/unaligned/le_struct.h b/include/linux/unaligned/le_struct.h
deleted file mode 100644
index 22f90a4afaa5..000000000000
--- a/include/linux/unaligned/le_struct.h
+++ /dev/null
@@ -1,67 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_UNALIGNED_LE_STRUCT_H
-#define _LINUX_UNALIGNED_LE_STRUCT_H
-
-#include <linux/unaligned/packed_struct.h>
-
-static inline u16 get_unaligned_le16(const void *p)
-{
-	return __get_unaligned_cpu16((const u8 *)p);
-}
-
-static inline u32 get_unaligned_le32(const void *p)
-{
-	return __get_unaligned_cpu32((const u8 *)p);
-}
-
-static inline u64 get_unaligned_le64(const void *p)
-{
-	return __get_unaligned_cpu64((const u8 *)p);
-}
-
-static inline void put_unaligned_le16(u16 val, void *p)
-{
-	__put_unaligned_cpu16(val, p);
-}
-
-static inline void put_unaligned_le32(u32 val, void *p)
-{
-	__put_unaligned_cpu32(val, p);
-}
-
-static inline void put_unaligned_le64(u64 val, void *p)
-{
-	__put_unaligned_cpu64(val, p);
-}
-
-static inline u16 get_unaligned_be16(const void *p)
-{
-	return swab16(__get_unaligned_cpu16((const u8 *)p));
-}
-
-static inline u32 get_unaligned_be32(const void *p)
-{
-	return swab32(__get_unaligned_cpu32((const u8 *)p));
-}
-
-static inline u64 get_unaligned_be64(const void *p)
-{
-	return swab64(__get_unaligned_cpu64((const u8 *)p));
-}
-
-static inline void put_unaligned_be16(u16 val, void *p)
-{
-	__put_unaligned_cpu16(swab16(val), p);
-}
-
-static inline void put_unaligned_be32(u32 val, void *p)
-{
-	__put_unaligned_cpu32(swab32(val), p);
-}
-
-static inline void put_unaligned_be64(u64 val, void *p)
-{
-	__put_unaligned_cpu64(swab64(val), p);
-}
-
-#endif /* _LINUX_UNALIGNED_LE_STRUCT_H */
-- 
2.29.2


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 03/13] sh: remove unaligned access for sh4a
  2021-05-14 10:00 ` [PATCH v2 03/13] sh: remove unaligned access for sh4a Arnd Bergmann
@ 2021-05-14 10:34   ` John Paul Adrian Glaubitz
  2021-05-14 12:22     ` Arnd Bergmann
  0 siblings, 1 reply; 96+ messages in thread
From: John Paul Adrian Glaubitz @ 2021-05-14 10:34 UTC (permalink / raw)
  To: Arnd Bergmann, linux-arch
  Cc: Linus Torvalds, Vineet Gupta, Arnd Bergmann, Yoshinori Sato,
	Rich Felker, linux-sh, linux-kernel

Hi Arnd!

On 5/14/21 12:00 PM, Arnd Bergmann wrote:
> Unlike every other architecture, sh4a uses an inline asm implementation
> for get_unaligned(). I have shown that this produces better object
> code than the asm-generic version. However, there are very few users of
> arch/sh/ overall, and most of those seem to use sh4 rather than sh4a CPU
> cores, so it seems not worth keeping the complexity in the architecture
> independent code.

My Renesas SH4-Boards actually run an sh4a-Kernel, not an sh4-Kernel:

root@tirpitz:~> uname -a
Linux tirpitz 5.11.0-rc4-00012-g10c03c5bf422 #161 PREEMPT Mon Jan 18 21:10:17 CET 2021 sh4a GNU/Linux
root@tirpitz:~>

So, if this change reduces performance on sh4a, I would rather not merge it.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [PATCH v2 13/13] asm-generic: simplify asm/unaligned.h
  2021-05-14 10:01 ` [PATCH v2 13/13] asm-generic: simplify asm/unaligned.h Arnd Bergmann
@ 2021-05-14 10:35   ` David Laight
  0 siblings, 0 replies; 96+ messages in thread
From: David Laight @ 2021-05-14 10:35 UTC (permalink / raw)
  To: 'Arnd Bergmann', linux-arch
  Cc: Linus Torvalds, Vineet Gupta, Arnd Bergmann, linux-kernel

From: Arnd Bergmann
> Sent: 14 May 2021 11:01
> 
> The get_unaligned()/put_unaligned() implementations are much more complex
> than necessary, now that all architectures use the same code.
...
> This new version does allow aggregate types into get_unaligned(), which
> was not the original goal but might come in handy.

Adding '* 1' to the value would stop that and shouldn't add any code.
Although you might want to cast back to the original type to
avoid 'short' being converted to 'int'.
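
Roughly what I mean, as a user-space sketch rather than a tested kernel
patch. The demo_* names are made up, and it assumes GNU C (statement
expressions and the packed attribute):

```c
#include <assert.h>
#include <stdint.h>

/* Integer promotion: multiplying a short by 1 yields an int ... */
static_assert(sizeof((short)1 * 1) == sizeof(int), "short * 1 has type int");
/* ... so a cast is needed to get the original type back. */
static_assert(sizeof((short)((short)1 * 1)) == sizeof(short), "cast restores short");

#define demo_get_unaligned_t(type, ptr) ({				\
	const struct { type x; } __attribute__((packed)) *pp_ =	\
		(const void *)(ptr);					\
	pp_->x;								\
})

/*
 * Scalar-only variant: "* 1" needs an arithmetic operand, so passing a
 * struct or union type here fails to compile.
 */
#define demo_get_scalar_t(type, ptr) \
	((type)(demo_get_unaligned_t(type, ptr) * 1))

static inline int16_t demo_load_s16(const void *p)
{
	return demo_get_scalar_t(int16_t, p);
}
```

Note that '* 1' would also reject pointer types, which may or may not
be wanted.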

	David


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 05/13] powerpc: use linux/unaligned/le_struct.h on LE power7
  2021-05-14 10:00   ` Arnd Bergmann
@ 2021-05-14 11:48     ` Segher Boessenkool
  -1 siblings, 0 replies; 96+ messages in thread
From: Segher Boessenkool @ 2021-05-14 11:48 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arch, Arnd Bergmann, Vineet Gupta, linuxppc-dev,
	linux-kernel, Paul Mackerras, Linus Torvalds

Hi Arnd,

On Fri, May 14, 2021 at 12:00:53PM +0200, Arnd Bergmann wrote:
> Little-endian POWER7 kernels disable
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS because that is not supported on
> the hardware, but the kernel still uses direct load/store for explicit
> get_unaligned()/put_unaligned().
> 
> I assume this is a mistake that leads to power7 having to trap and fix
> up all these unaligned accesses at a noticeable performance cost.
> 
> The fix is completely trivial, just remove the file and use the
> generic version that gets it right.

LE p7 isn't supported (it requires special firmware), and no one uses it
anymore, also not for development.  It was used for powerpc64le-linux
development before p8 was widely available.


Segher

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 03/13] sh: remove unaligned access for sh4a
  2021-05-14 10:34   ` John Paul Adrian Glaubitz
@ 2021-05-14 12:22     ` Arnd Bergmann
  2021-05-15 15:36       ` John Paul Adrian Glaubitz
  0 siblings, 1 reply; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-14 12:22 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz
  Cc: linux-arch, Linus Torvalds, Vineet Gupta, Yoshinori Sato,
	Rich Felker, Linux-sh list, Linux Kernel Mailing List

On Fri, May 14, 2021 at 12:34 PM John Paul Adrian Glaubitz
<glaubitz@physik.fu-berlin.de> wrote:
>
> Hi Arnd!
>
> On 5/14/21 12:00 PM, Arnd Bergmann wrote:
> > Unlike every other architecture, sh4a uses an inline asm implementation
> > for get_unaligned(). I have shown that this produces better object
> > code than the asm-generic version. However, there are very few users of
> > arch/sh/ overall, and most of those seem to use sh4 rather than sh4a CPU
> > cores, so it seems not worth keeping the complexity in the architecture
> > independent code.
>
> My Renesas SH4-Boards actually run an sh4a-Kernel, not an sh4-Kernel:
>
> root@tirpitz:~> uname -a
> Linux tirpitz 5.11.0-rc4-00012-g10c03c5bf422 #161 PREEMPT Mon Jan 18 21:10:17 CET 2021 sh4a GNU/Linux
> root@tirpitz:~>
>
> So, if this change reduces performance on sh4a, I would rather not merge it.

It only makes a difference in very specific scenarios in which unaligned
accesses are done in a fast path, e.g. when forwarding network packets
at a high rate on a big-endian kernel (little-endian kernels wouldn't run into
this on IP headers). If you have a use case for this machine on which
you can show a performance regression, I can add a patch on top to put
the optimized sh4a get_unaligned_le32() back. Dropping this patch
altogether would make the series much more complex because most of
the associated code gets removed in the end.

As I mentioned, supporting "movua" in the compiler likely has a much
larger impact on performance, as it would also help in user space, and
it should improve the networking case on little-endian kernels by replacing
the four separate byte load/shift pairs with a movua plus a byteswap.
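
To illustrate the two shapes being compared, here is a rough user-space
sketch of a big-endian 32-bit load, once as four byte loads with shifts and
once as a single packed-struct load followed by a byteswap on little-endian
builds. The function names are hypothetical, and whether a given compiler
turns the second form into a single unaligned load instruction is not
something this sketch can show:

```c
#include <assert.h>
#include <stdint.h>

/* Four separate byte loads, each shifted into place. */
static inline uint32_t sketch_be32_bytewise(const void *p)
{
	const unsigned char *b = p;

	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* One load through a packed struct, then a byteswap if the CPU is LE. */
static inline uint32_t sketch_be32_struct(const void *p)
{
	const struct { uint32_t x; } __attribute__((packed)) *pp = p;
	uint32_t v = pp->x;

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	v = __builtin_bswap32(v);	/* GCC/Clang builtin */
#endif
	return v;
}

/* Self-check: both forms must agree on a misaligned IPv4-style address. */
static inline void sketch_be32_check(void)
{
	unsigned char pkt[6] = { 0x00, 0xc0, 0xa8, 0x01, 0x02, 0x00 };

	assert(sketch_be32_bytewise(pkt + 1) == sketch_be32_struct(pkt + 1));
}
```

Both functions return the same value; the difference is only in how much
freedom the compiler has to pick a wide load.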

Not sure if there are gcc developers that have an active interest in sh4a
support any more.

      Arnd

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 05/13] powerpc: use linux/unaligned/le_struct.h on LE power7
  2021-05-14 11:48     ` Segher Boessenkool
  (?)
@ 2021-05-14 13:02     ` Arnd Bergmann
  -1 siblings, 0 replies; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-14 13:02 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: linux-arch, Vineet Gupta, linuxppc-dev,
	Linux Kernel Mailing List, Paul Mackerras, Linus Torvalds

On Fri, May 14, 2021 at 1:48 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Fri, May 14, 2021 at 12:00:53PM +0200, Arnd Bergmann wrote:
> > Little-endian POWER7 kernels disable
> > CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS because that is not supported on
> > the hardware, but the kernel still uses direct load/store for explicit
> > get_unaligned()/put_unaligned().
> >
> > I assume this is a mistake that leads to power7 having to trap and fix
> > up all these unaligned accesses at a noticeable performance cost.
> >
> > The fix is completely trivial, just remove the file and use the
> > generic version that gets it right.
>
> LE p7 isn't supported (it requires special firmware), and no one uses it
> anymore, also not for development.  It was used for powerpc64le-linux
> development before p8 was widely available.

Ok, thanks for the clarification.

Should we just remove the Kconfig option for it then as further cleanup?
Is there any other code such as alignment trap handling that could be
removed if LE POWER7 gets dropped?

      Arnd

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper
  2021-05-14 10:00 ` Arnd Bergmann
  (?)
  (?)
@ 2021-05-14 17:32   ` Linus Torvalds
  -1 siblings, 0 replies; 96+ messages in thread
From: Linus Torvalds @ 2021-05-14 17:32 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arch, Vineet Gupta, Arnd Bergmann, Amitkumar Karwar,
	Benjamin Herrenschmidt, Borislav Petkov, Eric Dumazet,
	Florian Fainelli, Ganapathi Bhat, Geert Uytterhoeven,
	H. Peter Anvin, Ingo Molnar, Jakub Kicinski, James Morris,
	Jens Axboe, John Johansen, Jonas Bonn, Kalle Valo,
	Michael Ellerman, Paul Mackerras, Rich Felker,
	Richard Russon (FlatCap),
	Russell King, Serge E. Hallyn, Sharvari Harisangam,
	Stafford Horne, Stefan Kristiansson, Thomas Gleixner,
	Vladimir Oltean, Xinming Hu, Yoshinori Sato,
	the arch/x86 maintainers, Linux Kernel Mailing List, Linux ARM,
	linux-m68k, Linux Crypto Mailing List, openrisc, linuxppc-dev,
	Linux-sh list, linux-sparc, linux-ntfs-dev, linux-block,
	linux-wireless, Netdev, LSM List

On Fri, May 14, 2021 at 3:02 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> I've included this version in the asm-generic tree for 5.14 already,
> addressing the few issues that were pointed out in the RFC. If there
> are any remaining problems, I hope those can be addressed as follow-up
> patches.

This continues to look great to me, and now has the even simpler
remaining implementation.

I'd be tempted to just pull it in for 5.13, but I guess we don't
actually have any _outstanding_ bug in this area (the bug was in our
zlib code, required -O3 to trigger, has been fixed now, and the biggy
case didn't even use "get_unaligned()").

So I guess your 5.14 timing is the right thing to do.

        Linus

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper
  2021-05-14 17:32   ` Linus Torvalds
  (?)
  (?)
@ 2021-05-14 18:51     ` Vineet Gupta
  -1 siblings, 0 replies; 96+ messages in thread
From: Vineet Gupta @ 2021-05-14 18:51 UTC (permalink / raw)
  To: Linus Torvalds, Arnd Bergmann
  Cc: linux-arch, Arnd Bergmann, Amitkumar Karwar,
	Benjamin Herrenschmidt, Borislav Petkov, Eric Dumazet,
	Florian Fainelli, Ganapathi Bhat, Geert Uytterhoeven,
	H. Peter Anvin, Ingo Molnar, Jakub Kicinski, James Morris,
	Jens Axboe, John Johansen, Jonas Bonn, Kalle Valo,
	Michael Ellerman, Paul Mackerras, Rich Felker,
	Richard Russon (FlatCap),
	Russell King, Serge E. Hallyn, Sharvari Harisangam,
	Stafford Horne, Stefan Kristiansson, Thomas Gleixner,
	Vladimir Oltean, Xinming Hu, Yoshinori Sato,
	the arch/x86 maintainers, Linux Kernel Mailing List, Linux ARM,
	linux-m68k, Linux Crypto Mailing List, openrisc, linuxppc-dev,
	Linux-sh list, linux-sparc, linux-ntfs-dev, linux-block,
	linux-wireless, Netdev, LSM List

On 5/14/21 10:32 AM, Linus Torvalds wrote:
> On Fri, May 14, 2021 at 3:02 AM Arnd Bergmann <arnd@kernel.org> wrote:
>> I've included this version in the asm-generic tree for 5.14 already,
>> addressing the few issues that were pointed out in the RFC. If there
>> are any remaining problems, I hope those can be addressed as follow-up
>> patches.
> This continues to look great to me, and now has the even simpler
> remaining implementation.
>
> I'd be tempted to just pull it in for 5.13, but I guess we don't
> actually have any _outstanding_ bug in this area (the bug was in our
> zlib code, required -O3 to trigger, has been fixed now,

Wasn't the new zlib code slated for 5.14? I don't see it in your master yet.

>   and the biggy
> case didn't even use "get_unaligned()").

Indeed this series is sort of orthogonal to that bug, but IMO that bug
still exists in 5.13 for -O3 builds, granted that is not enabled for !ARC.

-Vineet

>
> So I guess your 5.14 timing is the right thing to do.
>
>          Linus


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper
  2021-05-14 18:51     ` Vineet Gupta
  (?)
  (?)
@ 2021-05-14 19:22       ` Linus Torvalds
  -1 siblings, 0 replies; 96+ messages in thread
From: Linus Torvalds @ 2021-05-14 19:22 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Arnd Bergmann, linux-arch, Arnd Bergmann, Amitkumar Karwar,
	Benjamin Herrenschmidt, Borislav Petkov, Eric Dumazet,
	Florian Fainelli, Ganapathi Bhat, Geert Uytterhoeven,
	H. Peter Anvin, Ingo Molnar, Jakub Kicinski, James Morris,
	Jens Axboe, John Johansen, Jonas Bonn, Kalle Valo,
	Michael Ellerman, Paul Mackerras, Rich Felker,
	Richard Russon (FlatCap),
	Russell King, Serge E. Hallyn, Sharvari Harisangam,
	Stafford Horne, Stefan Kristiansson, Thomas Gleixner,
	Vladimir Oltean, Xinming Hu, Yoshinori Sato,
	the arch/x86 maintainers, Linux Kernel Mailing List, Linux ARM,
	linux-m68k, Linux Crypto Mailing List, openrisc, linuxppc-dev,
	Linux-sh list, linux-sparc, linux-ntfs-dev, linux-block,
	linux-wireless, Netdev, LSM List

On Fri, May 14, 2021 at 11:52 AM Vineet Gupta
<Vineet.Gupta1@synopsys.com> wrote:
>
> Wasn't the new zlib code slated for 5.14? I don't see it in your master yet.

You're right, I never actually committed it, since it was specific to
ARC and -O3 and I wasn't entirely happy with the amount of testing it
got (with Heiko pointing out that the s390 stuff needed more fixes for
the change).

So in fact it's not even queued up for 5.14 because of all this; I just dropped it.

> >   and the biggy
> > case didn't even use "get_unaligned()").
>
> Indeed this series is sort of orthogonal to that bug, but IMO that bug
> still exists in 5.13 for -O3 builds, granted that is not enabled for !ARC.

Right, the zlib bug is still there.

But Arnd's series wouldn't even fix it: right now inffast has its own
- ugly and slow - special 2-byte-only version of "get_unaligned()",
called "get_unaligned16()".

And because it's ugly and slow, it's not actually used for
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.

Vineet - maybe the fix is to not take my patch to update to a newer
zlib, but to just fix inffast to use the proper get_unaligned(). Then
Arnd's series _would_ actually fix all this..
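
As a rough illustration of that idea (not the actual inffast code, whose
body is not shown in this thread): assuming the hand-rolled 2-byte helper
copies the two bytes individually, the sketch below puts it next to a
packed-struct load of the kind a generic helper would use. Both function
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of a hand-rolled 2-byte helper: copy each byte separately. */
static inline uint16_t sketch_load16_bytes(const void *p)
{
	const unsigned char *b = p;
	union { uint16_t us; unsigned char b[2]; } u;

	u.b[0] = b[0];
	u.b[1] = b[1];
	return u.us;	/* native byte order, same as a plain load */
}

/* The same 2-byte load expressed through a packed struct. */
static inline uint16_t sketch_load16_struct(const void *p)
{
	const struct { uint16_t x; } __attribute__((packed)) *pp = p;

	return pp->x;
}

/* Self-check: both forms agree at an odd address. */
static inline void sketch_load16_check(void)
{
	unsigned char win[4] = { 0xaa, 0x12, 0x34, 0xbb };

	assert(sketch_load16_bytes(win + 1) == sketch_load16_struct(win + 1));
}
```

The two return the same value in native byte order, so a caller could switch
from one to the other without changing behaviour, and would no longer need a
separate code path for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.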

              Linus

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper
  2021-05-14 17:32   ` Linus Torvalds
  (?)
  (?)
@ 2021-05-14 19:31     ` Arnd Bergmann
  -1 siblings, 0 replies; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-14 19:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-arch, Vineet Gupta, Amitkumar Karwar,
	Benjamin Herrenschmidt, Borislav Petkov, Eric Dumazet,
	Florian Fainelli, Ganapathi Bhat, Geert Uytterhoeven,
	H. Peter Anvin, Ingo Molnar, Jakub Kicinski, James Morris,
	Jens Axboe, John Johansen, Jonas Bonn, Kalle Valo,
	Michael Ellerman, Paul Mackerras, Rich Felker,
	Richard Russon (FlatCap),
	Russell King, Serge E. Hallyn, Sharvari Harisangam,
	Stafford Horne, Stefan Kristiansson, Thomas Gleixner,
	Vladimir Oltean, Xinming Hu, Yoshinori Sato,
	the arch/x86 maintainers, Linux Kernel Mailing List, Linux ARM,
	linux-m68k, Linux Crypto Mailing List, Openrisc, linuxppc-dev,
	Linux-sh list, linux-sparc, linux-ntfs-dev, linux-block,
	linux-wireless, Netdev, LSM List

On Fri, May 14, 2021 at 7:32 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, May 14, 2021 at 3:02 AM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > I've included this version in the asm-generic tree for 5.14 already,
> > addressing the few issues that were pointed out in the RFC. If there
> > are any remaining problems, I hope those can be addressed as follow-up
> > patches.
>
> This continues to look great to me, and now has the even simpler
> remaining implementation.
>
> I'd be tempted to just pull it in for 5.13, but I guess we don't
> actually have any _outstanding_ bug in this area (the bug was in our
> zlib code, required -O3 to trigger, has been fixed now, and the biggy
> case didn't even use "get_unaligned()").
>
> So I guess your 5.14 timing is the right thing to do.

Yes, I think that's best, just in case something does come up. While all the
object code I looked at does appear better, this is one of those areas where
a regression can be hard to pinpoint if we hit one in a particular combination
of architecture+compiler+source file.

I have pushed a signed tag to
https://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git
asm-generic-unaligned-5.14

and plan to send that in the 5.14 merge window unless you decide to
take it now after all.

        Arnd

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper
  2021-05-14 19:22       ` Linus Torvalds
  (?)
  (?)
@ 2021-05-14 19:45         ` Vineet Gupta
  -1 siblings, 0 replies; 96+ messages in thread
From: Vineet Gupta @ 2021-05-14 19:45 UTC (permalink / raw)
  To: Linus Torvalds, Vineet Gupta
  Cc: Arnd Bergmann, linux-arch, Arnd Bergmann, Amitkumar Karwar,
	Benjamin Herrenschmidt, Borislav Petkov, Eric Dumazet,
	Florian Fainelli, Ganapathi Bhat, Geert Uytterhoeven,
	H. Peter Anvin, Ingo Molnar, Jakub Kicinski, James Morris,
	Jens Axboe, John Johansen, Jonas Bonn, Kalle Valo,
	Michael Ellerman, Paul Mackerras, Rich Felker,
	Richard Russon (FlatCap),
	Russell King, Serge E. Hallyn, Sharvari Harisangam,
	Stafford Horne, Stefan Kristiansson, Thomas Gleixner,
	Vladimir Oltean, Xinming Hu, Yoshinori Sato,
	the arch/x86 maintainers, Linux Kernel Mailing List, Linux ARM,
	linux-m68k, Linux Crypto Mailing List, openrisc, linuxppc-dev,
	Linux-sh list, linux-sparc, linux-ntfs-dev, linux-block,
	linux-wireless, Netdev, LSM List

On 5/14/21 12:22 PM, Linus Torvalds wrote:
> On Fri, May 14, 2021 at 11:52 AM Vineet Gupta
> <Vineet.Gupta1@synopsys.com> wrote:
>> Wasn't the new zlib code slated for 5.14? I don't see it in your master yet.
> You're right, I never actually committed it, since it was specific to
> ARC and -O3

Well, not really, the issue manifested in ARC O3 testing, but I showed 
the problem existed for arm64 gcc too.

> and I wasn't entirely happy with the amount of testing it
> got (with Heiko pointing out that the s390 stuff needed more fixes for
> the change).

With his addon patch everything seemed hunky dory.

> The patch below is required on top of your patch to make it compile
> for s390 as well.
> Tested with kernel image decompression, and also btrfs with file
> compression; both software and hardware compression.
> Everything seems to work.

> So in fact it's not even queued up for 5.14 due to this all, I just dropped it.

But why? Can't we throw it in linux-next for 5.14? I promise to test it,
and will likely hit any corner cases. Also, for the time being we could
force just that file (or those files) to build with -O3 to stress-test
the aspects that were fragile.

>>>    and the biggy
>>> case didn't even use "get_unaligned()").
>> Indeed this series is sort of orthogonal to that bug, but IMO that bug
>> still exists in 5.13 for -O3 build, granted that is not enabled for !ARC.
> Right, the zlib bug is still there.
>
> But Arnd's series wouldn't even fix it: right now inffast has its own
> - ugly and slow - special 2-byte-only version of "get_unaligned()",
> called "get_unaligned16()".

I know, that's why I said they are orthogonal.


> And because it's ugly and slow, it's not actually used for
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.
>
> Vineet - maybe the fix is to not take my patch to update to a newer
> zlib, but to just fix inffast to use the proper get_unaligned(). Then
> Arnd's series _would_ actually fix all this..

OK if you say so.

-Vineet
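
For readers following along: the contrast drawn above between a byte-wise
2-byte-only helper and the "proper" struct-based get_unaligned() can be
sketched in user space roughly as below. This is only an illustration under
assumptions: the names una_u16, una_u32, load16_struct, load32_struct and
load16_bytewise are made up for this note, the code relies on GNU C
attributes, and it is not the kernel's or zlib's actual code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Packed wrappers (hypothetical names): the member may live at any byte
 * address. may_alias keeps the access valid under strict aliasing in
 * user space.
 */
struct una_u16 { uint16_t x; } __attribute__((packed, may_alias));
struct una_u32 { uint32_t x; } __attribute__((packed, may_alias));

/* Alignment 1 is what lets the compiler pick a safe load sequence. */
static_assert(_Alignof(struct una_u32) == 1, "packed struct is byte-aligned");

/* Struct-style load: one expression, the compiler chooses the instructions. */
static uint16_t load16_struct(const void *p)
{
	const struct una_u16 *pp = p;

	return pp->x;
}

static uint32_t load32_struct(const void *p)
{
	const struct una_u32 *pp = p;

	return pp->x;
}

/* Byte-wise 2-byte-only load, in the spirit of a hand-rolled helper. */
static uint16_t load16_bytewise(const void *p)
{
	const unsigned char *b = p;
	union { uint16_t us; unsigned char b[2]; } mm;

	mm.b[0] = b[0];
	mm.b[1] = b[1];
	return mm.us;
}
```

Both 16-bit variants return the same value; the struct form simply extends
to any width without a separate hand-written helper per size.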

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper
  2021-05-14 19:45         ` Vineet Gupta
  (?)
  (?)
@ 2021-05-14 20:19           ` Linus Torvalds
  -1 siblings, 0 replies; 96+ messages in thread
From: Linus Torvalds @ 2021-05-14 20:19 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Arnd Bergmann, linux-arch, Arnd Bergmann, Amitkumar Karwar,
	Benjamin Herrenschmidt, Borislav Petkov, Eric Dumazet,
	Florian Fainelli, Ganapathi Bhat, Geert Uytterhoeven,
	H. Peter Anvin, Ingo Molnar, Jakub Kicinski, James Morris,
	Jens Axboe, John Johansen, Jonas Bonn, Kalle Valo,
	Michael Ellerman, Paul Mackerras, Rich Felker,
	Richard Russon (FlatCap),
	Russell King, Serge E. Hallyn, Sharvari Harisangam,
	Stafford Horne, Stefan Kristiansson, Thomas Gleixner,
	Vladimir Oltean, Xinming Hu, Yoshinori Sato,
	the arch/x86 maintainers, Linux Kernel Mailing List, Linux ARM,
	linux-m68k, Linux Crypto Mailing List, openrisc, linuxppc-dev,
	Linux-sh list, linux-sparc, linux-ntfs-dev, linux-block,
	linux-wireless, Netdev, LSM List

On Fri, May 14, 2021 at 12:45 PM Vineet Gupta
<Vineet.Gupta1@synopsys.com> wrote:
>
> Well, not really, the issue manifested in ARC O3 testing, but I showed
> the problem existed for arm64 gcc too.

.. but not with a supported kernel configuration.

> > So in fact it's not even queued up for 5.14 due to this all, I just dropped it.
>
> But why?

I just didn't have time to deal with it during the merge window. If
you keep it alive, that's all fine and good.

                Linus

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 10/13] mwifiex: re-fix for unaligned accesses
  2021-05-14 10:00 ` [PATCH v2 10/13] mwifiex: re-fix for unaligned accesses Arnd Bergmann
@ 2021-05-15  6:22   ` Kalle Valo
  2021-05-15  9:01     ` Arnd Bergmann
  0 siblings, 1 reply; 96+ messages in thread
From: Kalle Valo @ 2021-05-15  6:22 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arch, Linus Torvalds, Vineet Gupta, Arnd Bergmann,
	Amitkumar Karwar, Ganapathi Bhat, Sharvari Harisangam,
	Xinming Hu, David S. Miller, Jakub Kicinski, Devidas Puranik,
	linux-wireless, netdev, linux-kernel

Arnd Bergmann <arnd@kernel.org> writes:

> From: Arnd Bergmann <arnd@arndb.de>
>
> A patch from 2017 changed some accesses to DMA memory to use
> get_unaligned_le32() and similar interfaces, to avoid problems
> with doing unaligned accesses on uncached memory.
>
> However, the change in the mwifiex_pcie_alloc_sleep_cookie_buf()
> function ended up changing the size of the access instead,
> as it operates on a pointer to u8.
>
> Change this function back to actually access the entire 32 bits.
> Note that the pointer is aligned by definition because it came
> from dma_alloc_coherent().
>
> Fixes: 92c70a958b0b ("mwifiex: fix for unaligned reads")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Via which tree should this go? I assume it will go via some other tree
so:

Acked-by: Kalle Valo <kvalo@codeaurora.org>

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
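
A user-space sketch of how a "size of the access" bug like the one described
in the commit message can arise, assuming the helper involved is type-generic
and takes its access width from the pointee type. put_unaligned_sketch,
store_cookie_narrow and store_cookie_full are hypothetical names for this
note, not driver code, and the macro needs GNU C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical type-generic store (GNU C): the number of bytes written is
 * sizeof(*(ptr)), whatever the type of val is.
 */
#define put_unaligned_sketch(val, ptr) do {				\
	struct { __typeof__(*(ptr)) x; }				\
		__attribute__((packed, may_alias)) *pptr_ = (void *)(ptr); \
	pptr_->x = (val);						\
} while (0)

/* Pointer to u8: only the low byte of the 32-bit cookie is stored. */
static void store_cookie_narrow(uint8_t *buf, uint32_t cookie)
{
	put_unaligned_sketch(cookie, buf);
}

/* Full 32-bit store; fine when buf is known to be 4-byte aligned. */
static void store_cookie_full(void *buf, uint32_t cookie)
{
	assert(((uintptr_t)buf & 3) == 0);
	*(uint32_t *)buf = cookie;
}
```

With a u8 pointer the first variant silently writes one byte and leaves the
other three untouched, which matches the symptom described above. The second
variant depends on the alignment guarantee the commit message attributes to
dma_alloc_coherent().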

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 10/13] mwifiex: re-fix for unaligned accesses
  2021-05-15  6:22   ` Kalle Valo
@ 2021-05-15  9:01     ` Arnd Bergmann
  2021-05-15 18:23       ` Kalle Valo
  0 siblings, 1 reply; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-15  9:01 UTC (permalink / raw)
  To: Kalle Valo
  Cc: linux-arch, Linus Torvalds, Vineet Gupta, Amitkumar Karwar,
	Ganapathi Bhat, Sharvari Harisangam, Xinming Hu, David S. Miller,
	Jakub Kicinski, Devidas Puranik, linux-wireless, Networking,
	Linux Kernel Mailing List, devidas.puranik

On Sat, May 15, 2021 at 8:22 AM Kalle Valo <kvalo@codeaurora.org> wrote:
> Arnd Bergmann <arnd@kernel.org> writes:
> > From: Arnd Bergmann <arnd@arndb.de>
> >
> > A patch from 2017 changed some accesses to DMA memory to use
> > get_unaligned_le32() and similar interfaces, to avoid problems
> > with doing unaligned accesses on uncached memory.
> >
> > However, the change in the mwifiex_pcie_alloc_sleep_cookie_buf()
> > function ended up changing the size of the access instead,
> > as it operates on a pointer to u8.
> >
> > Change this function back to actually access the entire 32 bits.
> > Note that the pointer is aligned by definition because it came
> > from dma_alloc_coherent().
> >
> > Fixes: 92c70a958b0b ("mwifiex: fix for unaligned reads")
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> Via which tree should this go? I assume it will go via some other tree
> so:
>
> Acked-by: Kalle Valo <kvalo@codeaurora.org>

I have queued the series in the asm-generic tree for 5.14, as the patches
that depend on this one are a little too invasive for 5.13 at this point.

If you think this fix should be in 5.13, please take it through your tree.

        Arnd

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 03/13] sh: remove unaligned access for sh4a
  2021-05-14 12:22     ` Arnd Bergmann
@ 2021-05-15 15:36       ` John Paul Adrian Glaubitz
  2021-05-15 20:10         ` Arnd Bergmann
  0 siblings, 1 reply; 96+ messages in thread
From: John Paul Adrian Glaubitz @ 2021-05-15 15:36 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arch, Linus Torvalds, Vineet Gupta, Yoshinori Sato,
	Rich Felker, Linux-sh list, Linux Kernel Mailing List

Hi Arnd!

On 5/14/21 2:22 PM, Arnd Bergmann wrote:
>> My Renesas SH4-Boards actually run an sh4a-Kernel, not an sh4-Kernel:
>>
>> root@tirpitz:~> uname -a
>> Linux tirpitz 5.11.0-rc4-00012-g10c03c5bf422 #161 PREEMPT Mon Jan 18 21:10:17 CET 2021 sh4a GNU/Linux
>> root@tirpitz:~>
>>
>> So, if this change reduces performance on sh4a, I would rather not merge it.
> 
> It only makes a difference in very specific scenarios in which unaligned
> accesses are done in a fast path, e.g. when forwarding network packets
> at a high rate on a big-endian kernel (little-endian kernels wouldn't run into
> this on IP headers). If you have a use case for this machine on which
> you can show a performance regression, I can add a patch on top to put
> the optimized sh4a get_unaligned_le32() back. Dropping this patch
> altogether would make the series much more complex because most of
> the associated code gets removed in the end.

Hmm, okay. But why does code which sits below arch/sh have to be removed anyway?

I don't fully understand why it poses any maintenance burden.

> As I mentioned, supporting "movua" in the compiler likely has a much
> larger impact on performance, as it would also help in user space, and
> it should improve the networking case on little-endian kernels by replacing
> the four separate byte loads/shift pairs with a movua plus a byteswap.

The problem is that - at least in Debian - we use the sh4 baseline while the kernel
supports both sh4 and sh4a, so we can't use any of these instructions in userland at
the moment.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913
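
As an illustration of the "four separate byte loads/shift pairs" versus "one
unaligned load plus a byteswap" trade-off quoted above, here is a portable
sketch for a big-endian (network order) 32-bit field. The function names are
invented for this note. memcpy() merely stands in for a single unaligned load
instruction such as movua; whether a given compiler actually emits one is
exactly the open question in this subthread. __builtin_bswap32 and the
__BYTE_ORDER__ macros are GCC/Clang features.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Four byte loads combined with shifts: works for any alignment and endianness. */
static uint32_t load_be32_bytes(const void *p)
{
	const uint8_t *b = p;

	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* One word-sized load, then a byteswap if the CPU is little-endian. */
static uint32_t load_be32_word(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));	/* stand-in for a single unaligned load */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	v = __builtin_bswap32(v);	/* GCC/Clang builtin */
#endif
	return v;
}

/* Both variants must return the same value for any input pointer. */
static void check_loads_agree(const void *p)
{
	assert(load_be32_bytes(p) == load_be32_word(p));
}
```

The two functions are interchangeable in result; the difference is only in
the instruction sequence a compiler can generate for each.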

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 10/13] mwifiex: re-fix for unaligned accesses
  2021-05-15  9:01     ` Arnd Bergmann
@ 2021-05-15 18:23       ` Kalle Valo
  0 siblings, 0 replies; 96+ messages in thread
From: Kalle Valo @ 2021-05-15 18:23 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arch, Linus Torvalds, Vineet Gupta, Amitkumar Karwar,
	Ganapathi Bhat, Sharvari Harisangam, Xinming Hu, David S. Miller,
	Jakub Kicinski, Devidas Puranik, linux-wireless, Networking,
	Linux Kernel Mailing List, devidas.puranik

Arnd Bergmann <arnd@kernel.org> writes:

> On Sat, May 15, 2021 at 8:22 AM Kalle Valo <kvalo@codeaurora.org> wrote:
>> Arnd Bergmann <arnd@kernel.org> writes:
>> > From: Arnd Bergmann <arnd@arndb.de>
>> >
>> > A patch from 2017 changed some accesses to DMA memory to use
>> > get_unaligned_le32() and similar interfaces, to avoid problems
>> > with doing unaligned accesses on uncached memory.
>> >
>> > However, the change in the mwifiex_pcie_alloc_sleep_cookie_buf()
>> > function ended up changing the size of the access instead,
>> > as it operates on a pointer to u8.
>> >
>> > Change this function back to actually access the entire 32 bits.
>> > Note that the pointer is aligned by definition because it came
>> > from dma_alloc_coherent().
>> >
>> > Fixes: 92c70a958b0b ("mwifiex: fix for unaligned reads")
>> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>>
>> Via which tree should this go? I assume it will go via some other tree
>> so:
>>
>> Acked-by: Kalle Valo <kvalo@codeaurora.org>
>
> I have queued the series in the asm-generic tree for 5.14, as the patches
> that depend on this one are a little too invasive for 5.13 at this point.
>
> If you think this fix should be in 5.13, please take it through your tree.

I think v5.14 is more appropriate, so please take this via your tree.
Thanks.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 12/13] asm-generic: uaccess: 1-byte access is always aligned
  2021-05-14 10:01 ` [PATCH v2 12/13] asm-generic: uaccess: 1-byte access is always aligned Arnd Bergmann
@ 2021-05-15 18:41   ` Randy Dunlap
  2021-05-15 20:16     ` Arnd Bergmann
  0 siblings, 1 reply; 96+ messages in thread
From: Randy Dunlap @ 2021-05-15 18:41 UTC (permalink / raw)
  To: Arnd Bergmann, linux-arch
  Cc: Linus Torvalds, Vineet Gupta, Arnd Bergmann, linux-kernel

On 5/14/21 3:01 AM, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> With the cleaned up version of asm-generic/unaligned.h,
> there is a warning about the get_user/put_user helpers using
> unaligned access for single-byte variables:
> 
> include/asm-generic/uaccess.h: In function ‘__get_user_fn’:
> include/asm-generic/unaligned.h:13:15: warning: ‘packed’ attribute ignored for field of type ‘u8’ {aka ‘unsigned char’} [-Wattributes]
>   const struct { type x __packed; } *__pptr = (typeof(__pptr))(ptr); \
> 
> Change these to use a direct pointer dereference to avoid the
> warnings.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  include/asm-generic/uaccess.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/asm-generic/uaccess.h b/include/asm-generic/uaccess.h
> index 4973328f3c6e..7e903e450659 100644
> --- a/include/asm-generic/uaccess.h
> +++ b/include/asm-generic/uaccess.h
> @@ -19,7 +19,7 @@ __get_user_fn(size_t size, const void __user *from, void *to)
>  
>  	switch (size) {
>  	case 1:
> -		*(u8 *)to = get_unaligned((u8 __force *)from);
> +		*(u8 *)to = *((u8 __force *)from);
>  		return 0;
>  	case 2:
>  		*(u16 *)to = get_unaligned((u16 __force *)from);
> @@ -45,7 +45,7 @@ __put_user_fn(size_t size, void __user *to, void *from)
>  
>  	switch (size) {
>  	case 1:
> -		put_unaligned(*(u8 *)from, (u8 __force *)to);
> +		*(*(u8 *)from, (u8 __force *)to);

Should that be           from = 
?

>  		return 0;
>  	case 2:
>  		put_unaligned(*(u16 *)from, (u16 __force *)to);
> 


-- 
~Randy

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 03/13] sh: remove unaligned access for sh4a
  2021-05-15 15:36       ` John Paul Adrian Glaubitz
@ 2021-05-15 20:10         ` Arnd Bergmann
  0 siblings, 0 replies; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-15 20:10 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz
  Cc: linux-arch, Linus Torvalds, Vineet Gupta, Yoshinori Sato,
	Rich Felker, Linux-sh list, Linux Kernel Mailing List

On Sat, May 15, 2021 at 5:36 PM John Paul Adrian Glaubitz
<glaubitz@physik.fu-berlin.de> wrote:
> On 5/14/21 2:22 PM, Arnd Bergmann wrote:
> >> My Renesas SH4-Boards actually run an sh4a-Kernel, not an sh4-Kernel:
> >>
> >> root@tirpitz:~> uname -a
> >> Linux tirpitz 5.11.0-rc4-00012-g10c03c5bf422 #161 PREEMPT Mon Jan 18 21:10:17 CET 2021 sh4a GNU/Linux
> >> root@tirpitz:~>
> >>
> >> So, if this change reduces performance on sh4a, I would rather not merge it.
> >
> > It only makes a difference in very specific scenarios in which unaligned
> > accesses are done in a fast path, e.g. when forwarding network packets
> > at a high rate on a big-endian kernel (little-endian kernels wouldn't run into
> > this on IP headers). If you have a use case for this machine on which
> > you can show a performance regression, I can add a patch on top to put
> > the optimized sh4a get_unaligned_le32() back. Dropping this patch
> > altogether would make the series much more complex because most of
> > the associated code gets removed in the end.
>
> Hmm, okay. But why does code which sits below arch/sh have to be removed anyway?
>
> I don't fully understand why it poses any maintenance burden.

What I'm removing is the part that lets architectures override the
generic version.

> > As I mentioned, supporting "movua" in the compiler likely has a much
> > larger impact on performance, as it would also help in user space, and
> > it should improve the networking case on little-endian kernels by replacing
> > the four separate byte loads/shift pairs with a movua plus a byteswap.
>
> The problem is that - at least in Debian - we use the sh4 baseline while the kernel
> supports both sh4 and sh4a, so we can't use any of these instructions in userland at
> the moment.

I tried building an sh7785lcr_defconfig with and without the patch, and found
that the only affected files are:

- in-kernel nfs client
- crc32c/sha1/sha256 hash functions
- device probing for libata, scsi-core, scsi-disk, hid, r8168
  (should not matter after boot)
- msdos partition parsing

Any NFS client performance difference is probably not measurable, even
at gigabit Ethernet speed.
I see that the hash functions are notably different, but I don't know if the
output from the new generic code is actually better or worse than the
original. If you do think this is important, please try the version from

https://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git
unaligned-sh4a

against the version without the last change in that series. If you can find
a relevant test case that exercises it, you may want to add a custom
implementation of the hash functions as well.

       Arnd

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 12/13] asm-generic: uaccess: 1-byte access is always aligned
  2021-05-15 18:41   ` Randy Dunlap
@ 2021-05-15 20:16     ` Arnd Bergmann
  0 siblings, 0 replies; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-15 20:16 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-arch, Linus Torvalds, Vineet Gupta, Linux Kernel Mailing List

On Sat, May 15, 2021 at 8:41 PM Randy Dunlap <rdunlap@infradead.org> wrote:
> On 5/14/21 3:01 AM, Arnd Bergmann wrote:
> > From: Arnd Bergmann <arnd@arndb.de>
> > diff --git a/include/asm-generic/uaccess.h b/include/asm-generic/uaccess.h
> > index 4973328f3c6e..7e903e450659 100644
> > --- a/include/asm-generic/uaccess.h
> > +++ b/include/asm-generic/uaccess.h
> > @@ -19,7 +19,7 @@ __get_user_fn(size_t size, const void __user *from, void *to)
> >
> >       switch (size) {
> >       case 1:
> > -             *(u8 *)to = get_unaligned((u8 __force *)from);
> > +             *(u8 *)to = *((u8 __force *)from);
> >               return 0;
> >       case 2:
> >               *(u16 *)to = get_unaligned((u16 __force *)from);
> > @@ -45,7 +45,7 @@ __put_user_fn(size_t size, void __user *to, void *from)
> >
> >       switch (size) {
> >       case 1:
> > -             put_unaligned(*(u8 *)from, (u8 __force *)to);
> > +             *(*(u8 *)from, (u8 __force *)to);
>
> Should that be           from =
> ?

Thanks a lot for catching the typo!

Changed now to

        *(u8 __force *)to = *(u8 *)from;

For some reason neither my own build testing nor the kernel
build bot caught it so far.
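
A minimal userspace sketch of the corrected one-byte cases may help show why
the direct dereference is sufficient. It assumes plain pointers in place of the
kernel's __user/__force annotations, and the names demo_get_user_fn() and
demo_put_user_fn() are hypothetical; they only mirror the shape of the patched
helpers, with memcpy() standing in for the wider get_unaligned()/put_unaligned()
cases:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __get_user_fn(): copy `size` bytes from `from` to `to`. */
static int demo_get_user_fn(size_t size, const void *from, void *to)
{
	switch (size) {
	case 1:
		/* A single byte has no alignment requirement, so a plain load is enough. */
		*(uint8_t *)to = *(const uint8_t *)from;
		return 0;
	case 2:
	case 4:
	case 8:
		/* Wider accesses may be misaligned; memcpy() stands in for get_unaligned(). */
		memcpy(to, from, size);
		return 0;
	default:
		return -1;
	}
}

/* Hypothetical stand-in for __put_user_fn(): copy `size` bytes from `from` to `to`. */
static int demo_put_user_fn(size_t size, void *to, const void *from)
{
	switch (size) {
	case 1:
		/* Same shape as the fixed line: destination on the left, source on the right. */
		*(uint8_t *)to = *(const uint8_t *)from;
		return 0;
	case 2:
	case 4:
	case 8:
		/* memcpy() stands in for put_unaligned() here. */
		memcpy(to, from, size);
		return 0;
	default:
		return -1;
	}
}
```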

        Arnd

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 08/13] partitions: msdos: fix one-byte get_unaligned()
  2021-05-14 10:00 ` [PATCH v2 08/13] partitions: msdos: fix one-byte get_unaligned() Arnd Bergmann
@ 2021-05-17 10:28   ` Christoph Hellwig
  2021-05-17 10:44     ` Arnd Bergmann
  0 siblings, 1 reply; 96+ messages in thread
From: Christoph Hellwig @ 2021-05-17 10:28 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arch, Linus Torvalds, Vineet Gupta, Arnd Bergmann,
	Richard Russon (FlatCap),
	Jens Axboe, linux-ntfs-dev, linux-block, linux-kernel

On Fri, May 14, 2021 at 12:00:56PM +0200, Arnd Bergmann wrote:
>  /* Borrowed from msdos.c */
> -#define SYS_IND(p)		(get_unaligned(&(p)->sys_ind))
> +#define SYS_IND(p)		((p)->sys_ind)

Please just kill this macro entirely.

> -#define SYS_IND(p)	get_unaligned(&p->sys_ind)
> +#define SYS_IND(p)	(p->sys_ind)

Same here.

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 08/13] partitions: msdos: fix one-byte get_unaligned()
  2021-05-17 10:28   ` Christoph Hellwig
@ 2021-05-17 10:44     ` Arnd Bergmann
  0 siblings, 0 replies; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-17 10:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-arch, Linus Torvalds, Vineet Gupta,
	Richard Russon (FlatCap),
	Jens Axboe, linux-ntfs-dev, linux-block,
	Linux Kernel Mailing List

On Mon, May 17, 2021 at 12:28 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Fri, May 14, 2021 at 12:00:56PM +0200, Arnd Bergmann wrote:
> >  /* Borrowed from msdos.c */
> > -#define SYS_IND(p)           (get_unaligned(&(p)->sys_ind))
> > +#define SYS_IND(p)           ((p)->sys_ind)
>
> Please just kill this macro entirely.
>
> > -#define SYS_IND(p)   get_unaligned(&p->sys_ind)
> > +#define SYS_IND(p)   (p->sys_ind)
>
> Same here.

Done, thanks for taking a look.

       Arnd

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers
  2021-05-14 10:00   ` Arnd Bergmann
@ 2021-05-17 21:53     ` Eric Biggers
  -1 siblings, 0 replies; 96+ messages in thread
From: Eric Biggers @ 2021-05-17 21:53 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arch, Linus Torvalds, Vineet Gupta, Arnd Bergmann,
	Russell King, Herbert Xu, David S. Miller, Thomas Bogendoerfer,
	linux-arm-kernel, linux-kernel, linux-crypto, linux-mips

On Fri, May 14, 2021 at 12:00:55PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> As found by Vineet Gupta and Linus Torvalds, gcc has somewhat unexpected
> behavior when faced with overlapping unaligned pointers. The kernel's
> unaligned/access-ok.h header technically invokes undefined behavior
> that happens to usually work on the architectures using it, but if the
> compiler optimizes code based on the assumption that undefined behavior
> doesn't happen, it can create output that actually causes data corruption.
> 
> A related problem was previously found on 32-bit ARMv7, where most
> instructions can be used on unaligned data, but 64-bit ldrd/strd causes
> an exception. The workaround was to always use the unaligned/le_struct.h
> helper instead of unaligned/access-ok.h, in commit 1cce91dfc8f7 ("ARM:
> 8715/1: add a private asm/unaligned.h").
> 
> The same solution should work on all other architectures as well, so
> remove the access-ok.h variant and use the other one unconditionally on
> all architectures, picking either the big-endian or little-endian version.

FYI, gcc 10 had a bug where it miscompiled code that uses "packed structs" to
copy between overlapping unaligned pointers
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94994).

I'm not sure whether the kernel will run into that or not, and gcc has since
fixed it.  But it's worth mentioning, especially since the issue mentioned in
this commit sounds very similar (overlapping unaligned pointers), and both
involved implementations of DEFLATE decompression.

Anyway, partly due to the above, in userspace I now only use memcpy() to
implement {get,put}_unaligned_*, since these days it seems to be compiled
optimally and have the least amount of problems.

I wonder if the kernel should do the same, or whether there are still cases
where memcpy() isn't compiled optimally.  armv6/7 used to be one such case, but
it was fixed in gcc 6.
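
As a rough illustration of the memcpy() approach described above, a userspace
version might look like the sketch below. The helper names are hypothetical and
not taken from any particular project; whether the compiler turns each call into
a single load or store depends on the target and compiler version, which is
exactly the open question raised here:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper names; they only illustrate the memcpy() approach. */
static inline uint32_t demo_get_unaligned_u32(const void *p)
{
	uint32_t v;

	/* memcpy() places no alignment requirement on either pointer. */
	memcpy(&v, p, sizeof(v));
	return v;
}

static inline void demo_put_unaligned_u32(uint32_t v, void *p)
{
	/* Store the value in host byte order at an arbitrarily aligned address. */
	memcpy(p, &v, sizeof(v));
}
```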

- Eric

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers
  2021-05-17 21:53     ` Eric Biggers
@ 2021-05-18  7:25       ` Arnd Bergmann
  -1 siblings, 0 replies; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-18  7:25 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-arch, Linus Torvalds, Vineet Gupta, Russell King,
	Herbert Xu, David S. Miller, Thomas Bogendoerfer, Linux ARM,
	Linux Kernel Mailing List,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE,
	open list:BROADCOM NVRAM DRIVER

On Mon, May 17, 2021 at 11:53 PM Eric Biggers <ebiggers@kernel.org> wrote:
> On Fri, May 14, 2021 at 12:00:55PM +0200, Arnd Bergmann wrote:
> > From: Arnd Bergmann <arnd@arndb.de>
> >
> > As found by Vineet Gupta and Linus Torvalds, gcc has somewhat unexpected
> > behavior when faced with overlapping unaligned pointers. The kernel's
> > unaligned/access-ok.h header technically invokes undefined behavior
> > that happens to usually work on the architectures using it, but if the
> > compiler optimizes code based on the assumption that undefined behavior
> > doesn't happen, it can create output that actually causes data corruption.
> >
> > A related problem was previously found on 32-bit ARMv7, where most
> > instructions can be used on unaligned data, but 64-bit ldrd/strd causes
> > an exception. The workaround was to always use the unaligned/le_struct.h
> > helper instead of unaligned/access-ok.h, in commit 1cce91dfc8f7 ("ARM:
> > 8715/1: add a private asm/unaligned.h").
> >
> > The same solution should work on all other architectures as well, so
> > remove the access-ok.h variant and use the other one unconditionally on
> > all architectures, picking either the big-endian or little-endian version.
>
> FYI, gcc 10 had a bug where it miscompiled code that uses "packed structs" to
> copy between overlapping unaligned pointers
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94994).

Thank you for pointing this out.

> I'm not sure whether the kernel will run into that or not, and gcc has since
> fixed it.  But it's worth mentioning, especially since the issue mentioned in
> this commit sounds very similar (overlapping unaligned pointers), and both
> involved implementations of DEFLATE decompression.

I tried reproducing this on the kernel deflate code with the kernel.org gcc-10.1
and gcc-10.3 crosstool versions but couldn't quite get there with Vineet's
preprocessed source https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

Trying with both the original get_unaligned() version in there and the
packed-struct variant, I get the same output from gcc-10.1 and gcc-10.3 when
I compile those myself for arc hs4x, but it's rather different from the output
that Vineet got, and I don't know how to spot whether the problem exists in any
of those versions.

> Anyway, partly due to the above, in userspace I now only use memcpy() to
> implement {get,put}_unaligned_*, since these days it seems to be compiled
> optimally and have the least amount of problems.
>
> I wonder if the kernel should do the same, or whether there are still cases
> where memcpy() isn't compiled optimally.  armv6/7 used to be one such case, but
> it was fixed in gcc 6.

It would have to be memmove(), not memcpy() in this case, right?
My feeling is that if gcc-4.9 and gcc-5 produce correct but slightly slower
code, we can live with that, unlike the possibility of gcc-10.{1,2} producing
incorrect code.

Since the new asm/unaligned.h has a single implementation across all
architectures, we could probably fall back to a memmove based version for
the compilers affected by the 94994 bug, but I'd first need a better
way to test whether a given combination of asm/unaligned.h and
compiler version runs into this bug.

I have checked your reproducer and confirmed that it does affect x86_64
gcc-10.1 -O3 with my proposed version of asm-generic/unaligned.h, but
does not trigger on any other version (4.9 through 9.3, 10.3 or 11.1), and not
on -O2 or "-O3 -mno-sse" builds or on arm64, but that doesn't necessarily
mean it's safe on these.
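
For readers following along, the packed-struct approach under discussion can be
sketched in userspace roughly as follows. This is an approximation under stated
assumptions, not the exact header from the series: the demo_ names are
hypothetical, and GNU C extensions (statement expressions and
__attribute__((packed))) are assumed. The attribute is applied to the whole
struct rather than to the member, which is one way to avoid the "packed
attribute ignored" warning for one-byte types:

```c
#include <stdint.h>

/*
 * Hypothetical helpers modeled on the packed-struct technique. Marking the
 * struct as packed tells the compiler that the member may sit at any byte
 * address, so it has to emit an access that is safe when misaligned.
 */
#define demo_get_unaligned_t(type, ptr) ({				\
	const struct { type x; } __attribute__((packed)) *__pptr =	\
		(const void *)(ptr);					\
	__pptr->x;							\
})

#define demo_put_unaligned_t(type, val, ptr) do {			\
	struct { type x; } __attribute__((packed)) *__pptr =		\
		(void *)(ptr);						\
	__pptr->x = (val);						\
} while (0)
```

A caller would write, for example, demo_put_unaligned_t(uint32_t, v, buf + 3)
and read the value back with demo_get_unaligned_t(uint32_t, buf + 3).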

        Arnd

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers
  2021-05-18  7:25       ` Arnd Bergmann
@ 2021-05-18 14:56         ` Linus Torvalds
  -1 siblings, 0 replies; 96+ messages in thread
From: Linus Torvalds @ 2021-05-18 14:56 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Eric Biggers, linux-arch, Vineet Gupta, Russell King, Herbert Xu,
	David S. Miller, Thomas Bogendoerfer, Linux ARM,
	Linux Kernel Mailing List,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE,
	open list:BROADCOM NVRAM DRIVER

On Tue, May 18, 2021 at 12:27 AM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > I wonder if the kernel should do the same, or whether there are still cases
> > where memcpy() isn't compiled optimally.  armv6/7 used to be one such case, but
> > it was fixed in gcc 6.
>
> It would have to be memmove(), not memcpy() in this case, right?

No, it would simply be something like

  #define __get_unaligned_t(type, ptr) \
        ({ type __val; memcpy(&__val, ptr, sizeof(type)); __val; })

  #define get_unaligned(ptr) \
        __get_unaligned_t(typeof(*(ptr)), ptr)

but honestly, the likelihood that the compiler generates something
horrible (possibly because of KASAN etc) is uncomfortably high.

I'd prefer the __packed thing. We don't actually use -O3, and it's
considered a bad idea, and the gcc bug is as such less likely than
just the above generating unacceptable code (we have several cases
where "bad code generation" ends up being an actual bug, since we
depend on inlining and depend on some code sequences not generating
calls etc).

But I hate how gcc is buggy in so many places here, and the
straightforward thing is made to explicitly not work.

I absolutely despise compiler people who think it's ok to generate
known bad code based on pointless "undefined behavior" arguments - and
then those same clever optimizations break even when you do things
properly.  It's basically intellectual dishonesty - doing known
fragile things, blaming the user when it breaks, but then not
acknowledging that the fragile shit they did was broken even when the
user bent over backwards.

                Linus

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers
  2021-05-18 14:56         ` Linus Torvalds
@ 2021-05-18 15:41           ` Arnd Bergmann
  -1 siblings, 0 replies; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-18 15:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Biggers, linux-arch, Vineet Gupta, Russell King, Herbert Xu,
	David S. Miller, Thomas Bogendoerfer, Linux ARM,
	Linux Kernel Mailing List,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE,
	open list:BROADCOM NVRAM DRIVER

On Tue, May 18, 2021 at 4:56 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, May 18, 2021 at 12:27 AM Arnd Bergmann <arnd@kernel.org> wrote:
> > >
> > > I wonder if the kernel should do the same, or whether there are still cases
> > > where memcpy() isn't compiled optimally.  armv6/7 used to be one such case, but
> > > it was fixed in gcc 6.
> >
> > It would have to be memmove(), not memcpy() in this case, right?
>
> No, it would simply be something like
>
>   #define __get_unaligned_t(type, ptr) \
>         ({ type __val; memcpy(&__val, ptr, sizeof(type)); __val; })
>
>   #define get_unaligned(ptr) \
>         __get_unaligned_t(typeof(*(ptr)), ptr)
>
> but honestly, the likelihood that the compiler generates something
> horrible (possibly because of KASAN etc) is uncomfortably high.
>
> I'd prefer the __packed thing. We don't actually use -O3, and it's
> considered a bad idea, and the gcc bug is as such less likely than
> just  the above generating unacceptable code (we have several cases
> where "bad code generation" ends up being an actual bug, since we
> depend on inlining and depend on some code sequences not generating
> calls etc).

I think the important question is whether we know that the bug that Eric
pointed to can only happen with -O3, or whether it is something in
gcc-10.1 that got triggered by -O3 -msse on x86-64 but could equally
well get triggered on some other architecture without -O3 or
vector instructions enabled.

From the gcc fix at
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9fa5b473b5b8e289b
it looks like this code path is entered when compiling with
-ftree-loop-vectorize, which is documented as

'-ftree-loop-vectorize'
     Perform loop vectorization on trees.  This flag is enabled by
     default at '-O3' and by '-ftree-vectorize', '-fprofile-use', and
     '-fauto-profile'.

-ftree-vectorize is set in arch/arm/lib/xor-neon.c
-O3 is set for the lz4 and zstd compression helpers and for wireguard.

To be on the safe side, we could pass -fno-tree-loop-vectorize along
with -O3 on the affected gcc versions, or use a bigger hammer
(not use -O3 at all, always set -fno-tree-loop-vectorize, ...).

        Arnd

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers
  2021-05-18 15:41           ` Arnd Bergmann
@ 2021-05-18 16:12             ` Linus Torvalds
  -1 siblings, 0 replies; 96+ messages in thread
From: Linus Torvalds @ 2021-05-18 16:12 UTC (permalink / raw)
  To: Arnd Bergmann, Jason A. Donenfeld
  Cc: Eric Biggers, linux-arch, Vineet Gupta, Russell King, Herbert Xu,
	David S. Miller, Thomas Bogendoerfer, Linux ARM,
	Linux Kernel Mailing List,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE,
	open list:BROADCOM NVRAM DRIVER

On Tue, May 18, 2021 at 5:42 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> To be on the safe side, we could pass -fno-tree-loop-vectorize along
> with -O3 on the affected gcc versions, or use a bigger hammer
> (not use -O3 at all, always set -fno-tree-loop-vectorize, ...).

I personally think -O3 in general is unsafe.

It has historically been horribly buggy. It's gotten better, but this
case clearly shows that "gotten better" really isn't that high of a
bar.

Very few projects use -O3, which is obviously part of why it's buggy.
But the other part of why it's buggy is that vectorization is simply
very complicated, and honestly, judging by the last report the gcc
people don't care about being careful. They literally are ok with
knowingly generating an off-by-one range check, because "it's
undefined behavior".

With that kind of mentality, I'm not personally all that inclined to
say "sure, use -O3". We know it has bugs even for the well-defined
cases.

> -O3 is set for the lz4 and zstd compression helpers and for wireguard.

I'm actually surprised wireguard would use -O3. Yes, performance is
important. But for wireguard, correctness is certainly important too.
Maybe Jason isn't aware of just how bad gcc -O3 has historically been?

And -O3 has often generated _slower_ code, in addition to the bugs.
It's not like it's a situation where "-O3 is obviously better than
-O2". There's a reason -O2 is the default.

And that tends to be even more true in the kernel than in many user
space programs (ie smaller loops, generally much higher I$ miss rates
etc).

Jason? How big of a deal is that -O3 for wireguard wrt the normal -O2?
There are known buggy gcc versions that aren't ancient.

Of the other cases, that xor-neon.c case actually makes sense. For
that file, it literally exists _only_ to get a vectorized version of
the trivial xor_8regs loop. It's one of the (very very few) cases of
vectorization we actually want. And in that case, we might even want
to make things easier - and more explicit - for the compiler by making
the xor_8regs loops use "restrict" pointers.

That neon case actually wants and needs that tree-vectorization to
DTRT. But maybe it doesn't need the actual _loop_ vectorization? The
xor_8regs code is literally using hand-unrolled loops already, exactly
to make it as simple as possible for the compiler (but the lack of
"restrict" pointers means that it's not all that simple after all, and
I assume the compiler generates conditionals for the NEON case?)
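
As a rough illustration of the "restrict" idea, here is a minimal userspace
sketch. The name xor_words_restrict() and its signature are made up for this
example and are not the kernel's xor_8regs code. It assumes that no caller
ever passes overlapping buffers, which would be undefined behavior with
these qualifiers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch, not the kernel's xor_8regs code: XOR `words`
 * words of src into dst. The "restrict" qualifiers promise the compiler
 * that dst and src never overlap. Passing the same buffer for both
 * arguments is undefined behavior.
 */
void xor_words_restrict(size_t words, unsigned long *restrict dst,
			const unsigned long *restrict src)
{
	size_t i;

	/* The hand-unrolled body consumes four words per iteration. */
	assert(words % 4 == 0);

	for (i = 0; i < words; i += 4) {
		dst[i + 0] ^= src[i + 0];
		dst[i + 1] ^= src[i + 1];
		dst[i + 2] ^= src[i + 2];
		dst[i + 3] ^= src[i + 3];
	}
}
```

With the qualifiers in place the compiler should not need a runtime overlap
check in front of a vectorized path. A caller that wants to XOR a buffer in
place could not use a helper declared like this.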

lz4 is questionable - yes, upstream lz4 seems to use -O3 (good), but
it also very much uses unaligned accesses, which is where the gcc bug
hits. I doubt that it really needs or wants the loop vectorization.

zstd looks very similar to lz4.

End result: at a minimum, I'd suggest using
"-fno-tree-loop-vectorize", although somebody should check that NEON
case.

And I still think that using O3 for anything halfway complicated
should be considered odd and need some strong numbers to enable.

               Linus

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers
  2021-05-18 16:12             ` Linus Torvalds
@ 2021-05-18 18:09               ` Jason A. Donenfeld
  -1 siblings, 0 replies; 96+ messages in thread
From: Jason A. Donenfeld @ 2021-05-18 18:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arnd Bergmann, Eric Biggers, linux-arch, Vineet Gupta,
	Russell King, Herbert Xu, David S. Miller, Thomas Bogendoerfer,
	Linux ARM, Linux Kernel Mailing List,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE,
	open list:BROADCOM NVRAM DRIVER

Hi Linus,

On Tue, May 18, 2021 at 6:12 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> I'm actually surprised wireguard would use -O3. Yes, performance is
> important. But for wireguard, correctness is certainly important too.
> Maybe Jason isn't aware of just how bad gcc -O3 has historically been?
> Jason? How big of a deal is that -O3 for wireguard wrt the normal -O2?
> There are known buggy gcc versions that aren't ancient.

My impression has always been that O3 might sometimes generate slower
code, but not that it generates buggy code so commonly. Thanks for
letting me know.

I have a habit of compulsively running IDA Pro after making changes (brain
damage from too many years as a "security person" or something), to
see what the compiler did, and I've just been doing that with O3 since
the beginning of the project, so that's what I wound up optimizing
for. Or sometimes I'll work little things out in Godbolt's compiler
explorer. It's not like it matters much most of the time, but
sometimes I enjoy the golf. Anyway, I've never noticed it producing
any clearly wrong code compared to O2. But I'm obviously not testing
on all compilers or on all architectures. So if you think there's
danger lurking somewhere, it seems reasonable to change that to O2.

Comparing gcc 11's output between O2 and O3, it looks like the primary
difference is that the constant propagation is much less aggressive
with O2, and less inlining in general also means that some stores and
loads to local variables across static function calls aren't being
coalesced. A few null checks are removed too, where the compiler can
prove them away.

So while I've never seen issues with that code under O3, I don't see a
super compelling speed up anywhere either, but rather a bunch of
places that may or may not be theoretically faster or slower on some
system, maybe. I can queue up a patch for the next wireguard series I
send to Dave.

Jason

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers
  2021-05-18 16:12             ` Linus Torvalds
@ 2021-05-18 20:51               ` Arnd Bergmann
  -1 siblings, 0 replies; 96+ messages in thread
From: Arnd Bergmann @ 2021-05-18 20:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason A. Donenfeld, Eric Biggers, linux-arch, Vineet Gupta,
	Russell King, Herbert Xu, David S. Miller, Thomas Bogendoerfer,
	Linux ARM, Linux Kernel Mailing List,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE,
	open list:BROADCOM NVRAM DRIVER, Nobuhiro Iwamatsu

On Tue, May 18, 2021 at 6:12 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, May 18, 2021 at 5:42 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> Of the other cases, that xor-neon.c case actually makes sense. For
> that file, it literally exists _only_ to get a vectorized version of
> the trivial xor_8regs loop. It's one of the (very very few) cases of
> vectorization we actually want. And in that case, we might even want
> to make things easier - and more explicit - for the compiler by making
> the xor_8regs loops use "restrict" pointers.
>
> That neon case actually wants and needs that tree-vectorization to
> DTRT. But maybe it doesn't need the actual _loop_ vectorization? The
> xor_8regs code is literally using hand-unrolled loops already, exactly
> to make it as simple as possible for the compiler (but the lack of
> "restrict" pointers means that it's not all that simple after all, and
> I assume the compiler generates conditionals for the NEON case?

Right, I think there is an ongoing debate over how to best handle this
one in clang, since that does not do any vectorization for this file
unless the pointers are marked "restrict". As far as I remember, there
are some callers that want to do the xor in place though, which
means restrict is wrong.

> lz4 is questionable - yes, upstream lz4 seems to use -O3 (good), but
> it also very much uses unaligned accesses, which is where the gcc bug
> hits. I doubt that it really needs or wants the loop vectorization.

I ran some limited speed tests with the lz4 sources that come with Ubuntu,
using gcc-10.3 on an AMD Zen1 Threadripper with 10GB of /dev/urandom
input. This package patches the sources to use -O2 and no vectorization,
which turns out to be the fastest combination for my stupid test as well.

The results are barely above noise, but it appears that -O2
-ftree-loop-vectorize makes it slightly slower than just -O2, while -O3
is even slower than that regardless of
-fno-tree-loop-vectorize/-ftree-loop-vectorize.

I see that Nobuhiro Iwamatsu (Cc'd) changed the Debian lz4 package to
use -O2, but I don't see an explanation for it. I also see that the lz4 sources
force -O2 on ppc64 because -O3 causes a 30% slowdown from vectorization.
The kernel version is missing the bit that does this.

> zstd looks very similar to lz4.

> End result: at a minimum, I'd suggest using
> "-fno-tree-loop-vectorize", although somebody should check that NEON
> case.

> And I still think that using O3 for anything halfway complicated
> should be considered odd and need some strong numbers to enable.

Agreed. I think there is a fairly strong case for just using -O2 on lz4
and backporting that to stable. Searching for lz4 bugs with -O3 also
finds several reports, including one that I sent myself:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69702

I see that user space zstd is built with -O3 in Debian, but the changelog
also lists "Improved : better speed on clang and gcc -O2, thanks to Eric
Biggers", so maybe Eric has some useful ideas on whether we should
just use -O2 for the in-kernel version.

        Arnd

^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers
  2021-05-18 14:56         ` Linus Torvalds
@ 2021-05-18 21:14           ` David Laight
  -1 siblings, 0 replies; 96+ messages in thread
From: David Laight @ 2021-05-18 21:14 UTC (permalink / raw)
  To: 'Linus Torvalds', Arnd Bergmann
  Cc: Eric Biggers, linux-arch, Vineet Gupta, Russell King, Herbert Xu,
	David S. Miller, Thomas Bogendoerfer, Linux ARM,
	Linux Kernel Mailing List,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE,
	open list:BROADCOM NVRAM DRIVER

From: Linus Torvalds
> Sent: 18 May 2021 15:56
> 
> On Tue, May 18, 2021 at 12:27 AM Arnd Bergmann <arnd@kernel.org> wrote:
> > >
> > > I wonder if the kernel should do the same, or whether there are still cases
> > > where memcpy() isn't compiled optimally.  armv6/7 used to be one such case, but
> > > it was fixed in gcc 6.
> >
> > It would have to be memmove(), not memcpy() in this case, right?
> 
> No, it would simply be something like
> 
>   #define __get_unaligned_t(type, ptr) \
>         ({ type __val; memcpy(&__val, ptr, sizeof(type)); __val; })

You still need something to ensure that gcc can't assume that
'ptr' has an aligned type.
If there is an 'int *ptr' visible in the call chain no amount
of (void *) casts will make gcc forget the alignment.
So the memcpy() will get converted to an aligned load-store pair.
(This has always caused grief on sparc.)

A cast through (long) might be enough, as might a cast to a __packed
struct pointer type.
Using a union of the two pointer types might be ok - but might
generate a store/load to stack.
An alternative is an asm statement with input and output of different
pointer types but using the same register for both.
That ought to force the compiler to forget any tracked type
and value.
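
To make two of these options concrete, here is a small userspace sketch of a
memcpy()-based load next to a packed-struct load. The helper names
load_u32_memcpy() and load_u32_packed() are hypothetical, and
__attribute__((packed)) is a GCC/Clang extension. The kernel is built with
-fno-strict-aliasing; code outside the kernel that uses the packed-struct
form may need the same flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* memcpy()-based load, in the style of the macro quoted above. */
static inline uint32_t load_u32_memcpy(const void *p)
{
	uint32_t val;

	assert(p != NULL);
	memcpy(&val, p, sizeof(val));
	return val;
}

/*
 * Packed-struct load. The packed attribute gives the member an
 * alignment of 1, so the access itself carries no promise that the
 * address is 4-byte aligned.
 */
struct unaligned_u32 {
	uint32_t x;
} __attribute__((packed));

static inline uint32_t load_u32_packed(const void *p)
{
	assert(p != NULL);
	return ((const struct unaligned_u32 *)p)->x;
}
```

Both helpers take a const void * here. Whether the memcpy() form is still
affected by the alignment tracking described above depends on which typed
pointers are visible to the compiler at the call site.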

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers
  2021-05-18 20:51               ` Arnd Bergmann
@ 2021-05-18 21:31                 ` Eric Biggers
  -1 siblings, 0 replies; 96+ messages in thread
From: Eric Biggers @ 2021-05-18 21:31 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linus Torvalds, Jason A. Donenfeld, linux-arch, Vineet Gupta,
	Russell King, Herbert Xu, David S. Miller, Thomas Bogendoerfer,
	Linux ARM, Linux Kernel Mailing List,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE,
	open list:BROADCOM NVRAM DRIVER, Nobuhiro Iwamatsu

On Tue, May 18, 2021 at 10:51:23PM +0200, Arnd Bergmann wrote:
> 
> > zstd looks very similar to lz4.
> 
> > End result: at a minimum, I'd suggest using
> > "-fno-tree-loop-vectorize", although somebody should check that NEON
> > case.
> 
> > And I still think that using O3 for anything halfway complicated
> > should be considered odd and need some strong numbers to enable.
> 
> Agreed. I think there is a fairly strong case for just using -O2 on lz4
> and backport that to stable.
> Searching for lz4 bugs with -O3 also finds several reports including
> one that I sent myself:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69702
> 
> I see that user space zstd is built with -O3 in Debian, but the changelog
> also lists "Improved : better speed on clang and gcc -O2, thanks to Eric
> Biggers", so maybe Eric has some useful ideas on whether we should
> just use -O2 for the in-kernel version.
> 

In my opinion, -O2 is a good default even for compression code.  I generally
don't see any benefit from -O3 in compression code I've written.

That being said, -O2 is what I usually use during development.  Other people
could write code that relies on -O3 to be optimized well.

The Makefiles for lz4 and zstd use -O3 by default, which is a little concerning.
I do expect that they're still well-written enough to do well with -O2 too, but
it would require doing benchmarks to tell for sure.  (As Arnd noted, it happens
that I did do such benchmarks on zstd about 5 years ago, and I found an issue
where some functions weren't marked inline when they should be, causing them to
be inlined at -O3 but not at -O2.  That got fixed.)

- Eric

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper
  2021-05-14 10:00 ` Arnd Bergmann
  (?)
@ 2021-12-16 17:29   ` Ard Biesheuvel
  -1 siblings, 0 replies; 96+ messages in thread
From: Ard Biesheuvel @ 2021-12-16 17:29 UTC (permalink / raw)
  To: Arnd Bergmann, Jason A. Donenfeld, johannes, Kees Cook, Nick Desaulniers
  Cc: linux-arch, Linus Torvalds, Vineet Gupta, Arnd Bergmann,
	Amitkumar Karwar, Benjamin Herrenschmidt, Borislav Petkov,
	Eric Dumazet, Florian Fainelli, Ganapathi Bhat,
	Geert Uytterhoeven, H. Peter Anvin, Ingo Molnar, Jakub Kicinski,
	James Morris, Jens Axboe, John Johansen, Jonas Bonn, Kalle Valo,
	Michael Ellerman, Paul Mackerras, Rich Felker,
	Richard Russon (FlatCap),
	Russell King, Serge E. Hallyn, Sharvari Harisangam,
	Stafford Horne, Stefan Kristiansson, Thomas Gleixner,
	Vladimir Oltean, Xinming Hu, Yoshinori Sato, X86 ML,
	Linux Kernel Mailing List, Linux ARM, linux-m68k,
	Linux Crypto Mailing List, openrisc,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
	linux-sh, open list:SPARC + UltraSPARC (sparc/sparc64),
	linux-ntfs-dev, linux-block, linux-wireless,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	linux-security-module

Hi Arnd,

(replying to an old thread as this came up in the discussion regarding
misaligned loads and stores in siphash() when compiled for ARM
[f7e5b9bfa6c8820407b64eabc1f29c9a87e8993d])

On Fri, 14 May 2021 at 12:02, Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> The get_unaligned()/put_unaligned() helpers are traditionally architecture
> specific, with the two main variants being the "access-ok.h" version
> that assumes unaligned pointer accesses always work on a particular
> architecture, and the "le-struct.h" version that casts the data to a
> byte aligned type before dereferencing, for architectures that cannot
> always do unaligned accesses in hardware.
>
> Based on the discussion linked below, it appears that the access-ok
> version is not reliable on any architecture, but the struct version
> probably has no downsides. This series changes the code to use the
> same implementation on all architectures, addressing the few exceptions
> separately.
>
> I've included this version in the asm-generic tree for 5.14 already,
> addressing the few issues that were pointed out in the RFC. If there
> are any remaining problems, I hope those can be addressed as follow-up
> patches.
>

I think this series is a huge improvement, but it does not solve the
UB problem completely. As we found, there are open issues in the GCC
bugzilla regarding assumptions in the compiler that aligned quantities
either overlap entirely or not at all. (e.g.,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363)

CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is used in many places to
conditionally emit code that violates C alignment rules. E.g., there
is this example in Documentation/core-api/unaligned-memory-access.rst:

bool ether_addr_equal(const u8 *addr1, const u8 *addr2)
{
#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
  u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) |
             ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4)));
  return fold == 0;
#else
...

(which now deviates from its actual implementation, but the point is
the same) where CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is used in the
wrong way (IMHO).

The pattern seems to be

#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
  // ignore alignment rules, just cast to a more aligned pointer type
#else
  // use unaligned accessors, which could be either cheap or expensive,
  // depending on the architecture
#endif

whereas the following pattern makes more sense, I think, and does not
violate any C rules in the common case:

#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
  // use unaligned accessors, which are cheap or even entirely free
#else
  // avoid unaligned accessors, as they are expensive; instead, reorganize
  // the data so we don't need them (similar to setting NET_IP_ALIGN to 2)
#endif

The only remaining problem here is reinterpreting a char* pointer to a
u32*, e.g., for accessing the IP address in an Ethernet frame when
NET_IP_ALIGN == 2, which could suffer from the same UB problem again,
as I understand it.
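
For the ether_addr_equal() example, the second pattern could look roughly
like the sketch below. This is userspace code, not the kernel
implementation: load_u16() and load_u32() are hypothetical stand-ins for
get_unaligned(), built on memcpy(), and ether_addr_equal_sketch() is a
made-up name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for get_unaligned(): no pointer casts needed. */
static inline uint32_t load_u32(const uint8_t *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

static inline uint16_t load_u16(const uint8_t *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Compare two 6-byte Ethernet addresses as one 32-bit and one 16-bit
 * chunk each, without casting the byte pointers to wider pointer types.
 */
bool ether_addr_equal_sketch(const uint8_t *addr1, const uint8_t *addr2)
{
	uint32_t fold;

	assert(addr1 != NULL && addr2 != NULL);

	fold = (load_u32(addr1) ^ load_u32(addr2)) |
	       (uint32_t)(load_u16(addr1 + 4) ^ load_u16(addr2 + 4));
	return fold == 0;
}
```

On architectures with cheap unaligned access the loads are expected to
compile down to plain load instructions, which would match what Jason
observed for siphash() on x86.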

In the 32-bit ARM case (v6+) [which is admittedly an outlier] this
makes a substantial difference, as ARMv6 does have efficient unaligned
accessors (load/store word or halfword may be used on misaligned
addresses) but requires that load/store double-word and load/store
multiple are only used on 32-bit aligned addresses. GCC does the right
thing with the unaligned accessors, but blindly casting away
misalignment may result in alignment traps if the compiler happened to
emit load-double or load-multiple instructions for the memory access
in question.

Jason already verified that in the siphash() case, the aligned and
unaligned versions of the code actually compile to the same machine
code on x86, as the unaligned accessors just disappear. I suspect this
to be the case for many instances where
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is being used, mostly in the
networking stack.

So I intend to dig a bit deeper into this, and perhaps propose some
changes where the interpretation of
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is documented more clearly, and
tweaked according to my suggestion above (while ensuring that codegen
does not suffer, of course).

Thoughts, concerns, objections?


--
Ard.




> Link: https://lore.kernel.org/lkml/75d07691-1e4f-741f-9852-38c0b4f520bc@synopsys.com/
> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
> Link: https://lore.kernel.org/lkml/20210507220813.365382-14-arnd@kernel.org/
> Link: git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git unaligned-rework-v2
>
>
> Arnd Bergmann (13):
>   asm-generic: use asm-generic/unaligned.h for most architectures
>   openrisc: always use unaligned-struct header
>   sh: remove unaligned access for sh4a
>   m68k: select CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>   powerpc: use linux/unaligned/le_struct.h on LE power7
>   asm-generic: unaligned: remove byteshift helpers
>   asm-generic: unaligned always use struct helpers
>   partitions: msdos: fix one-byte get_unaligned()
>   apparmor: use get_unaligned() only for multi-byte words
>   mwifiex: re-fix for unaligned accesses
>   netpoll: avoid put_unaligned() on single character
>   asm-generic: uaccess: 1-byte access is always aligned
>   asm-generic: simplify asm/unaligned.h
>
>  arch/alpha/include/asm/unaligned.h          |  12 --
>  arch/arm/include/asm/unaligned.h            |  27 ---
>  arch/ia64/include/asm/unaligned.h           |  12 --
>  arch/m68k/Kconfig                           |   1 +
>  arch/m68k/include/asm/unaligned.h           |  26 ---
>  arch/microblaze/include/asm/unaligned.h     |  27 ---
>  arch/mips/crypto/crc32-mips.c               |   2 +-
>  arch/openrisc/include/asm/unaligned.h       |  47 -----
>  arch/parisc/include/asm/unaligned.h         |   6 +-
>  arch/powerpc/include/asm/unaligned.h        |  22 ---
>  arch/sh/include/asm/unaligned-sh4a.h        | 199 --------------------
>  arch/sh/include/asm/unaligned.h             |  13 --
>  arch/sparc/include/asm/unaligned.h          |  11 --
>  arch/x86/include/asm/unaligned.h            |  15 --
>  arch/xtensa/include/asm/unaligned.h         |  29 ---
>  block/partitions/ldm.h                      |   2 +-
>  block/partitions/msdos.c                    |   2 +-
>  drivers/net/wireless/marvell/mwifiex/pcie.c |  10 +-
>  include/asm-generic/uaccess.h               |   4 +-
>  include/asm-generic/unaligned.h             | 141 +++++++++++---
>  include/linux/unaligned/access_ok.h         |  68 -------
>  include/linux/unaligned/be_byteshift.h      |  71 -------
>  include/linux/unaligned/be_memmove.h        |  37 ----
>  include/linux/unaligned/be_struct.h         |  37 ----
>  include/linux/unaligned/generic.h           | 115 -----------
>  include/linux/unaligned/le_byteshift.h      |  71 -------
>  include/linux/unaligned/le_memmove.h        |  37 ----
>  include/linux/unaligned/le_struct.h         |  37 ----
>  include/linux/unaligned/memmove.h           |  46 -----
>  net/core/netpoll.c                          |   4 +-
>  security/apparmor/policy_unpack.c           |   2 +-
>  31 files changed, 131 insertions(+), 1002 deletions(-)
>  delete mode 100644 arch/alpha/include/asm/unaligned.h
>  delete mode 100644 arch/arm/include/asm/unaligned.h
>  delete mode 100644 arch/ia64/include/asm/unaligned.h
>  delete mode 100644 arch/m68k/include/asm/unaligned.h
>  delete mode 100644 arch/microblaze/include/asm/unaligned.h
>  delete mode 100644 arch/openrisc/include/asm/unaligned.h
>  delete mode 100644 arch/powerpc/include/asm/unaligned.h
>  delete mode 100644 arch/sh/include/asm/unaligned-sh4a.h
>  delete mode 100644 arch/sh/include/asm/unaligned.h
>  delete mode 100644 arch/sparc/include/asm/unaligned.h
>  delete mode 100644 arch/x86/include/asm/unaligned.h
>  delete mode 100644 arch/xtensa/include/asm/unaligned.h
>  delete mode 100644 include/linux/unaligned/access_ok.h
>  delete mode 100644 include/linux/unaligned/be_byteshift.h
>  delete mode 100644 include/linux/unaligned/be_memmove.h
>  delete mode 100644 include/linux/unaligned/be_struct.h
>  delete mode 100644 include/linux/unaligned/generic.h
>  delete mode 100644 include/linux/unaligned/le_byteshift.h
>  delete mode 100644 include/linux/unaligned/le_memmove.h
>  delete mode 100644 include/linux/unaligned/le_struct.h
>  delete mode 100644 include/linux/unaligned/memmove.h
>
> --
> 2.29.2
>
> Cc: Amitkumar Karwar <amitkarwar@gmail.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Florian Fainelli <f.fainelli@gmail.com>
> Cc: Ganapathi Bhat <ganapathi017@gmail.com>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: James Morris <jmorris@namei.org>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: John Johansen <john.johansen@canonical.com>
> Cc: Jonas Bonn <jonas@southpole.se>
> Cc: Kalle Valo <kvalo@codeaurora.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Rich Felker <dalias@libc.org>
> Cc: "Richard Russon (FlatCap)" <ldm@flatcap.org>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: "Serge E. Hallyn" <serge@hallyn.com>
> Cc: Sharvari Harisangam <sharvari.harisangam@nxp.com>
> Cc: Stafford Horne <shorne@gmail.com>
> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
> Cc: Xinming Hu <huxinming820@gmail.com>
> Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-m68k@lists.linux-m68k.org
> Cc: linux-crypto@vger.kernel.org
> Cc: openrisc@lists.librecores.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-sh@vger.kernel.org
> Cc: sparclinux@vger.kernel.org
> Cc: linux-ntfs-dev@lists.sourceforge.net
> Cc: linux-block@vger.kernel.org
> Cc: linux-wireless@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Cc: linux-arch@vger.kernel.org
> Cc: linux-security-module@vger.kernel.org
>
>

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper
  2021-12-16 17:29   ` Ard Biesheuvel
  (?)
@ 2021-12-16 17:42     ` Linus Torvalds
  -1 siblings, 0 replies; 96+ messages in thread
From: Linus Torvalds @ 2021-12-16 17:42 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Arnd Bergmann, Jason A. Donenfeld, Johannes Berg, Kees Cook,
	Nick Desaulniers, linux-arch, Vineet Gupta, Arnd Bergmann,
	Amitkumar Karwar, Benjamin Herrenschmidt, Borislav Petkov,
	Eric Dumazet, Florian Fainelli, Ganapathi Bhat,
	Geert Uytterhoeven, H. Peter Anvin, Ingo Molnar, Jakub Kicinski,
	James Morris, Jens Axboe, John Johansen, Jonas Bonn, Kalle Valo,
	Michael Ellerman, Paul Mackerras, Rich Felker,
	Richard Russon (FlatCap),
	Russell King, Serge E. Hallyn, Sharvari Harisangam,
	Stafford Horne, Stefan Kristiansson, Thomas Gleixner,
	Vladimir Oltean, Xinming Hu, Yoshinori Sato, X86 ML,
	Linux Kernel Mailing List, Linux ARM, linux-m68k,
	Linux Crypto Mailing List, openrisc,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
	Linux-sh list, open list:SPARC + UltraSPARC (sparc/sparc64),
	linux-ntfs-dev, linux-block, linux-wireless,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	LSM List

On Thu, Dec 16, 2021 at 9:29 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is used in many places to
> conditionally emit code that violates C alignment rules. E.g., there
> is this example in Documentation/core-api/unaligned-memory-access.rst:
>
> bool ether_addr_equal(const u8 *addr1, const u8 *addr2)
> {
> #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>   u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) |
>              ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4)));
>   return fold == 0;
> #else

It probably works fine in practice - the one case we had was really
pretty special, and about the vectorizer doing odd things.

But I think we should strive to convert these to use
"get_unaligned()", since code generation is fine. It still often makes
sense to have that test for the config variable, simply because the
approach might be different if we know unaligned accesses are slow.

So I'll happily take patches that do obvious conversions to
get_unaligned() where they make sense, but I don't think we should
consider this some huge hard requirement.
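
Purely as an illustration of what such a conversion could look like while
keeping the config test, here is a userspace sketch. Everything in it is
hypothetical: SKETCH_EFFICIENT_UNALIGNED_ACCESS stands in for the Kconfig
symbol, get_u32()/get_u16() stand in for get_unaligned(), and
is_zero_mac() is a made-up function.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the Kconfig symbol; remove to build the byte-wise path. */
#define SKETCH_EFFICIENT_UNALIGNED_ACCESS 1

/* Hypothetical stand-ins for get_unaligned(), built on memcpy(). */
static inline uint32_t get_u32(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

static inline uint16_t get_u16(const void *p)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Return true if the 6-byte address at addr is all zeroes. */
static bool is_zero_mac(const uint8_t *addr)
{
#ifdef SKETCH_EFFICIENT_UNALIGNED_ACCESS
	/* Unaligned loads are cheap: two wide loads through the accessor. */
	return (get_u32(addr) | get_u16(addr + 4)) == 0;
#else
	/* Unaligned loads are slow: take a different approach entirely. */
	uint8_t acc = 0;

	for (int i = 0; i < 6; i++)
		acc |= addr[i];
	return acc == 0;
#endif
}
```

Both branches are valid C at any alignment; the config test only selects
the approach.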

                 Linus

^ permalink raw reply	[flat|nested] 96+ messages in thread

* [OpenRISC] [PATCH v2 00/13] Unify asm/unaligned.h around struct helper
@ 2021-12-16 17:42     ` Linus Torvalds
  0 siblings, 0 replies; 96+ messages in thread
From: Linus Torvalds @ 2021-12-16 17:42 UTC (permalink / raw)
  To: openrisc

On Thu, Dec 16, 2021 at 9:29 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is used in many places to
> conditionally emit code that violates C alignment rules. E.g., there
> is this example in Documentation/core-api/unaligned-memory-access.rst:
>
> bool ether_addr_equal(const u8 *addr1, const u8 *addr2)
> {
> #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>   u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) |
>              ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4)));
>   return fold == 0;
> #else

It probably works fine in practice - the one case we had was really
pretty special, and about the vectorizer doing odd things.

But I think we should strive to convert these to use
"get_unaligned()", since code generation is fine. It still often makes
sense to have that test for the config variable, simply because the
approach might be different if we know unaligned accesses are slow.

So I'll happily take patches that do obvious conversions to
get_unaligned() where they make sense, but I don't think we should
consider this some huge hard requirement.

                 Linus

^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper
  2021-12-16 17:29   ` Ard Biesheuvel
  (?)
@ 2021-12-16 17:49     ` David Laight
  -1 siblings, 0 replies; 96+ messages in thread
From: David Laight @ 2021-12-16 17:49 UTC (permalink / raw)
  To: 'Ard Biesheuvel',
	Arnd Bergmann, Jason A. Donenfeld, johannes, Kees Cook,
	Nick Desaulniers
  Cc: Rich Felker, linux-sh, Amitkumar Karwar, Russell King,
	Eric Dumazet, Paul Mackerras, H. Peter Anvin,
	open list:SPARC + UltraSPARC (sparc/sparc64),
	Thomas Gleixner, linux-arch, Florian Fainelli, Yoshinori Sato,
	X86 ML, James Morris, Ingo Molnar, Geert Uytterhoeven, Linux ARM,
	Richard Russon (FlatCap),
	Jakub Kicinski, Serge E. Hallyn, Jonas Bonn, Arnd Bergmann,
	Ganapathi Bhat, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
	Stefan Kristiansson, linux-block, linux-m68k, openrisc,
	Borislav Petkov, Stafford Horne, Kalle Valo, Jens Axboe,
	John Johansen, Xinming Hu, Vineet Gupta, linux-wireless,
	Linux Kernel Mailing List, Vladimir Oltean, linux-ntfs-dev,
	linux-security-module, Linux Crypto Mailing List,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Linus Torvalds, Sharvari Harisangam

From: Ard Biesheuvel
> Sent: 16 December 2021 17:30
> 
> Hi Arnd,
> 
> (replying to an old thread as this came up in the discussion regarding
> misaligned loads and stores in siphash() when compiled for ARM
> [f7e5b9bfa6c8820407b64eabc1f29c9a87e8993d])
> 
> On Fri, 14 May 2021 at 12:02, Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > From: Arnd Bergmann <arnd@arndb.de>
> >
> > The get_unaligned()/put_unaligned() helpers are traditionally architecture
> > specific, with the two main variants being the "access-ok.h" version
> > that assumes unaligned pointer accesses always work on a particular
> > architecture, and the "le-struct.h" version that casts the data to a
> > byte aligned type before dereferencing, for architectures that cannot
> > always do unaligned accesses in hardware.

I'm pretty sure the compiler is allowed to 'read through' that cast
and still do an aligned access.
It has always been hard to get the compiler to 'forget' about known/expected
alignment - typically trying to stop memcpy() faulting on sparc.
Real function calls are usually required - but LTO may scupper that.

> >
> > Based on the discussion linked below, it appears that the access-ok
> > version is not reliable on any architecture, but the struct version
> > probably has no downsides. This series changes the code to use the
> > same implementation on all architectures, addressing the few exceptions
> > separately.
> >
> > I've included this version in the asm-generic tree for 5.14 already,
> > addressing the few issues that were pointed out in the RFC. If there
> > are any remaining problems, I hope those can be addressed as follow-up
> > patches.
> >
> 
> I think this series is a huge improvement, but it does not solve the
> UB problem completely. As we found, there are open issues in the GCC
> bugzilla regarding assumptions in the compiler that aligned quantities
> either overlap entirely or not at all. (e.g.,
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363)

I think we can stop the compiler merging unaligned requests by adding a byte-sized
memory barrier for the base address before and after the access.
That should still support complex addressing modes (esp on x86).

Another option is to do the misaligned access from within an asm statement.
While architecture dependent, it only really depends on the syntax of the ld/st
instruction.
The compiler can't merge those because it doesn't know whether the data is
'frobbed' before/after the memory access.

	David
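
As a rough illustration of making the compiler "forget" what it knows
about a pointer, the sketch below passes the pointer through an empty asm
statement before the access. This is neither of the two proposals above
verbatim (no byte-sized barrier, no ld/st inside the asm); it is a
related, simpler trick, similar in spirit to the kernel's
OPTIMIZER_HIDE_VAR(). The names forget_provenance() and load_u32_hidden()
are hypothetical, and the extended asm syntax assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the empty asm claims to read and write p, so the
 * compiler must treat the result as an unknown address and cannot carry
 * over alignment it deduced from where the pointer came from. */
static inline const void *forget_provenance(const void *p)
{
	__asm__ ("" : "+r" (p));
	return p;
}

/* Load 32 bits from an address of unknown alignment. */
static inline uint32_t load_u32_hidden(const void *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, forget_provenance(p), sizeof(v));
	return v;
}
```

Unlike a real function call, the asm statement is opaque to the optimizer
even after inlining, so this should also be unaffected by LTO.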

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper
  2021-12-16 17:29   ` Ard Biesheuvel
  (?)
@ 2021-12-16 18:56     ` Segher Boessenkool
  -1 siblings, 0 replies; 96+ messages in thread
From: Segher Boessenkool @ 2021-12-16 18:56 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-wireless, Jason A. Donenfeld, Rich Felker, linux-sh,
	Richard Russon (FlatCap),
	X86 ML, Amitkumar Karwar, James Morris, Eric Dumazet,
	Paul Mackerras, linux-m68k, H. Peter Anvin,
	open list:SPARC + UltraSPARC (sparc/sparc64),
	Stafford Horne, linux-arch, Florian Fainelli, Yoshinori Sato,
	Russell King, Linus Torvalds, Ingo Molnar, Geert Uytterhoeven,
	Kalle Valo, Vladimir Oltean, Jakub Kicinski, Serge E. Hallyn,
	Jonas Bonn, Kees Cook, Arnd Bergmann, Ganapathi Bhat,
	Stefan Kristiansson, linux-block, openrisc, Borislav Petkov,
	Thomas Gleixner, Linux ARM, Jens Axboe, Arnd Bergmann,
	John Johansen, Xinming Hu, Vineet Gupta, Nick Desaulniers,
	Linux Kernel Mailing List, linux-ntfs-dev, linux-security-module,
	Linux Crypto Mailing List,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	johannes, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
	Sharvari Harisangam

On Thu, Dec 16, 2021 at 06:29:40PM +0100, Ard Biesheuvel wrote:
> I think this series is a huge improvement, but it does not solve the
> UB problem completely. As we found, there are open issues in the GCC
> bugzilla regarding assumptions in the compiler that aligned quantities
> either overlap entirely or not at all. (e.g.,
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363)

That isn't open, it was closed as INVALID back in May.

(Naturally) aligned quantities only overlap if they are the same datum.
This follows directly from the definition of (naturally) aligned.  There
is no mystery here.
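
The arithmetic behind that statement can be checked by brute force. The
sketch below is only an illustration with hypothetical helper names: it
treats addresses as plain integers and counts pairs of naturally aligned,
distinct, same-sized quantities that nevertheless share a byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if [a, a + size) and [b, b + size) share at least one byte. */
static bool ranges_overlap(uintptr_t a, uintptr_t b, size_t size)
{
	return a < b + size && b < a + size;
}

/* True if addr is a multiple of size; size must be a power of two. */
static bool naturally_aligned(uintptr_t addr, size_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (addr & (size - 1)) == 0;
}

/* Count address pairs below limit that are both naturally aligned and
 * distinct, yet overlap.  For naturally aligned data this stays zero. */
static unsigned count_aligned_partial_overlaps(uintptr_t limit, size_t size)
{
	unsigned n = 0;

	for (uintptr_t a = 0; a < limit; a++)
		for (uintptr_t b = 0; b < limit; b++)
			if (a != b &&
			    naturally_aligned(a, size) &&
			    naturally_aligned(b, size) &&
			    ranges_overlap(a, b, size))
				n++;
	return n;
}
```

A partial overlap such as addresses 0 and 2 with size 4 is only possible
when at least one of the two accesses is misaligned, which is why such
data has to be marked as unaligned for the compiler.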

All unaligned data need to be marked up properly.

> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is used in many places to
> conditionally emit code that violates C alignment rules.

Most of this is ABI, not C.  It is the ABI that requires certain
alignments.  Simply ignoring that does not work, and even if it did,
you would end up with much slower generated code.

> whereas the following pattern makes more sense, I think, and does not
> violate any C rules in the common case:
> 
> #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>   // use unaligned accessors, which are cheap or even entirely free
> #else
>   // avoid unaligned accessors, as they are expensive; instead, reorganize
>   // the data so we don't need them (similar to setting NET_IP_ALIGN to 2)
> #endif

Yes, this looks more reasonable.

> The only remaining problem here is reinterpreting a char* pointer to a
> u32*, e.g., for accessing the IP address in an Ethernet frame when
> NET_IP_ALIGN == 2, which could suffer from the same UB problem again,
> as I understand it.

The problem is never casting a pointer to pointer to character type, and
then later back to an appropriate pointer type.  These things are both
required to work.  The problem always is accessing something as if it
was something of another type, which is not valid C.  This however is
exactly what -fno-strict-aliasing allows, so that works as well.

But this does not have much to do with alignment.


Segher
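
A minimal sketch of the #ifdef pattern quoted above, under stated
assumptions: RX_PAD is a hypothetical stand-in for NET_IP_ALIGN, struct
rx_buf is a made-up receive buffer, and memcpy() stands in for the
unaligned accessors. The point is that the config symbol selects a data
layout, not a way of dereferencing a misaligned pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN 14	/* Ethernet header length in bytes */

#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
#define RX_PAD 0	/* unaligned loads are cheap: no padding needed */
#else
#define RX_PAD 2	/* 2 + 14 = 16: IP header lands on a 4-byte boundary */
#endif

static_assert(RX_PAD == 0 || (RX_PAD + ETH_HLEN) % 4 == 0,
	      "padding must align the IP header");

/* Hypothetical receive buffer whose storage is 4-byte aligned. */
struct rx_buf {
	_Alignas(4) unsigned char data[64];
};

/* Start of the IP header inside the buffer. */
static unsigned char *rx_ip_header(struct rx_buf *b)
{
	return b->data + RX_PAD + ETH_HLEN;
}

/* Load the 32-bit word at byte offset off of the IP header.  memcpy() is
 * valid for any alignment; when RX_PAD is 2 and off is a multiple of 4
 * the source happens to be aligned as well. */
static uint32_t rx_ip_load32(struct rx_buf *b, size_t off)
{
	uint32_t v;

	memcpy(&v, rx_ip_header(b) + off, sizeof(v));
	return v;
}
```

Either branch reads the same bytes; only the position of the frame inside
the buffer changes with the configuration.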

^ permalink raw reply	[flat|nested] 96+ messages in thread

* RE: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper
  2021-12-16 18:56     ` Segher Boessenkool
  (?)
@ 2021-12-17 12:34       ` David Laight
  -1 siblings, 0 replies; 96+ messages in thread
From: David Laight @ 2021-12-17 12:34 UTC (permalink / raw)
  To: 'Segher Boessenkool', Ard Biesheuvel
  Cc: linux-wireless, Jason A. Donenfeld, Rich Felker, linux-sh,
	Richard Russon (FlatCap),
	X86 ML, Amitkumar Karwar, James Morris, Eric Dumazet,
	Paul Mackerras, linux-m68k, H. Peter Anvin,
	open list:SPARC + UltraSPARC (sparc/sparc64),
	Stafford Horne, linux-arch, Florian Fainelli, Yoshinori Sato,
	Russell King, Linus Torvalds, Ingo Molnar, Geert Uytterhoeven,
	Kalle Valo, Vladimir Oltean, Jakub Kicinski, Serge E. Hallyn,
	Jonas Bonn, Kees Cook, Arnd Bergmann, Ganapathi Bhat,
	Stefan Kristiansson, linux-block, openrisc, Borislav Petkov,
	Thomas Gleixner, Linux ARM, Jens Axboe, Arnd Bergmann,
	John Johansen, Xinming Hu, Vineet Gupta, Nick Desaulniers,
	Linux Kernel Mailing List, linux-ntfs-dev, linux-security-module,
	Linux Crypto Mailing List,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	johannes, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
	Sharvari Harisangam

From: Segher Boessenkool
> Sent: 16 December 2021 18:56
...
> > The only remaining problem here is reinterpreting a char* pointer to a
> > u32*, e.g., for accessing the IP address in an Ethernet frame when
> > NET_IP_ALIGN == 2, which could suffer from the same UB problem again,
> > as I understand it.
> 
> The problem is never casting a pointer to pointer to character type, and
> then later back to an appropriate pointer type.
> These things are both required to work.

I think that is true of 'void *', not 'char *'.
'char' is special in that 'strict aliasing' doesn't apply to it.
(Which is actually a pain sometimes.)

> The problem always is accessing something as if it
> was something of another type, which is not valid C.  This however is
> exactly what -fno-strict-aliasing allows, so that works as well.

IIRC the C language only allows you to have pointers to valid data items.
(Since they can only be generated by the & operator on a valid item.)
Indirecting any other pointer is probably UB!

This (sort of) allows the compiler to 'look through' casts to find
what the actual type is (or might be).
It can then use that information to make optimisation choices.
This has caused grief with memcpy() calls that are trying to copy
a structure that the coder knows is misaligned to an aligned buffer.

So while *(unaligned_ptr *)char_ptr probably has to work, if the compiler
can see *(unaligned_ptr *)(char *)int_ptr it can assume the alignment of
'int_ptr' and do a single aligned access.

	David
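
To make the memcpy() case concrete, here is a small sketch with a
hypothetical struct wire_hdr. It keeps the possibly misaligned source as
a byte pointer all the way to memcpy(), so no expression ever has type
"pointer to struct wire_hdr" for the compiler to derive an alignment
from. Whether a given compiler would actually miscompile the typed
variant is not something this sketch demonstrates.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-the-wire header; the field layout is made up. */
struct wire_hdr {
	uint32_t seq;
	uint16_t len;
	uint16_t flags;
};

static_assert(sizeof(struct wire_hdr) == 8, "sketch assumes no padding");

/* Copy a header out of a byte stream at any alignment.  Taking the source
 * as "const unsigned char *" rather than "const struct wire_hdr *" avoids
 * promising the compiler that the address is suitably aligned. */
static struct wire_hdr read_wire_hdr(const unsigned char *bytes)
{
	struct wire_hdr h;

	memcpy(&h, bytes, sizeof(h));
	return h;
}
```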

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper
  2021-12-17 12:34       ` David Laight
  (?)
@ 2021-12-17 13:35         ` Segher Boessenkool
  -1 siblings, 0 replies; 96+ messages in thread
From: Segher Boessenkool @ 2021-12-17 13:35 UTC (permalink / raw)
  To: David Laight
  Cc: Jason A. Donenfeld, Rich Felker, linux-sh,
	Richard Russon (FlatCap),
	linux-m68k, Amitkumar Karwar, Russell King, Eric Dumazet,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Paul Mackerras, H. Peter Anvin,
	open list:SPARC + UltraSPARC (sparc/sparc64),
	Ard Biesheuvel, linux-arch, Nick Desaulniers, Florian Fainelli,
	X86 ML, James Morris, Ingo Molnar, Geert Uytterhoeven, Linux ARM,
	Vladimir Oltean, Jakub Kicinski, Serge E. Hallyn, Jonas Bonn,
	Kees Cook, Arnd Bergmann, Ganapathi Bhat,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
	Stefan Kristiansson, linux-block, openrisc, Borislav Petkov,
	Stafford Horne, Kalle Valo, Jens Axboe, Arnd Bergmann,
	John Johansen, Xinming Hu, Yoshinori Sato, linux-wireless,
	Linux Kernel Mailing List, linux-ntfs-dev, linux-security-module,
	Thomas Gleixner, Linux Crypto Mailing List, Vineet Gupta,
	johannes, Linus Torvalds, Sharvari Harisangam

On Fri, Dec 17, 2021 at 12:34:53PM +0000, David Laight wrote:
> From: Segher Boessenkool
> > Sent: 16 December 2021 18:56
> ...
> > > The only remaining problem here is reinterpreting a char* pointer to a
> > > u32*, e.g., for accessing the IP address in an Ethernet frame when
> > > NET_IP_ALIGN == 2, which could suffer from the same UB problem again,
> > > as I understand it.
> > 
> > The problem is never casting a pointer to pointer to character type, and
> > then later back to an appropriate pointer type.
> > These things are both required to work.
> 
> I think that is true of 'void *', not 'char *'.

No, see 6.3.2.3/7.  Both are allowed (and behave the same in fact).

> 'char' is special in that 'strict aliasing' doesn't apply to it.
> (Which is actually a pain sometimes.)

That has nothing to do with it.  Yes, you can validly access any memory
as a character type, but that has nothing to do with what pointer casts
are allowed and which are not.

> > The problem always is accessing something as if it
> > was something of another type, which is not valid C.  This however is
> > exactly what -fno-strict-aliasing allows, so that works as well.
> 
> IIRC the C language only allows you to have pointers to valid data items.
> (Since they can only be generated by the & operator on a valid item.)

Not so.  For example you are explicitly allowed to have pointers one
past the last element of an array (and do arithmetic on that!), and of
course null pointers are a thing.
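
A minimal sketch of the one-past-the-end case, added here for
illustration (sum_range and sum_array_demo are hypothetical names):
the end pointer may be formed, compared, and used in arithmetic, but
it is never dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the half-open range [first, last); last itself is never read. */
static int sum_range(const int *first, const int *last)
{
	int total = 0;

	for (const int *p = first; p != last; p++)
		total += *p;
	return total;
}

static int sum_array_demo(void)
{
	int a[4] = { 1, 2, 3, 4 };
	const int *end = a + 4;	/* one past the last element: valid to form */

	assert(end - a == 4);	/* pointer arithmetic on it is fine too */
	return sum_range(a, end);
}
```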

C allows you to make up pointers from integers as well.  This is
perfectly fine to do.  Accessing anything via such pointers might well
not be standard C, of course.
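
For the case the standard does define, here is a small sketch (an
editorial illustration, with via_integer as a made-up name). It
assumes the optional type uintptr_t from <stdint.h> is available: a
valid pointer converted to void *, then to uintptr_t, and back again
compares equal to the original. Integers that did not come from such a
conversion get no guarantee of this kind.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round-trip a pointer through an integer. Only the
 * void * -> uintptr_t -> void * path is guaranteed to give back
 * an equal pointer.
 */
static int *via_integer(int *p)
{
	uintptr_t bits = (uintptr_t)(void *)p;
	int *back = (int *)(void *)bits;

	assert(back == p);
	return back;
}
```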

> Indirecting any other pointer is probably UB!

If a pointer points to an object, indirecting it gives an lvalue of that
object.  It does not matter how you got that pointer, all that matters
is that it points at a valid object.

> This (sort of) allows the compiler to 'look through' casts to find
> what the actual type is (or might be).
> It can then use that information to make optimisation choices.
> This has caused grief with memcpy() calls that are trying to copy
> a structure that the coder knows is misaligned to an aligned buffer.

This is 6.5/7.

Alignment is 6.2.8 but it doesn't actually come into play at all here.

> So while *(unaligned_ptr *)char_ptr probably has to work.

Only if the original pointer points to an object that is correct
(including correctly aligned) for such an lvalue.

> If the compiler can see *(unaligned_ptr *)(char *)int_ptr it can
> assume the alignment of the 'int_ptr' and do a single aligned access.

It is undefined behaviour to have an address in int_ptr that is not
correctly aligned for whatever type it points to.
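
To make the consequence concrete, here is a hedged sketch of two ways
to read a possibly misaligned 32-bit value without ever creating a
misaligned uint32_t pointer. It is an editorial illustration, not the
kernel's implementation: the names load_u32_memcpy, load_u32_struct
and struct unaligned_u32 are invented, and the packed and may_alias
attributes are GCC/Clang extensions rather than standard C. The struct
variant is in the spirit of the "struct helper" this series is about.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable: copy the bytes; no alignment is assumed for p. */
static uint32_t load_u32_memcpy(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Extension-based: a packed struct has alignment 1, so the compiler
 * must emit an access that is safe at any address. may_alias is an
 * assumption added here so the example does not depend on
 * -fno-strict-aliasing.
 */
struct unaligned_u32 {
	uint32_t v;
} __attribute__((packed, may_alias));

static_assert(sizeof(struct unaligned_u32) == sizeof(uint32_t),
	      "packed struct should have no padding");

static uint32_t load_u32_struct(const void *p)
{
	const struct unaligned_u32 *u = p;

	return u->v;
}
```

Both helpers take const void *, so the caller never has to hold the
address in a uint32_t pointer, and the compiler has no typed pointer
from which to infer alignment.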


Segher


end of thread, other threads:[~2021-12-17 13:54 UTC | newest]

Thread overview: 96+ messages
2021-05-14 10:00 [PATCH v2 00/13] Unify asm/unaligned.h around struct helper Arnd Bergmann
2021-05-14 10:00 ` [OpenRISC] " Arnd Bergmann
2021-05-14 10:00 ` Arnd Bergmann
2021-05-14 10:00 ` Arnd Bergmann
2021-05-14 10:00 ` [PATCH v2 01/13] asm-generic: use asm-generic/unaligned.h for most architectures Arnd Bergmann
2021-05-14 10:00 ` [PATCH v2 02/13] openrisc: always use unaligned-struct header Arnd Bergmann
2021-05-14 10:00   ` [OpenRISC] " Arnd Bergmann
2021-05-14 10:00 ` [PATCH v2 03/13] sh: remove unaligned access for sh4a Arnd Bergmann
2021-05-14 10:34   ` John Paul Adrian Glaubitz
2021-05-14 12:22     ` Arnd Bergmann
2021-05-15 15:36       ` John Paul Adrian Glaubitz
2021-05-15 20:10         ` Arnd Bergmann
2021-05-14 10:00 ` [PATCH v2 04/13] m68k: select CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS Arnd Bergmann
2021-05-14 10:00 ` [PATCH v2 05/13] powerpc: use linux/unaligned/le_struct.h on LE power7 Arnd Bergmann
2021-05-14 10:00   ` Arnd Bergmann
2021-05-14 11:48   ` Segher Boessenkool
2021-05-14 11:48     ` Segher Boessenkool
2021-05-14 13:02     ` Arnd Bergmann
2021-05-14 10:00 ` [PATCH v2 06/13] asm-generic: unaligned: remove byteshift helpers Arnd Bergmann
2021-05-14 10:00   ` Arnd Bergmann
2021-05-14 10:00 ` [PATCH v2 07/13] asm-generic: unaligned always use struct helpers Arnd Bergmann
2021-05-14 10:00   ` Arnd Bergmann
2021-05-17 21:53   ` Eric Biggers
2021-05-17 21:53     ` Eric Biggers
2021-05-18  7:25     ` Arnd Bergmann
2021-05-18  7:25       ` Arnd Bergmann
2021-05-18 14:56       ` Linus Torvalds
2021-05-18 14:56         ` Linus Torvalds
2021-05-18 15:41         ` Arnd Bergmann
2021-05-18 15:41           ` Arnd Bergmann
2021-05-18 16:12           ` Linus Torvalds
2021-05-18 16:12             ` Linus Torvalds
2021-05-18 18:09             ` Jason A. Donenfeld
2021-05-18 18:09               ` Jason A. Donenfeld
2021-05-18 20:51             ` Arnd Bergmann
2021-05-18 20:51               ` Arnd Bergmann
2021-05-18 21:31               ` Eric Biggers
2021-05-18 21:31                 ` Eric Biggers
2021-05-18 21:14         ` David Laight
2021-05-18 21:14           ` David Laight
2021-05-14 10:00 ` [PATCH v2 08/13] partitions: msdos: fix one-byte get_unaligned() Arnd Bergmann
2021-05-17 10:28   ` Christoph Hellwig
2021-05-17 10:44     ` Arnd Bergmann
2021-05-14 10:00 ` [PATCH v2 09/13] apparmor: use get_unaligned() only for multi-byte words Arnd Bergmann
2021-05-14 10:00 ` [PATCH v2 10/13] mwifiex: re-fix for unaligned accesses Arnd Bergmann
2021-05-15  6:22   ` Kalle Valo
2021-05-15  9:01     ` Arnd Bergmann
2021-05-15 18:23       ` Kalle Valo
2021-05-14 10:00 ` [PATCH v2 11/13] netpoll: avoid put_unaligned() on single character Arnd Bergmann
2021-05-14 10:01 ` [PATCH v2 12/13] asm-generic: uaccess: 1-byte access is always aligned Arnd Bergmann
2021-05-15 18:41   ` Randy Dunlap
2021-05-15 20:16     ` Arnd Bergmann
2021-05-14 10:01 ` [PATCH v2 13/13] asm-generic: simplify asm/unaligned.h Arnd Bergmann
2021-05-14 10:35   ` David Laight
2021-05-14 17:32 ` [PATCH v2 00/13] Unify asm/unaligned.h around struct helper Linus Torvalds
2021-05-14 17:32   ` [OpenRISC] " Linus Torvalds
2021-05-14 17:32   ` Linus Torvalds
2021-05-14 17:32   ` Linus Torvalds
2021-05-14 18:51   ` Vineet Gupta
2021-05-14 18:51     ` [OpenRISC] " Vineet Gupta
2021-05-14 18:51     ` Vineet Gupta
2021-05-14 18:51     ` Vineet Gupta
2021-05-14 19:22     ` Linus Torvalds
2021-05-14 19:22       ` [OpenRISC] " Linus Torvalds
2021-05-14 19:22       ` Linus Torvalds
2021-05-14 19:22       ` Linus Torvalds
2021-05-14 19:45       ` Vineet Gupta
2021-05-14 19:45         ` [OpenRISC] " Vineet Gupta
2021-05-14 19:45         ` Vineet Gupta
2021-05-14 19:45         ` Vineet Gupta
2021-05-14 20:19         ` Linus Torvalds
2021-05-14 20:19           ` [OpenRISC] " Linus Torvalds
2021-05-14 20:19           ` Linus Torvalds
2021-05-14 20:19           ` Linus Torvalds
2021-05-14 19:31   ` Arnd Bergmann
2021-05-14 19:31     ` [OpenRISC] " Arnd Bergmann
2021-05-14 19:31     ` Arnd Bergmann
2021-05-14 19:31     ` Arnd Bergmann
2021-12-16 17:29 ` Ard Biesheuvel
2021-12-16 17:29   ` [OpenRISC] " Ard Biesheuvel
2021-12-16 17:29   ` Ard Biesheuvel
2021-12-16 17:42   ` Linus Torvalds
2021-12-16 17:42     ` [OpenRISC] " Linus Torvalds
2021-12-16 17:42     ` Linus Torvalds
2021-12-16 17:49   ` David Laight
2021-12-16 17:49     ` [OpenRISC] " David Laight
2021-12-16 17:49     ` David Laight
2021-12-16 18:56   ` Segher Boessenkool
2021-12-16 18:56     ` [OpenRISC] " Segher Boessenkool
2021-12-16 18:56     ` Segher Boessenkool
2021-12-17 12:34     ` David Laight
2021-12-17 12:34       ` [OpenRISC] " David Laight
2021-12-17 12:34       ` David Laight
2021-12-17 13:35       ` Segher Boessenkool
2021-12-17 13:35         ` [OpenRISC] " Segher Boessenkool
2021-12-17 13:35         ` Segher Boessenkool
