All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 5.8 00/70] 5.8.18-rc1 review
@ 2020-10-31 11:35 Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 01/70] netfilter: nftables_offload: KASAN slab-out-of-bounds Read in nft_flow_rule_create Greg Kroah-Hartman
                   ` (72 more replies)
  0 siblings, 73 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuah, patches,
	lkft-triage, pavel, stable

-------------------------
Note, this is going to be the LAST 5.8.y kernel release.  After this
one, this branch is now end-of-life.  Please move to the 5.9.y branch at
this point in time.
-------------------------

This is the start of the stable review cycle for the 5.8.18 release.
There are 70 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Mon, 02 Nov 2020 11:34:42 +0000.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.8.18-rc1.gz
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.8.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 5.8.18-rc1

Pali Rohár <pali@kernel.org>
    phy: marvell: comphy: Convert internal SMCC firmware return codes to errno

Ricky Wu <ricky_wu@realtek.com>
    misc: rtsx: do not setting OC_POWER_DOWN reg in rtsx_pci_init_ocp()

Stafford Horne <shorne@gmail.com>
    openrisc: Fix issue with get_user for 64-bit values

Souptick Joarder <jrdr.linux@gmail.com>
    xen/gntdev.c: Mark pages as dirty

Geert Uytterhoeven <geert+renesas@glider.be>
    ata: sata_rcar: Fix DMA boundary mask

Grygorii Strashko <grygorii.strashko@ti.com>
    PM: runtime: Fix timer_expires data type on 32-bit arches

Peter Zijlstra <peterz@infradead.org>
    serial: pl011: Fix lockdep splat when handling magic-sysrq interrupt

Paras Sharma <parashar@codeaurora.org>
    serial: qcom_geni_serial: To correct QUP Version detection logic

Chris Wilson <chris@chris-wilson.co.uk>
    drm/i915/gem: Serialise debugfs i915_gem_objects with ctx->mutex

Gustavo A. R. Silva <gustavo@embeddedor.com>
    mtd: lpddr: Fix bad logic in print_drs_error

Jason Gunthorpe <jgg@nvidia.com>
    RDMA/addr: Fix race with netevent_callback()/rdma_addr_cancel()

Frederic Barrat <fbarrat@linux.ibm.com>
    cxl: Rework error message for incompatible slots

Jia-Ju Bai <baijiaju@tsinghua.edu.cn>
    p54: avoid accessing the data mapped to streaming DMA

Roberto Sassu <roberto.sassu@huawei.com>
    evm: Check size of security.evm before using it

Song Liu <songliubraving@fb.com>
    bpf: Fix comment for helper bpf_current_task_under_cgroup()

Miklos Szeredi <mszeredi@redhat.com>
    fuse: fix page dereference after free

Pali Rohár <pali@kernel.org>
    ata: ahci: mvebu: Make SATA PHY optional for Armada 3720

Pali Rohár <pali@kernel.org>
    PCI: aardvark: Fix initialization with old Marvell's Arm Trusted Firmware

Juergen Gross <jgross@suse.com>
    x86/xen: disable Firmware First mode for correctable memory errors

Thomas Gleixner <tglx@linutronix.de>
    x86/traps: Fix #DE Oops message regression

Kim Phillips <kim.phillips@amd.com>
    arch/x86/amd/ibs: Fix re-arming IBS Fetch

Gao Xiang <hsiangkao@redhat.com>
    erofs: avoid duplicated permission check for "trusted." xattrs

Leon Romanovsky <leonro@nvidia.com>
    net: protect tcf_block_unbind with block lock

Tung Nguyen <tung.q.nguyen@dektech.com.au>
    tipc: fix memory leak caused by tipc_buf_append()

Arjun Roy <arjunroy@google.com>
    tcp: Prevent low rmem stalls with SO_RCVLOWAT.

Andrew Gabbasov <andrew_gabbasov@mentor.com>
    ravb: Fix bit fields checking in ravb_hwtstamp_get()

Heiner Kallweit <hkallweit1@gmail.com>
    r8169: fix issue with forced threading in combination with shared interrupts

Guillaume Nault <gnault@redhat.com>
    net/sched: act_mpls: Add softdep on mpls_gso.ko

Alex Elder <elder@linaro.org>
    net: ipa: command payloads already mapped

Zenghui Yu <yuzenghui@huawei.com>
    net: hns3: Clear the CMDQ registers before unmapping BAR region

Aleksandr Nogikh <nogikh@google.com>
    netem: fix zero division in tabledist

Ido Schimmel <idosch@nvidia.com>
    mlxsw: core: Fix memory leak on module removal

Lijun Pan <ljp@linux.ibm.com>
    ibmvnic: fix ibmvnic_set_mac

Thomas Bogendoerfer <tbogendoerfer@suse.de>
    ibmveth: Fix use of ibmveth in a bridge.

Masahiro Fujiwara <fujiwara.masahiro@gmail.com>
    gtp: fix an use-before-init in gtp_newlink()

Raju Rangoju <rajur@chelsio.com>
    cxgb4: set up filter action after rewrites

Vinay Kumar Yadav <vinay.yadav@chelsio.com>
    chelsio/chtls: fix tls record info to user

Vinay Kumar Yadav <vinay.yadav@chelsio.com>
    chelsio/chtls: fix memory leaks in CPL handlers

Vinay Kumar Yadav <vinay.yadav@chelsio.com>
    chelsio/chtls: fix deadlock issue

Vasundhara Volam <vasundhara-v.volam@broadcom.com>
    bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally.

Vasundhara Volam <vasundhara-v.volam@broadcom.com>
    bnxt_en: Re-write PCI BARs after PCI fatal error.

Vasundhara Volam <vasundhara-v.volam@broadcom.com>
    bnxt_en: Invoke cancel_delayed_work_sync() for PFs also.

Vasundhara Volam <vasundhara-v.volam@broadcom.com>
    bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one().

Michael Chan <michael.chan@broadcom.com>
    bnxt_en: Check abort error state in bnxt_open_nic().

Michael Schaller <misch@google.com>
    efivarfs: Replace invalid slashes with exclamation marks in dentries.

Dan Williams <dan.j.williams@intel.com>
    x86/copy_mc: Introduce copy_mc_enhanced_fast_string()

Dan Williams <dan.j.williams@intel.com>
    x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()

Randy Dunlap <rdunlap@infradead.org>
    x86/PCI: Fix intel_mid_pci.c build error when ACPI is not enabled

Nick Desaulniers <ndesaulniers@google.com>
    arm64: link with -z norelro regardless of CONFIG_RELOCATABLE

Marc Zyngier <maz@kernel.org>
    arm64: Run ARCH_WORKAROUND_2 enabling code on all CPUs

Marc Zyngier <maz@kernel.org>
    arm64: Run ARCH_WORKAROUND_1 enabling code on all CPUs

Kees Cook <keescook@chromium.org>
    fs/kernel_read_file: Remove FIRMWARE_EFI_EMBEDDED enum

Ard Biesheuvel <ardb@kernel.org>
    efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure

Rasmus Villemoes <linux@rasmusvillemoes.dk>
    scripts/setlocalversion: make git describe output more reliable

Matthew Wilcox (Oracle) <willy@infradead.org>
    io_uring: Convert advanced XArray uses to the normal API

Matthew Wilcox (Oracle) <willy@infradead.org>
    io_uring: Fix XArray usage in io_uring_add_task_file

Matthew Wilcox (Oracle) <willy@infradead.org>
    io_uring: Fix use of XArray in __io_uring_files_cancel

Jens Axboe <axboe@kernel.dk>
    io_uring: no need to call xa_destroy() on empty xarray

Hillf Danton <hdanton@sina.com>
    io-wq: fix use-after-free in io_wq_worker_running

Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    io_wq: Make io_wqe::lock a raw_spinlock_t

Jens Axboe <axboe@kernel.dk>
    io_uring: reference ->nsproxy for file table commands

Jens Axboe <axboe@kernel.dk>
    io_uring: don't rely on weak ->files references

Jens Axboe <axboe@kernel.dk>
    io_uring: enable task/files specific overflow flushing

Jens Axboe <axboe@kernel.dk>
    io_uring: return cancelation status from poll/timeout/files handlers

Jens Axboe <axboe@kernel.dk>
    io_uring: unconditionally grab req->task

Jens Axboe <axboe@kernel.dk>
    io_uring: stash ctx task reference for SQPOLL

Jens Axboe <axboe@kernel.dk>
    io_uring: move dropping of files into separate helper

Jens Axboe <axboe@kernel.dk>
    io_uring: allow timeout/poll/files killing to take task into account

Jens Axboe <axboe@kernel.dk>
    io_uring: don't run task work on an exiting task

Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
    netfilter: nftables_offload: KASAN slab-out-of-bounds Read in nft_flow_rule_create


-------------

Diffstat:

 Makefile                                           |   4 +-
 arch/arm64/Makefile                                |   4 +-
 arch/arm64/kernel/cpu_errata.c                     |  15 +
 arch/openrisc/include/asm/uaccess.h                |  35 +-
 arch/powerpc/Kconfig                               |   2 +-
 arch/powerpc/include/asm/string.h                  |   2 -
 arch/powerpc/include/asm/uaccess.h                 |  40 +-
 arch/powerpc/lib/Makefile                          |   2 +-
 .../lib/{memcpy_mcsafe_64.S => copy_mc_64.S}       |   4 +-
 arch/x86/Kconfig                                   |   2 +-
 arch/x86/Kconfig.debug                             |   2 +-
 arch/x86/events/amd/ibs.c                          |  15 +-
 arch/x86/include/asm/copy_mc_test.h                |  75 ++++
 arch/x86/include/asm/mce.h                         |   9 +
 arch/x86/include/asm/mcsafe_test.h                 |  75 ----
 arch/x86/include/asm/string_64.h                   |  32 --
 arch/x86/include/asm/uaccess.h                     |   9 +
 arch/x86/include/asm/uaccess_64.h                  |  20 -
 arch/x86/kernel/cpu/mce/core.c                     |   8 +-
 arch/x86/kernel/quirks.c                           |  10 +-
 arch/x86/kernel/traps.c                            |   2 +-
 arch/x86/lib/Makefile                              |   1 +
 arch/x86/lib/copy_mc.c                             |  96 +++++
 arch/x86/lib/copy_mc_64.S                          | 163 +++++++
 arch/x86/lib/memcpy_64.S                           | 115 -----
 arch/x86/lib/usercopy_64.c                         |  21 -
 arch/x86/pci/intel_mid_pci.c                       |   1 +
 arch/x86/xen/enlighten_pv.c                        |   9 +
 drivers/ata/ahci.h                                 |   2 +
 drivers/ata/ahci_mvebu.c                           |   2 +-
 drivers/ata/libahci_platform.c                     |   2 +-
 drivers/ata/sata_rcar.c                            |   2 +-
 drivers/base/firmware_loader/fallback_platform.c   |   2 +-
 drivers/crypto/chelsio/chtls/chtls_cm.c            |  29 +-
 drivers/crypto/chelsio/chtls/chtls_io.c            |   7 +-
 drivers/firmware/efi/libstub/arm64-stub.c          |   8 +-
 drivers/firmware/efi/libstub/fdt.c                 |   4 +-
 drivers/gpu/drm/i915/i915_debugfs.c                |   2 +
 drivers/infiniband/core/addr.c                     |  11 +-
 drivers/md/dm-writecache.c                         |  15 +-
 drivers/misc/cardreader/rtsx_pcr.c                 |   4 -
 drivers/misc/cxl/pci.c                             |   4 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c          |  49 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h          |   1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c  |  56 ++-
 drivers/net/ethernet/chelsio/cxgb4/t4_tcb.h        |   4 +
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  |   2 +-
 drivers/net/ethernet/ibm/ibmveth.c                 |   6 -
 drivers/net/ethernet/ibm/ibmvnic.c                 |   8 +-
 drivers/net/ethernet/mellanox/mlxsw/core.c         |   2 +
 drivers/net/ethernet/realtek/r8169_main.c          |   4 +-
 drivers/net/ethernet/renesas/ravb_main.c           |  10 +-
 drivers/net/gtp.c                                  |  16 +-
 drivers/net/ipa/gsi_trans.c                        |  21 +-
 drivers/net/wireless/intersil/p54/p54pci.c         |   4 +-
 drivers/nvdimm/claim.c                             |   2 +-
 drivers/nvdimm/pmem.c                              |   6 +-
 drivers/pci/controller/pci-aardvark.c              |   4 +-
 drivers/phy/marvell/phy-mvebu-a3700-comphy.c       |  14 +-
 drivers/phy/marvell/phy-mvebu-cp110-comphy.c       |  14 +-
 drivers/tty/serial/amba-pl011.c                    |  11 +-
 drivers/tty/serial/qcom_geni_serial.c              |   2 +-
 drivers/xen/gntdev.c                               |  17 +-
 fs/efivarfs/super.c                                |   3 +
 fs/erofs/xattr.c                                   |   2 -
 fs/exec.c                                          |   6 +
 fs/file.c                                          |   2 +
 fs/fuse/dev.c                                      |  28 +-
 fs/io-wq.c                                         | 172 ++++----
 fs/io-wq.h                                         |   1 +
 fs/io_uring.c                                      | 475 ++++++++++++++++-----
 include/linux/fs.h                                 |   1 -
 include/linux/io_uring.h                           |  53 +++
 include/linux/mtd/pfow.h                           |   2 +-
 include/linux/pm.h                                 |   2 +-
 include/linux/qcom-geni-se.h                       |   3 +
 include/linux/sched.h                              |   5 +
 include/linux/string.h                             |   9 +-
 include/linux/uaccess.h                            |  13 +
 include/linux/uio.h                                |  10 +-
 include/net/netfilter/nf_tables.h                  |   6 +
 include/uapi/linux/bpf.h                           |   4 +-
 init/init_task.c                                   |   3 +
 kernel/fork.c                                      |   6 +
 lib/Kconfig                                        |   7 +-
 lib/iov_iter.c                                     |  48 +--
 net/ipv4/tcp.c                                     |   2 +
 net/ipv4/tcp_input.c                               |   3 +-
 net/netfilter/nf_tables_api.c                      |   6 +-
 net/netfilter/nf_tables_offload.c                  |   4 +-
 net/sched/act_mpls.c                               |   1 +
 net/sched/cls_api.c                                |   4 +-
 net/sched/sch_netem.c                              |   9 +-
 net/tipc/msg.c                                     |   5 +-
 scripts/setlocalversion                            |  21 +-
 security/integrity/evm/evm_main.c                  |   6 +
 tools/arch/x86/include/asm/mcsafe_test.h           |  13 -
 tools/arch/x86/lib/memcpy_64.S                     | 115 -----
 tools/include/uapi/linux/bpf.h                     |   4 +-
 tools/objtool/check.c                              |   5 +-
 tools/perf/bench/Build                             |   1 -
 tools/perf/bench/mem-memcpy-x86-64-lib.c           |  24 --
 tools/testing/nvdimm/test/nfit.c                   |  49 +--
 .../testing/selftests/powerpc/copyloops/.gitignore |   2 +-
 tools/testing/selftests/powerpc/copyloops/Makefile |   6 +-
 .../selftests/powerpc/copyloops/copy_mc_64.S       | 242 +++++++++++
 106 files changed, 1595 insertions(+), 908 deletions(-)



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 01/70] netfilter: nftables_offload: KASAN slab-out-of-bounds Read in nft_flow_rule_create
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 02/70] io_uring: dont run task work on an exiting task Greg Kroah-Hartman
                   ` (71 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Saeed Mirzamohammadi, Pablo Neira Ayuso

From: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>

commit 31cc578ae2de19c748af06d859019dced68e325d upstream.

This patch fixes the issue due to:

BUG: KASAN: slab-out-of-bounds in nft_flow_rule_create+0x622/0x6a2
net/netfilter/nf_tables_offload.c:40
Read of size 8 at addr ffff888103910b58 by task syz-executor227/16244

The error happens when expr->ops is accessed early on before performing the boundary check and after nft_expr_next() moves the expr to go out-of-bounds.

This patch checks the boundary condition before expr->ops that fixes the slab-out-of-bounds Read issue.

Add nft_expr_more() and use it to fix this problem.

Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/net/netfilter/nf_tables.h |    6 ++++++
 net/netfilter/nf_tables_api.c     |    6 +++---
 net/netfilter/nf_tables_offload.c |    4 ++--
 3 files changed, 11 insertions(+), 5 deletions(-)

--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -896,6 +896,12 @@ static inline struct nft_expr *nft_expr_
 	return (struct nft_expr *)&rule->data[rule->dlen];
 }
 
+static inline bool nft_expr_more(const struct nft_rule *rule,
+				 const struct nft_expr *expr)
+{
+	return expr != nft_expr_last(rule) && expr->ops;
+}
+
 static inline struct nft_userdata *nft_userdata(const struct nft_rule *rule)
 {
 	return (void *)&rule->data[rule->dlen];
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -295,7 +295,7 @@ static void nft_rule_expr_activate(const
 	struct nft_expr *expr;
 
 	expr = nft_expr_first(rule);
-	while (expr != nft_expr_last(rule) && expr->ops) {
+	while (nft_expr_more(rule, expr)) {
 		if (expr->ops->activate)
 			expr->ops->activate(ctx, expr);
 
@@ -310,7 +310,7 @@ static void nft_rule_expr_deactivate(con
 	struct nft_expr *expr;
 
 	expr = nft_expr_first(rule);
-	while (expr != nft_expr_last(rule) && expr->ops) {
+	while (nft_expr_more(rule, expr)) {
 		if (expr->ops->deactivate)
 			expr->ops->deactivate(ctx, expr, phase);
 
@@ -2917,7 +2917,7 @@ static void nf_tables_rule_destroy(const
 	 * is called on error from nf_tables_newrule().
 	 */
 	expr = nft_expr_first(rule);
-	while (expr != nft_expr_last(rule) && expr->ops) {
+	while (nft_expr_more(rule, expr)) {
 		next = nft_expr_next(expr);
 		nf_tables_expr_destroy(ctx, expr);
 		expr = next;
--- a/net/netfilter/nf_tables_offload.c
+++ b/net/netfilter/nf_tables_offload.c
@@ -37,7 +37,7 @@ struct nft_flow_rule *nft_flow_rule_crea
 	struct nft_expr *expr;
 
 	expr = nft_expr_first(rule);
-	while (expr->ops && expr != nft_expr_last(rule)) {
+	while (nft_expr_more(rule, expr)) {
 		if (expr->ops->offload_flags & NFT_OFFLOAD_F_ACTION)
 			num_actions++;
 
@@ -61,7 +61,7 @@ struct nft_flow_rule *nft_flow_rule_crea
 	ctx->net = net;
 	ctx->dep.type = NFT_OFFLOAD_DEP_UNSPEC;
 
-	while (expr->ops && expr != nft_expr_last(rule)) {
+	while (nft_expr_more(rule, expr)) {
 		if (!expr->ops->offload) {
 			err = -EOPNOTSUPP;
 			goto err_out;



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 02/70] io_uring: dont run task work on an exiting task
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 01/70] netfilter: nftables_offload: KASAN slab-out-of-bounds Read in nft_flow_rule_create Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 03/70] io_uring: allow timeout/poll/files killing to take task into account Greg Kroah-Hartman
                   ` (70 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Jens Axboe

From: Jens Axboe <axboe@kernel.dk>

commit 6200b0ae4ea28a4bfd8eb434e33e6201b7a6a282 upstream.

This isn't safe, and isn't needed either. We are guaranteed that any
work we queue is on a live task (and will be run), or it goes to
our backup io-wq threads if the task is exiting.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/io_uring.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1762,6 +1762,12 @@ static int io_put_kbuf(struct io_kiocb *
 
 static inline bool io_run_task_work(void)
 {
+	/*
+	 * Not safe to run on exiting task, and the task_work handling will
+	 * not add work to such a task.
+	 */
+	if (unlikely(current->flags & PF_EXITING))
+		return false;
 	if (current->task_works) {
 		__set_current_state(TASK_RUNNING);
 		task_work_run();
@@ -7791,6 +7797,8 @@ static void io_uring_cancel_files(struct
 			io_put_req(cancel_req);
 		}
 
+		/* cancellations _may_ trigger task work */
+		io_run_task_work();
 		schedule();
 		finish_wait(&ctx->inflight_wait, &wait);
 	}



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 03/70] io_uring: allow timeout/poll/files killing to take task into account
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 01/70] netfilter: nftables_offload: KASAN slab-out-of-bounds Read in nft_flow_rule_create Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 02/70] io_uring: dont run task work on an exiting task Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 04/70] io_uring: move dropping of files into separate helper Greg Kroah-Hartman
                   ` (69 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Pavel Begunkov, Jens Axboe

From: Jens Axboe <axboe@kernel.dk>

commit f3606e3a92ddd36299642c78592fc87609abb1f6 upstream.

We currently cancel these when the ring exits, and we cancel all of
them. This is in preparation for killing only the ones associated
with a given task.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/io_uring.c |   30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1141,13 +1141,25 @@ static void io_kill_timeout(struct io_ki
 	}
 }
 
-static void io_kill_timeouts(struct io_ring_ctx *ctx)
+static bool io_task_match(struct io_kiocb *req, struct task_struct *tsk)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+
+	if (!tsk || req->task == tsk)
+		return true;
+	if ((ctx->flags & IORING_SETUP_SQPOLL) && req->task == ctx->sqo_thread)
+		return true;
+	return false;
+}
+
+static void io_kill_timeouts(struct io_ring_ctx *ctx, struct task_struct *tsk)
 {
 	struct io_kiocb *req, *tmp;
 
 	spin_lock_irq(&ctx->completion_lock);
 	list_for_each_entry_safe(req, tmp, &ctx->timeout_list, list)
-		io_kill_timeout(req);
+		if (io_task_match(req, tsk))
+			io_kill_timeout(req);
 	spin_unlock_irq(&ctx->completion_lock);
 }
 
@@ -4641,7 +4653,7 @@ static bool io_poll_remove_one(struct io
 	return do_complete;
 }
 
-static void io_poll_remove_all(struct io_ring_ctx *ctx)
+static void io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk)
 {
 	struct hlist_node *tmp;
 	struct io_kiocb *req;
@@ -4652,8 +4664,10 @@ static void io_poll_remove_all(struct io
 		struct hlist_head *list;
 
 		list = &ctx->cancel_hash[i];
-		hlist_for_each_entry_safe(req, tmp, list, hash_node)
-			posted += io_poll_remove_one(req);
+		hlist_for_each_entry_safe(req, tmp, list, hash_node) {
+			if (io_task_match(req, tsk))
+				posted += io_poll_remove_one(req);
+		}
 	}
 	spin_unlock_irq(&ctx->completion_lock);
 
@@ -7556,8 +7570,8 @@ static void io_ring_ctx_wait_and_kill(st
 	percpu_ref_kill(&ctx->refs);
 	mutex_unlock(&ctx->uring_lock);
 
-	io_kill_timeouts(ctx);
-	io_poll_remove_all(ctx);
+	io_kill_timeouts(ctx, NULL);
+	io_poll_remove_all(ctx, NULL);
 
 	if (ctx->io_wq)
 		io_wq_cancel_all(ctx->io_wq);
@@ -7809,7 +7823,7 @@ static bool io_cancel_task_cb(struct io_
 	struct io_kiocb *req = container_of(work, struct io_kiocb, work);
 	struct task_struct *task = data;
 
-	return req->task == task;
+	return io_task_match(req, task);
 }
 
 static int io_uring_flush(struct file *file, void *data)



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 04/70] io_uring: move dropping of files into separate helper
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (2 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 03/70] io_uring: allow timeout/poll/files killing to take task into account Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 05/70] io_uring: stash ctx task reference for SQPOLL Greg Kroah-Hartman
                   ` (68 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Pavel Begunkov, Jens Axboe

From: Jens Axboe <axboe@kernel.dk>

commit f573d384456b3025d3f8e58b3eafaeeb0f510784 upstream.

No functional changes in this patch, prep patch for grabbing references
to the files_struct.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/io_uring.c |   26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1424,6 +1424,20 @@ static inline void io_put_file(struct io
 		fput(file);
 }
 
+static void io_req_drop_files(struct io_kiocb *req)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ctx->inflight_lock, flags);
+	list_del(&req->inflight_entry);
+	if (waitqueue_active(&ctx->inflight_wait))
+		wake_up(&ctx->inflight_wait);
+	spin_unlock_irqrestore(&ctx->inflight_lock, flags);
+	req->flags &= ~REQ_F_INFLIGHT;
+	req->work.files = NULL;
+}
+
 static void __io_req_aux_free(struct io_kiocb *req)
 {
 	if (req->flags & REQ_F_NEED_CLEANUP)
@@ -1440,16 +1454,8 @@ static void __io_free_req(struct io_kioc
 {
 	__io_req_aux_free(req);
 
-	if (req->flags & REQ_F_INFLIGHT) {
-		struct io_ring_ctx *ctx = req->ctx;
-		unsigned long flags;
-
-		spin_lock_irqsave(&ctx->inflight_lock, flags);
-		list_del(&req->inflight_entry);
-		if (waitqueue_active(&ctx->inflight_wait))
-			wake_up(&ctx->inflight_wait);
-		spin_unlock_irqrestore(&ctx->inflight_lock, flags);
-	}
+	if (req->flags & REQ_F_INFLIGHT)
+		io_req_drop_files(req);
 
 	percpu_ref_put(&req->ctx->refs);
 	if (likely(!io_is_fallback_req(req)))



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 05/70] io_uring: stash ctx task reference for SQPOLL
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (3 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 04/70] io_uring: move dropping of files into separate helper Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 06/70] io_uring: unconditionally grab req->task Greg Kroah-Hartman
                   ` (67 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Pavel Begunkov, Jens Axboe

From: Jens Axboe <axboe@kernel.dk>

commit 2aede0e417db846793c276c7a1bbf7262c8349b0 upstream.

We can grab a reference to the task instead of stashing away the task
files_struct. This is doable without creating a circular reference
between the ring fd and the task itself.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/io_uring.c |   39 +++++++++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 10 deletions(-)

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -264,7 +264,16 @@ struct io_ring_ctx {
 	/* IO offload */
 	struct io_wq		*io_wq;
 	struct task_struct	*sqo_thread;	/* if using sq thread polling */
-	struct mm_struct	*sqo_mm;
+
+	/*
+	 * For SQPOLL usage - we hold a reference to the parent task, so we
+	 * have access to the ->files
+	 */
+	struct task_struct	*sqo_task;
+
+	/* Only used for accounting purposes */
+	struct mm_struct	*mm_account;
+
 	wait_queue_head_t	sqo_wait;
 
 	/*
@@ -4421,9 +4430,10 @@ static int io_sq_thread_acquire_mm(struc
 {
 	if (io_op_defs[req->opcode].needs_mm && !current->mm) {
 		if (unlikely(!(ctx->flags & IORING_SETUP_SQPOLL) ||
-			     !mmget_not_zero(ctx->sqo_mm)))
+			!ctx->sqo_task->mm ||
+			!mmget_not_zero(ctx->sqo_task->mm)))
 			return -EFAULT;
-		kthread_use_mm(ctx->sqo_mm);
+		kthread_use_mm(ctx->sqo_task->mm);
 	}
 
 	return 0;
@@ -7104,9 +7114,6 @@ static int io_sq_offload_start(struct io
 {
 	int ret;
 
-	mmgrab(current->mm);
-	ctx->sqo_mm = current->mm;
-
 	if (ctx->flags & IORING_SETUP_SQPOLL) {
 		ret = -EPERM;
 		if (!capable(CAP_SYS_ADMIN))
@@ -7151,8 +7158,6 @@ static int io_sq_offload_start(struct io
 	return 0;
 err:
 	io_finish_async(ctx);
-	mmdrop(ctx->sqo_mm);
-	ctx->sqo_mm = NULL;
 	return ret;
 }
 
@@ -7482,8 +7487,12 @@ static void io_destroy_buffers(struct io
 static void io_ring_ctx_free(struct io_ring_ctx *ctx)
 {
 	io_finish_async(ctx);
-	if (ctx->sqo_mm)
-		mmdrop(ctx->sqo_mm);
+	if (ctx->sqo_task) {
+		put_task_struct(ctx->sqo_task);
+		ctx->sqo_task = NULL;
+		mmdrop(ctx->mm_account);
+		ctx->mm_account = NULL;
+	}
 
 	io_iopoll_reap_events(ctx);
 	io_sqe_buffer_unregister(ctx);
@@ -8256,6 +8265,16 @@ static int io_uring_create(unsigned entr
 	ctx->user = user;
 	ctx->creds = get_current_cred();
 
+	ctx->sqo_task = get_task_struct(current);
+	/*
+	 * This is just grabbed for accounting purposes. When a process exits,
+	 * the mm is exited and dropped before the files, hence we need to hang
+	 * on to this mm purely for the purposes of being able to unaccount
+	 * memory (locked/pinned vm). It's not used for anything else.
+	 */
+	mmgrab(current->mm);
+	ctx->mm_account = current->mm;
+
 	ret = io_allocate_scq_urings(ctx, p);
 	if (ret)
 		goto err;



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 06/70] io_uring: unconditionally grab req->task
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (4 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 05/70] io_uring: stash ctx task reference for SQPOLL Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 07/70] io_uring: return cancelation status from poll/timeout/files handlers Greg Kroah-Hartman
                   ` (66 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Pavel Begunkov, Jens Axboe

From: Jens Axboe <axboe@kernel.dk>

commit e3bc8e9dad7f2f83cc807111d4472164c9210153 upstream.

Sometimes we assign a weak reference to it, sometimes we grab a
reference to it. Clean this up and make it unconditional, and drop the
flag related to tracking this state.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/io_uring.c |   26 +++-----------------------
 1 file changed, 3 insertions(+), 23 deletions(-)

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -550,7 +550,6 @@ enum {
 	REQ_F_NO_FILE_TABLE_BIT,
 	REQ_F_QUEUE_TIMEOUT_BIT,
 	REQ_F_WORK_INITIALIZED_BIT,
-	REQ_F_TASK_PINNED_BIT,
 
 	/* not a real bit, just to check we're not overflowing the space */
 	__REQ_F_LAST_BIT,
@@ -608,8 +607,6 @@ enum {
 	REQ_F_QUEUE_TIMEOUT	= BIT(REQ_F_QUEUE_TIMEOUT_BIT),
 	/* io_wq_work is initialized */
 	REQ_F_WORK_INITIALIZED	= BIT(REQ_F_WORK_INITIALIZED_BIT),
-	/* req->task is refcounted */
-	REQ_F_TASK_PINNED	= BIT(REQ_F_TASK_PINNED_BIT),
 };
 
 struct async_poll {
@@ -924,21 +921,6 @@ struct sock *io_uring_get_socket(struct
 }
 EXPORT_SYMBOL(io_uring_get_socket);
 
-static void io_get_req_task(struct io_kiocb *req)
-{
-	if (req->flags & REQ_F_TASK_PINNED)
-		return;
-	get_task_struct(req->task);
-	req->flags |= REQ_F_TASK_PINNED;
-}
-
-/* not idempotent -- it doesn't clear REQ_F_TASK_PINNED */
-static void __io_put_req_task(struct io_kiocb *req)
-{
-	if (req->flags & REQ_F_TASK_PINNED)
-		put_task_struct(req->task);
-}
-
 static void io_file_put_work(struct work_struct *work);
 
 /*
@@ -1455,7 +1437,7 @@ static void __io_req_aux_free(struct io_
 	kfree(req->io);
 	if (req->file)
 		io_put_file(req, req->file, (req->flags & REQ_F_FIXED_FILE));
-	__io_put_req_task(req);
+	put_task_struct(req->task);
 	io_req_work_drop_env(req);
 }
 
@@ -1765,7 +1747,7 @@ static inline bool io_req_multi_free(str
 	if ((req->flags & REQ_F_LINK_HEAD) || io_is_fallback_req(req))
 		return false;
 
-	if (req->file || req->io)
+	if (req->file || req->io || req->task)
 		rb->need_iter++;
 
 	rb->reqs[rb->to_free++] = req;
@@ -4584,7 +4566,6 @@ static bool io_arm_poll_handler(struct i
 	if (req->flags & REQ_F_WORK_INITIALIZED)
 		memcpy(&apoll->work, &req->work, sizeof(req->work));
 
-	io_get_req_task(req);
 	req->apoll = apoll;
 	INIT_HLIST_NODE(&req->hash_node);
 
@@ -4774,8 +4755,6 @@ static int io_poll_add_prep(struct io_ki
 
 	events = READ_ONCE(sqe->poll_events);
 	poll->events = demangle_poll(events) | EPOLLERR | EPOLLHUP;
-
-	io_get_req_task(req);
 	return 0;
 }
 
@@ -6057,6 +6036,7 @@ static int io_init_req(struct io_ring_ct
 	/* one is dropped after submission, the other at completion */
 	refcount_set(&req->refs, 2);
 	req->task = current;
+	get_task_struct(req->task);
 	req->result = 0;
 
 	if (unlikely(req->opcode >= IORING_OP_LAST))



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 07/70] io_uring: return cancelation status from poll/timeout/files handlers
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 06/70] io_uring: unconditionally grab req->task Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 08/70] io_uring: enable task/files specific overflow flushing Greg Kroah-Hartman
                   ` (65 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Pavel Begunkov, Jens Axboe

From: Jens Axboe <axboe@kernel.dk>

commit 76e1b6427fd8246376a97e3227049d49188dfb9c upstream.

Return whether we found and canceled requests or not. This is in
preparation for using this information, no functional changes in this
patch.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/io_uring.c |   30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1143,15 +1143,23 @@ static bool io_task_match(struct io_kioc
 	return false;
 }
 
-static void io_kill_timeouts(struct io_ring_ctx *ctx, struct task_struct *tsk)
+/*
+ * Returns true if we found and killed one or more timeouts
+ */
+static bool io_kill_timeouts(struct io_ring_ctx *ctx, struct task_struct *tsk)
 {
 	struct io_kiocb *req, *tmp;
+	int canceled = 0;
 
 	spin_lock_irq(&ctx->completion_lock);
-	list_for_each_entry_safe(req, tmp, &ctx->timeout_list, list)
-		if (io_task_match(req, tsk))
+	list_for_each_entry_safe(req, tmp, &ctx->timeout_list, list) {
+		if (io_task_match(req, tsk)) {
 			io_kill_timeout(req);
+			canceled++;
+		}
+	}
 	spin_unlock_irq(&ctx->completion_lock);
+	return canceled != 0;
 }
 
 static void __io_queue_deferred(struct io_ring_ctx *ctx)
@@ -4650,7 +4658,10 @@ static bool io_poll_remove_one(struct io
 	return do_complete;
 }
 
-static void io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk)
+/*
+ * Returns true if we found and killed one or more poll requests
+ */
+static bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk)
 {
 	struct hlist_node *tmp;
 	struct io_kiocb *req;
@@ -4670,6 +4681,8 @@ static void io_poll_remove_all(struct io
 
 	if (posted)
 		io_cqring_ev_posted(ctx);
+
+	return posted != 0;
 }
 
 static int io_poll_cancel(struct io_ring_ctx *ctx, __u64 sqe_addr)
@@ -7744,11 +7757,14 @@ static void io_cancel_defer_files(struct
 	}
 }
 
-static void io_uring_cancel_files(struct io_ring_ctx *ctx,
+/*
+ * Returns true if we found and killed one or more files pinning requests
+ */
+static bool io_uring_cancel_files(struct io_ring_ctx *ctx,
 				  struct files_struct *files)
 {
 	if (list_empty_careful(&ctx->inflight_list))
-		return;
+		return false;
 
 	io_cancel_defer_files(ctx, files);
 	/* cancel all at once, should be faster than doing it one by one*/
@@ -7811,6 +7827,8 @@ static void io_uring_cancel_files(struct
 		schedule();
 		finish_wait(&ctx->inflight_wait, &wait);
 	}
+
+	return true;
 }
 
 static bool io_cancel_task_cb(struct io_wq_work *work, void *data)



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 08/70] io_uring: enable task/files specific overflow flushing
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 07/70] io_uring: return cancelation status from poll/timeout/files handlers Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 09/70] io_uring: dont rely on weak ->files references Greg Kroah-Hartman
                   ` (64 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Pavel Begunkov, Jens Axboe

From: Jens Axboe <axboe@kernel.dk>

commit e6c8aa9ac33bd7c968af7816240fc081401fddcd upstream.

This allows us to selectively flush out pending overflows, depending on
the task and/or files_struct being passed in.

No intended functional changes in this patch.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/io_uring.c |   41 ++++++++++++++++++++++++++---------------
 1 file changed, 26 insertions(+), 15 deletions(-)

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1240,12 +1240,24 @@ static void io_cqring_ev_posted(struct i
 		eventfd_signal(ctx->cq_ev_fd, 1);
 }
 
+static inline bool io_match_files(struct io_kiocb *req,
+				       struct files_struct *files)
+{
+	if (!files)
+		return true;
+	if (req->flags & REQ_F_WORK_INITIALIZED)
+		return req->work.files == files;
+	return false;
+}
+
 /* Returns true if there are no backlogged entries after the flush */
-static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
+static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force,
+				     struct task_struct *tsk,
+				     struct files_struct *files)
 {
 	struct io_rings *rings = ctx->rings;
+	struct io_kiocb *req, *tmp;
 	struct io_uring_cqe *cqe;
-	struct io_kiocb *req;
 	unsigned long flags;
 	LIST_HEAD(list);
 
@@ -1264,7 +1276,12 @@ static bool io_cqring_overflow_flush(str
 		ctx->cq_overflow_flushed = 1;
 
 	cqe = NULL;
-	while (!list_empty(&ctx->cq_overflow_list)) {
+	list_for_each_entry_safe(req, tmp, &ctx->cq_overflow_list, list) {
+		if (tsk && req->task != tsk)
+			continue;
+		if (!io_match_files(req, files))
+			continue;
+
 		cqe = io_get_cqring(ctx);
 		if (!cqe && !force)
 			break;
@@ -1734,7 +1751,7 @@ static unsigned io_cqring_events(struct
 		if (noflush && !list_empty(&ctx->cq_overflow_list))
 			return -1U;
 
-		io_cqring_overflow_flush(ctx, false);
+		io_cqring_overflow_flush(ctx, false, NULL, NULL);
 	}
 
 	/* See comment at the top of this file */
@@ -6095,7 +6112,7 @@ static int io_submit_sqes(struct io_ring
 	/* if we have a backlog and couldn't flush it all, return BUSY */
 	if (test_bit(0, &ctx->sq_check_overflow)) {
 		if (!list_empty(&ctx->cq_overflow_list) &&
-		    !io_cqring_overflow_flush(ctx, false))
+		    !io_cqring_overflow_flush(ctx, false, NULL, NULL))
 			return -EBUSY;
 	}
 
@@ -7556,7 +7573,7 @@ static void io_ring_exit_work(struct wor
 
 	ctx = container_of(work, struct io_ring_ctx, exit_work);
 	if (ctx->rings)
-		io_cqring_overflow_flush(ctx, true);
+		io_cqring_overflow_flush(ctx, true, NULL, NULL);
 
 	/*
 	 * If we're doing polled IO and end up having requests being
@@ -7567,7 +7584,7 @@ static void io_ring_exit_work(struct wor
 	while (!wait_for_completion_timeout(&ctx->ref_comp, HZ/20)) {
 		io_iopoll_reap_events(ctx);
 		if (ctx->rings)
-			io_cqring_overflow_flush(ctx, true);
+			io_cqring_overflow_flush(ctx, true, NULL, NULL);
 	}
 	io_ring_ctx_free(ctx);
 }
@@ -7587,7 +7604,7 @@ static void io_ring_ctx_wait_and_kill(st
 	io_iopoll_reap_events(ctx);
 	/* if we failed setting up the ctx, we might not have any rings */
 	if (ctx->rings)
-		io_cqring_overflow_flush(ctx, true);
+		io_cqring_overflow_flush(ctx, true, NULL, NULL);
 	idr_for_each(&ctx->personality_idr, io_remove_personalities, ctx);
 
 	/*
@@ -7637,12 +7654,6 @@ static bool io_match_link(struct io_kioc
 	return false;
 }
 
-static inline bool io_match_files(struct io_kiocb *req,
-				       struct files_struct *files)
-{
-	return (req->flags & REQ_F_WORK_INITIALIZED) && req->work.files == files;
-}
-
 static bool io_match_link_files(struct io_kiocb *req,
 				struct files_struct *files)
 {
@@ -7959,7 +7970,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned
 	ret = 0;
 	if (ctx->flags & IORING_SETUP_SQPOLL) {
 		if (!list_empty_careful(&ctx->cq_overflow_list))
-			io_cqring_overflow_flush(ctx, false);
+			io_cqring_overflow_flush(ctx, false, NULL, NULL);
 		if (flags & IORING_ENTER_SQ_WAKEUP)
 			wake_up(&ctx->sqo_wait);
 		submitted = to_submit;



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 09/70] io_uring: dont rely on weak ->files references
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 08/70] io_uring: enable task/files specific overflow flushing Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 10/70] io_uring: reference ->nsproxy for file table commands Greg Kroah-Hartman
                   ` (63 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Pavel Begunkov, Jens Axboe

From: Jens Axboe <axboe@kernel.dk>

commit 0f2122045b946241a9e549c2a76cea54fa58a7ff upstream.

Grab actual references to the files_struct. To avoid circular references
issues due to this, we add a per-task note that keeps track of what
io_uring contexts a task has used. When the tasks execs or exits its
assigned files, we cancel requests based on this tracking.

With that, we can grab proper references to the files table, and no
longer need to rely on stashing away ring_fd and ring_file to check
if the ring_fd may have been closed.

Cc: stable@vger.kernel.org # v5.5+
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/exec.c                |    6 
 fs/file.c                |    2 
 fs/io_uring.c            |  301 +++++++++++++++++++++++++++++++++++++++++------
 include/linux/io_uring.h |   53 ++++++++
 include/linux/sched.h    |    5 
 init/init_task.c         |    3 
 kernel/fork.c            |    6 
 7 files changed, 340 insertions(+), 36 deletions(-)
 create mode 100644 include/linux/io_uring.h

--- a/fs/exec.c
+++ b/fs/exec.c
@@ -62,6 +62,7 @@
 #include <linux/oom.h>
 #include <linux/compat.h>
 #include <linux/vmalloc.h>
+#include <linux/io_uring.h>
 
 #include <linux/uaccess.h>
 #include <asm/mmu_context.h>
@@ -1847,6 +1848,11 @@ static int __do_execve_file(int fd, stru
 	 * further execve() calls fail. */
 	current->flags &= ~PF_NPROC_EXCEEDED;
 
+	/*
+	 * Cancel any io_uring activity across execve
+	 */
+	io_uring_task_cancel();
+
 	retval = unshare_files(&displaced);
 	if (retval)
 		goto out_ret;
--- a/fs/file.c
+++ b/fs/file.c
@@ -18,6 +18,7 @@
 #include <linux/bitops.h>
 #include <linux/spinlock.h>
 #include <linux/rcupdate.h>
+#include <linux/io_uring.h>
 
 unsigned int sysctl_nr_open __read_mostly = 1024*1024;
 unsigned int sysctl_nr_open_min = BITS_PER_LONG;
@@ -439,6 +440,7 @@ void exit_files(struct task_struct *tsk)
 	struct files_struct * files = tsk->files;
 
 	if (files) {
+		io_uring_files_cancel(files);
 		task_lock(tsk);
 		tsk->files = NULL;
 		task_unlock(tsk);
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -78,6 +78,7 @@
 #include <linux/fs_struct.h>
 #include <linux/splice.h>
 #include <linux/task_work.h>
+#include <linux/io_uring.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
@@ -283,8 +284,6 @@ struct io_ring_ctx {
 	 */
 	struct fixed_file_data	*file_data;
 	unsigned		nr_user_files;
-	int 			ring_fd;
-	struct file 		*ring_file;
 
 	/* if used, fixed mapped user buffers */
 	unsigned		nr_user_bufs;
@@ -1335,7 +1334,12 @@ static void __io_cqring_fill_event(struc
 		WRITE_ONCE(cqe->user_data, req->user_data);
 		WRITE_ONCE(cqe->res, res);
 		WRITE_ONCE(cqe->flags, cflags);
-	} else if (ctx->cq_overflow_flushed) {
+	} else if (ctx->cq_overflow_flushed || req->task->io_uring->in_idle) {
+		/*
+		 * If we're in ring overflow flush mode, or in task cancel mode,
+		 * then we cannot store the request for later flushing, we need
+		 * to drop it on the floor.
+		 */
 		WRITE_ONCE(ctx->rings->cq_overflow,
 				atomic_inc_return(&ctx->cached_cq_overflow));
 	} else {
@@ -1451,17 +1455,22 @@ static void io_req_drop_files(struct io_
 		wake_up(&ctx->inflight_wait);
 	spin_unlock_irqrestore(&ctx->inflight_lock, flags);
 	req->flags &= ~REQ_F_INFLIGHT;
+	put_files_struct(req->work.files);
 	req->work.files = NULL;
 }
 
 static void __io_req_aux_free(struct io_kiocb *req)
 {
+	struct io_uring_task *tctx = req->task->io_uring;
 	if (req->flags & REQ_F_NEED_CLEANUP)
 		io_cleanup_req(req);
 
 	kfree(req->io);
 	if (req->file)
 		io_put_file(req, req->file, (req->flags & REQ_F_FIXED_FILE));
+	atomic_long_inc(&tctx->req_complete);
+	if (tctx->in_idle)
+		wake_up(&tctx->wait);
 	put_task_struct(req->task);
 	io_req_work_drop_env(req);
 }
@@ -3532,8 +3541,7 @@ static int io_close_prep(struct io_kiocb
 		return -EBADF;
 
 	req->close.fd = READ_ONCE(sqe->fd);
-	if ((req->file && req->file->f_op == &io_uring_fops) ||
-	    req->close.fd == req->ctx->ring_fd)
+	if ((req->file && req->file->f_op == &io_uring_fops))
 		return -EBADF;
 
 	req->close.put_file = NULL;
@@ -5671,32 +5679,18 @@ static int io_req_set_file(struct io_sub
 
 static int io_grab_files(struct io_kiocb *req)
 {
-	int ret = -EBADF;
 	struct io_ring_ctx *ctx = req->ctx;
 
 	if (req->work.files || (req->flags & REQ_F_NO_FILE_TABLE))
 		return 0;
-	if (!ctx->ring_file)
-		return -EBADF;
 
-	rcu_read_lock();
+	req->work.files = get_files_struct(current);
+	req->flags |= REQ_F_INFLIGHT;
+
 	spin_lock_irq(&ctx->inflight_lock);
-	/*
-	 * We use the f_ops->flush() handler to ensure that we can flush
-	 * out work accessing these files if the fd is closed. Check if
-	 * the fd has changed since we started down this path, and disallow
-	 * this operation if it has.
-	 */
-	if (fcheck(ctx->ring_fd) == ctx->ring_file) {
-		list_add(&req->inflight_entry, &ctx->inflight_list);
-		req->flags |= REQ_F_INFLIGHT;
-		req->work.files = current->files;
-		ret = 0;
-	}
+	list_add(&req->inflight_entry, &ctx->inflight_list);
 	spin_unlock_irq(&ctx->inflight_lock);
-	rcu_read_unlock();
-
-	return ret;
+	return 0;
 }
 
 static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer)
@@ -6067,6 +6061,7 @@ static int io_init_req(struct io_ring_ct
 	refcount_set(&req->refs, 2);
 	req->task = current;
 	get_task_struct(req->task);
+	atomic_long_inc(&req->task->io_uring->req_issue);
 	req->result = 0;
 
 	if (unlikely(req->opcode >= IORING_OP_LAST))
@@ -6102,8 +6097,7 @@ static int io_init_req(struct io_ring_ct
 	return io_req_set_file(state, req, READ_ONCE(sqe->fd));
 }
 
-static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr,
-			  struct file *ring_file, int ring_fd)
+static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 {
 	struct io_submit_state state, *statep = NULL;
 	struct io_kiocb *link = NULL;
@@ -6127,9 +6121,6 @@ static int io_submit_sqes(struct io_ring
 		statep = &state;
 	}
 
-	ctx->ring_fd = ring_fd;
-	ctx->ring_file = ring_file;
-
 	for (i = 0; i < nr; i++) {
 		const struct io_uring_sqe *sqe;
 		struct io_kiocb *req;
@@ -6290,7 +6281,7 @@ static int io_sq_thread(void *data)
 
 		mutex_lock(&ctx->uring_lock);
 		if (likely(!percpu_ref_is_dying(&ctx->refs)))
-			ret = io_submit_sqes(ctx, to_submit, NULL, -1);
+			ret = io_submit_sqes(ctx, to_submit);
 		mutex_unlock(&ctx->uring_lock);
 		timeout = jiffies + ctx->sq_thread_idle;
 	}
@@ -7119,6 +7110,34 @@ out_fput:
 	return ret;
 }
 
+static int io_uring_alloc_task_context(struct task_struct *task)
+{
+	struct io_uring_task *tctx;
+
+	tctx = kmalloc(sizeof(*tctx), GFP_KERNEL);
+	if (unlikely(!tctx))
+		return -ENOMEM;
+
+	xa_init(&tctx->xa);
+	init_waitqueue_head(&tctx->wait);
+	tctx->last = NULL;
+	tctx->in_idle = 0;
+	atomic_long_set(&tctx->req_issue, 0);
+	atomic_long_set(&tctx->req_complete, 0);
+	task->io_uring = tctx;
+	return 0;
+}
+
+void __io_uring_free(struct task_struct *tsk)
+{
+	struct io_uring_task *tctx = tsk->io_uring;
+
+	WARN_ON_ONCE(!xa_empty(&tctx->xa));
+	xa_destroy(&tctx->xa);
+	kfree(tctx);
+	tsk->io_uring = NULL;
+}
+
 static int io_sq_offload_start(struct io_ring_ctx *ctx,
 			       struct io_uring_params *p)
 {
@@ -7154,6 +7173,9 @@ static int io_sq_offload_start(struct io
 			ctx->sqo_thread = NULL;
 			goto err;
 		}
+		ret = io_uring_alloc_task_context(ctx->sqo_thread);
+		if (ret)
+			goto err;
 		wake_up_process(ctx->sqo_thread);
 	} else if (p->flags & IORING_SETUP_SQ_AFF) {
 		/* Can't have SQ_AFF without SQPOLL */
@@ -7633,7 +7655,7 @@ static bool io_wq_files_match(struct io_
 {
 	struct files_struct *files = data;
 
-	return work->files == files;
+	return !files || work->files == files;
 }
 
 /*
@@ -7787,7 +7809,7 @@ static bool io_uring_cancel_files(struct
 
 		spin_lock_irq(&ctx->inflight_lock);
 		list_for_each_entry(req, &ctx->inflight_list, inflight_entry) {
-			if (req->work.files != files)
+			if (files && req->work.files != files)
 				continue;
 			/* req is being completed, ignore */
 			if (!refcount_inc_not_zero(&req->refs))
@@ -7850,18 +7872,217 @@ static bool io_cancel_task_cb(struct io_
 	return io_task_match(req, task);
 }
 
+static bool __io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
+					    struct task_struct *task,
+					    struct files_struct *files)
+{
+	bool ret;
+
+	ret = io_uring_cancel_files(ctx, files);
+	if (!files) {
+		enum io_wq_cancel cret;
+
+		cret = io_wq_cancel_cb(ctx->io_wq, io_cancel_task_cb, task, true);
+		if (cret != IO_WQ_CANCEL_NOTFOUND)
+			ret = true;
+
+		/* SQPOLL thread does its own polling */
+		if (!(ctx->flags & IORING_SETUP_SQPOLL)) {
+			if (!list_empty_careful(&ctx->poll_list)) {
+				io_iopoll_reap_events(ctx);
+				ret = true;
+			}
+		}
+
+		ret |= io_poll_remove_all(ctx, task);
+		ret |= io_kill_timeouts(ctx, task);
+	}
+
+	return ret;
+}
+
+/*
+ * We need to iteratively cancel requests, in case a request has dependent
+ * hard links. These persist even for failure of cancelations, hence keep
+ * looping until none are found.
+ */
+static void io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
+					  struct files_struct *files)
+{
+	struct task_struct *task = current;
+
+	if (ctx->flags & IORING_SETUP_SQPOLL)
+		task = ctx->sqo_thread;
+
+	io_cqring_overflow_flush(ctx, true, task, files);
+
+	while (__io_uring_cancel_task_requests(ctx, task, files)) {
+		io_run_task_work();
+		cond_resched();
+	}
+}
+
+/*
+ * Note that this task has used io_uring. We use it for cancelation purposes.
+ */
+static int io_uring_add_task_file(struct file *file)
+{
+	if (unlikely(!current->io_uring)) {
+		int ret;
+
+		ret = io_uring_alloc_task_context(current);
+		if (unlikely(ret))
+			return ret;
+	}
+	if (current->io_uring->last != file) {
+		XA_STATE(xas, &current->io_uring->xa, (unsigned long) file);
+		void *old;
+
+		rcu_read_lock();
+		old = xas_load(&xas);
+		if (old != file) {
+			get_file(file);
+			xas_lock(&xas);
+			xas_store(&xas, file);
+			xas_unlock(&xas);
+		}
+		rcu_read_unlock();
+		current->io_uring->last = file;
+	}
+
+	return 0;
+}
+
+/*
+ * Remove this io_uring_file -> task mapping.
+ */
+static void io_uring_del_task_file(struct file *file)
+{
+	struct io_uring_task *tctx = current->io_uring;
+	XA_STATE(xas, &tctx->xa, (unsigned long) file);
+
+	if (tctx->last == file)
+		tctx->last = NULL;
+
+	xas_lock(&xas);
+	file = xas_store(&xas, NULL);
+	xas_unlock(&xas);
+
+	if (file)
+		fput(file);
+}
+
+static void __io_uring_attempt_task_drop(struct file *file)
+{
+	XA_STATE(xas, &current->io_uring->xa, (unsigned long) file);
+	struct file *old;
+
+	rcu_read_lock();
+	old = xas_load(&xas);
+	rcu_read_unlock();
+
+	if (old == file)
+		io_uring_del_task_file(file);
+}
+
+/*
+ * Drop task note for this file if we're the only ones that hold it after
+ * pending fput()
+ */
+static void io_uring_attempt_task_drop(struct file *file, bool exiting)
+{
+	if (!current->io_uring)
+		return;
+	/*
+	 * fput() is pending, will be 2 if the only other ref is our potential
+	 * task file note. If the task is exiting, drop regardless of count.
+	 */
+	if (!exiting && atomic_long_read(&file->f_count) != 2)
+		return;
+
+	__io_uring_attempt_task_drop(file);
+}
+
+void __io_uring_files_cancel(struct files_struct *files)
+{
+	struct io_uring_task *tctx = current->io_uring;
+	XA_STATE(xas, &tctx->xa, 0);
+
+	/* make sure overflow events are dropped */
+	tctx->in_idle = true;
+
+	do {
+		struct io_ring_ctx *ctx;
+		struct file *file;
+
+		xas_lock(&xas);
+		file = xas_next_entry(&xas, ULONG_MAX);
+		xas_unlock(&xas);
+
+		if (!file)
+			break;
+
+		ctx = file->private_data;
+
+		io_uring_cancel_task_requests(ctx, files);
+		if (files)
+			io_uring_del_task_file(file);
+	} while (1);
+}
+
+static inline bool io_uring_task_idle(struct io_uring_task *tctx)
+{
+	return atomic_long_read(&tctx->req_issue) ==
+		atomic_long_read(&tctx->req_complete);
+}
+
+/*
+ * Find any io_uring fd that this task has registered or done IO on, and cancel
+ * requests.
+ */
+void __io_uring_task_cancel(void)
+{
+	struct io_uring_task *tctx = current->io_uring;
+	DEFINE_WAIT(wait);
+	long completions;
+
+	/* make sure overflow events are dropped */
+	tctx->in_idle = true;
+
+	while (!io_uring_task_idle(tctx)) {
+		/* read completions before cancelations */
+		completions = atomic_long_read(&tctx->req_complete);
+		__io_uring_files_cancel(NULL);
+
+		prepare_to_wait(&tctx->wait, &wait, TASK_UNINTERRUPTIBLE);
+
+		/*
+		 * If we've seen completions, retry. This avoids a race where
+		 * a completion comes in before we did prepare_to_wait().
+		 */
+		if (completions != atomic_long_read(&tctx->req_complete))
+			continue;
+		if (io_uring_task_idle(tctx))
+			break;
+		schedule();
+	}
+
+	finish_wait(&tctx->wait, &wait);
+	tctx->in_idle = false;
+}
+
 static int io_uring_flush(struct file *file, void *data)
 {
 	struct io_ring_ctx *ctx = file->private_data;
 
-	io_uring_cancel_files(ctx, data);
-
 	/*
 	 * If the task is going away, cancel work it may have pending
 	 */
 	if (fatal_signal_pending(current) || (current->flags & PF_EXITING))
-		io_wq_cancel_cb(ctx->io_wq, io_cancel_task_cb, current, true);
+		data = NULL;
 
+	io_uring_cancel_task_requests(ctx, data);
+	io_uring_attempt_task_drop(file, !data);
 	return 0;
 }
 
@@ -7975,8 +8196,11 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned
 			wake_up(&ctx->sqo_wait);
 		submitted = to_submit;
 	} else if (to_submit) {
+		ret = io_uring_add_task_file(f.file);
+		if (unlikely(ret))
+			goto out;
 		mutex_lock(&ctx->uring_lock);
-		submitted = io_submit_sqes(ctx, to_submit, f.file, fd);
+		submitted = io_submit_sqes(ctx, to_submit);
 		mutex_unlock(&ctx->uring_lock);
 
 		if (submitted != to_submit)
@@ -8188,6 +8412,7 @@ static int io_uring_get_fd(struct io_rin
 	file = anon_inode_getfile("[io_uring]", &io_uring_fops, ctx,
 					O_RDWR | O_CLOEXEC);
 	if (IS_ERR(file)) {
+err_fd:
 		put_unused_fd(ret);
 		ret = PTR_ERR(file);
 		goto err;
@@ -8196,6 +8421,10 @@ static int io_uring_get_fd(struct io_rin
 #if defined(CONFIG_UNIX)
 	ctx->ring_sock->file = file;
 #endif
+	if (unlikely(io_uring_add_task_file(file))) {
+		file = ERR_PTR(-ENOMEM);
+		goto err_fd;
+	}
 	fd_install(ret, file);
 	return ret;
 err:
--- /dev/null
+++ b/include/linux/io_uring.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _LINUX_IO_URING_H
+#define _LINUX_IO_URING_H
+
+#include <linux/sched.h>
+#include <linux/xarray.h>
+#include <linux/percpu-refcount.h>
+
+struct io_uring_task {
+	/* submission side */
+	struct xarray		xa;
+	struct wait_queue_head	wait;
+	struct file		*last;
+	atomic_long_t		req_issue;
+
+	/* completion side */
+	bool			in_idle ____cacheline_aligned_in_smp;
+	atomic_long_t		req_complete;
+};
+
+#if defined(CONFIG_IO_URING)
+void __io_uring_task_cancel(void);
+void __io_uring_files_cancel(struct files_struct *files);
+void __io_uring_free(struct task_struct *tsk);
+
+static inline void io_uring_task_cancel(void)
+{
+	if (current->io_uring && !xa_empty(&current->io_uring->xa))
+		__io_uring_task_cancel();
+}
+static inline void io_uring_files_cancel(struct files_struct *files)
+{
+	if (current->io_uring && !xa_empty(&current->io_uring->xa))
+		__io_uring_files_cancel(files);
+}
+static inline void io_uring_free(struct task_struct *tsk)
+{
+	if (tsk->io_uring)
+		__io_uring_free(tsk);
+}
+#else
+static inline void io_uring_task_cancel(void)
+{
+}
+static inline void io_uring_files_cancel(struct files_struct *files)
+{
+}
+static inline void io_uring_free(struct task_struct *tsk)
+{
+}
+#endif
+
+#endif
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -61,6 +61,7 @@ struct sighand_struct;
 struct signal_struct;
 struct task_delay_info;
 struct task_group;
+struct io_uring_task;
 
 /*
  * Task state bitmask. NOTE! These bits are also
@@ -923,6 +924,10 @@ struct task_struct {
 	/* Open file information: */
 	struct files_struct		*files;
 
+#ifdef CONFIG_IO_URING
+	struct io_uring_task		*io_uring;
+#endif
+
 	/* Namespaces: */
 	struct nsproxy			*nsproxy;
 
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -113,6 +113,9 @@ struct task_struct init_task
 	.thread		= INIT_THREAD,
 	.fs		= &init_fs,
 	.files		= &init_files,
+#ifdef CONFIG_IO_URING
+	.io_uring	= NULL,
+#endif
 	.signal		= &init_signals,
 	.sighand	= &init_sighand,
 	.nsproxy	= &init_nsproxy,
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -95,6 +95,7 @@
 #include <linux/stackleak.h>
 #include <linux/kasan.h>
 #include <linux/scs.h>
+#include <linux/io_uring.h>
 
 #include <asm/pgalloc.h>
 #include <linux/uaccess.h>
@@ -745,6 +746,7 @@ void __put_task_struct(struct task_struc
 	WARN_ON(refcount_read(&tsk->usage));
 	WARN_ON(tsk == current);
 
+	io_uring_free(tsk);
 	cgroup_free(tsk);
 	task_numa_free(tsk, true);
 	security_task_free(tsk);
@@ -2022,6 +2024,10 @@ static __latent_entropy struct task_stru
 	p->vtime.state = VTIME_INACTIVE;
 #endif
 
+#ifdef CONFIG_IO_URING
+	p->io_uring = NULL;
+#endif
+
 #if defined(SPLIT_RSS_COUNTING)
 	memset(&p->rss_stat, 0, sizeof(p->rss_stat));
 #endif



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 10/70] io_uring: reference ->nsproxy for file table commands
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (8 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 09/70] io_uring: dont rely on weak ->files references Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 11/70] io_wq: Make io_wqe::lock a raw_spinlock_t Greg Kroah-Hartman
                   ` (62 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Al Viro, Jens Axboe

From: Jens Axboe <axboe@kernel.dk>

commit 9b8284921513fc1ea57d87777283a59b05862f03 upstream.

If we don't get and assign the namespace for the async work, then certain
paths just don't work properly (like /dev/stdin, /proc/mounts, etc).
Anything that references the current namespace of the given task should
be assigned for async work on behalf of that task.

Cc: stable@vger.kernel.org # v5.5+
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/io-wq.c    |    4 ++++
 fs/io-wq.h    |    1 +
 fs/io_uring.c |    3 +++
 3 files changed, 8 insertions(+)

--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -60,6 +60,7 @@ struct io_worker {
 	const struct cred *cur_creds;
 	const struct cred *saved_creds;
 	struct files_struct *restore_files;
+	struct nsproxy *restore_nsproxy;
 	struct fs_struct *restore_fs;
 };
 
@@ -153,6 +154,7 @@ static bool __io_worker_unuse(struct io_
 
 		task_lock(current);
 		current->files = worker->restore_files;
+		current->nsproxy = worker->restore_nsproxy;
 		task_unlock(current);
 	}
 
@@ -318,6 +320,7 @@ static void io_worker_start(struct io_wq
 
 	worker->flags |= (IO_WORKER_F_UP | IO_WORKER_F_RUNNING);
 	worker->restore_files = current->files;
+	worker->restore_nsproxy = current->nsproxy;
 	worker->restore_fs = current->fs;
 	io_wqe_inc_running(wqe, worker);
 }
@@ -454,6 +457,7 @@ static void io_impersonate_work(struct i
 	if (work->files && current->files != work->files) {
 		task_lock(current);
 		current->files = work->files;
+		current->nsproxy = work->nsproxy;
 		task_unlock(current);
 	}
 	if (work->fs && current->fs != work->fs)
--- a/fs/io-wq.h
+++ b/fs/io-wq.h
@@ -88,6 +88,7 @@ struct io_wq_work {
 	struct files_struct *files;
 	struct mm_struct *mm;
 	const struct cred *creds;
+	struct nsproxy *nsproxy;
 	struct fs_struct *fs;
 	unsigned flags;
 };
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1456,6 +1456,7 @@ static void io_req_drop_files(struct io_
 	spin_unlock_irqrestore(&ctx->inflight_lock, flags);
 	req->flags &= ~REQ_F_INFLIGHT;
 	put_files_struct(req->work.files);
+	put_nsproxy(req->work.nsproxy);
 	req->work.files = NULL;
 }
 
@@ -5685,6 +5686,8 @@ static int io_grab_files(struct io_kiocb
 		return 0;
 
 	req->work.files = get_files_struct(current);
+	get_nsproxy(current->nsproxy);
+	req->work.nsproxy = current->nsproxy;
 	req->flags |= REQ_F_INFLIGHT;
 
 	spin_lock_irq(&ctx->inflight_lock);



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 11/70] io_wq: Make io_wqe::lock a raw_spinlock_t
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (9 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 10/70] io_uring: reference ->nsproxy for file table commands Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 12/70] io-wq: fix use-after-free in io_wq_worker_running Greg Kroah-Hartman
                   ` (61 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Sebastian Andrzej Siewior, Jens Axboe

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

commit 95da84659226d75698a1ab958be0af21d9cc2a9c upstream.

During a context switch the scheduler invokes wq_worker_sleeping() with
disabled preemption. Disabling preemption is needed because it protects
access to `worker->sleeping'. As an optimisation it avoids invoking
schedule() within the schedule path as part of possible wake up (thus
preempt_enable_no_resched() afterwards).

The io-wq has been added to the mix in the same section with disabled
preemption. This breaks on PREEMPT_RT because io_wq_worker_sleeping()
acquires a spinlock_t. Also within the schedule() the spinlock_t must be
acquired after tsk_is_pi_blocked() otherwise it will block on the
sleeping lock again while scheduling out.

While playing with `io_uring-bench' I didn't notice a significant
latency spike after converting io_wqe::lock to a raw_spinlock_t. The
latency was more or less the same.

In order to keep the spinlock_t it would have to be moved after the
tsk_is_pi_blocked() check which would introduce a branch instruction
into the hot path.

The lock is used to maintain the `work_list' and wakes one task up at
most.
Should io_wqe_cancel_pending_work() cause latency spikes, while
searching for a specific item, then it would need to drop the lock
during iterations.
revert_creds() is also invoked under the lock. According to debug
cred::non_rcu is 0. Otherwise it should be moved outside of the locked
section because put_cred_rcu()->free_uid() acquires a sleeping lock.

Convert io_wqe::lock to a raw_spinlock_t.c

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/io-wq.c |   52 ++++++++++++++++++++++++++--------------------------
 1 file changed, 26 insertions(+), 26 deletions(-)

--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -88,7 +88,7 @@ enum {
  */
 struct io_wqe {
 	struct {
-		spinlock_t lock;
+		raw_spinlock_t lock;
 		struct io_wq_work_list work_list;
 		unsigned long hash_map;
 		unsigned flags;
@@ -149,7 +149,7 @@ static bool __io_worker_unuse(struct io_
 
 	if (current->files != worker->restore_files) {
 		__acquire(&wqe->lock);
-		spin_unlock_irq(&wqe->lock);
+		raw_spin_unlock_irq(&wqe->lock);
 		dropped_lock = true;
 
 		task_lock(current);
@@ -168,7 +168,7 @@ static bool __io_worker_unuse(struct io_
 	if (worker->mm) {
 		if (!dropped_lock) {
 			__acquire(&wqe->lock);
-			spin_unlock_irq(&wqe->lock);
+			raw_spin_unlock_irq(&wqe->lock);
 			dropped_lock = true;
 		}
 		__set_current_state(TASK_RUNNING);
@@ -222,17 +222,17 @@ static void io_worker_exit(struct io_wor
 	worker->flags = 0;
 	preempt_enable();
 
-	spin_lock_irq(&wqe->lock);
+	raw_spin_lock_irq(&wqe->lock);
 	hlist_nulls_del_rcu(&worker->nulls_node);
 	list_del_rcu(&worker->all_list);
 	if (__io_worker_unuse(wqe, worker)) {
 		__release(&wqe->lock);
-		spin_lock_irq(&wqe->lock);
+		raw_spin_lock_irq(&wqe->lock);
 	}
 	acct->nr_workers--;
 	nr_workers = wqe->acct[IO_WQ_ACCT_BOUND].nr_workers +
 			wqe->acct[IO_WQ_ACCT_UNBOUND].nr_workers;
-	spin_unlock_irq(&wqe->lock);
+	raw_spin_unlock_irq(&wqe->lock);
 
 	/* all workers gone, wq exit can proceed */
 	if (!nr_workers && refcount_dec_and_test(&wqe->wq->refs))
@@ -508,7 +508,7 @@ get_next:
 		else if (!wq_list_empty(&wqe->work_list))
 			wqe->flags |= IO_WQE_FLAG_STALLED;
 
-		spin_unlock_irq(&wqe->lock);
+		raw_spin_unlock_irq(&wqe->lock);
 		if (!work)
 			break;
 		io_assign_current_work(worker, work);
@@ -543,7 +543,7 @@ get_next:
 				io_wqe_enqueue(wqe, linked);
 
 			if (hash != -1U && !next_hashed) {
-				spin_lock_irq(&wqe->lock);
+				raw_spin_lock_irq(&wqe->lock);
 				wqe->hash_map &= ~BIT_ULL(hash);
 				wqe->flags &= ~IO_WQE_FLAG_STALLED;
 				/* dependent work is not hashed */
@@ -551,11 +551,11 @@ get_next:
 				/* skip unnecessary unlock-lock wqe->lock */
 				if (!work)
 					goto get_next;
-				spin_unlock_irq(&wqe->lock);
+				raw_spin_unlock_irq(&wqe->lock);
 			}
 		} while (work);
 
-		spin_lock_irq(&wqe->lock);
+		raw_spin_lock_irq(&wqe->lock);
 	} while (1);
 }
 
@@ -570,7 +570,7 @@ static int io_wqe_worker(void *data)
 	while (!test_bit(IO_WQ_BIT_EXIT, &wq->state)) {
 		set_current_state(TASK_INTERRUPTIBLE);
 loop:
-		spin_lock_irq(&wqe->lock);
+		raw_spin_lock_irq(&wqe->lock);
 		if (io_wqe_run_queue(wqe)) {
 			__set_current_state(TASK_RUNNING);
 			io_worker_handle_work(worker);
@@ -581,7 +581,7 @@ loop:
 			__release(&wqe->lock);
 			goto loop;
 		}
-		spin_unlock_irq(&wqe->lock);
+		raw_spin_unlock_irq(&wqe->lock);
 		if (signal_pending(current))
 			flush_signals(current);
 		if (schedule_timeout(WORKER_IDLE_TIMEOUT))
@@ -593,11 +593,11 @@ loop:
 	}
 
 	if (test_bit(IO_WQ_BIT_EXIT, &wq->state)) {
-		spin_lock_irq(&wqe->lock);
+		raw_spin_lock_irq(&wqe->lock);
 		if (!wq_list_empty(&wqe->work_list))
 			io_worker_handle_work(worker);
 		else
-			spin_unlock_irq(&wqe->lock);
+			raw_spin_unlock_irq(&wqe->lock);
 	}
 
 	io_worker_exit(worker);
@@ -637,9 +637,9 @@ void io_wq_worker_sleeping(struct task_s
 
 	worker->flags &= ~IO_WORKER_F_RUNNING;
 
-	spin_lock_irq(&wqe->lock);
+	raw_spin_lock_irq(&wqe->lock);
 	io_wqe_dec_running(wqe, worker);
-	spin_unlock_irq(&wqe->lock);
+	raw_spin_unlock_irq(&wqe->lock);
 }
 
 static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index)
@@ -663,7 +663,7 @@ static bool create_io_worker(struct io_w
 		return false;
 	}
 
-	spin_lock_irq(&wqe->lock);
+	raw_spin_lock_irq(&wqe->lock);
 	hlist_nulls_add_head_rcu(&worker->nulls_node, &wqe->free_list);
 	list_add_tail_rcu(&worker->all_list, &wqe->all_list);
 	worker->flags |= IO_WORKER_F_FREE;
@@ -672,7 +672,7 @@ static bool create_io_worker(struct io_w
 	if (!acct->nr_workers && (worker->flags & IO_WORKER_F_BOUND))
 		worker->flags |= IO_WORKER_F_FIXED;
 	acct->nr_workers++;
-	spin_unlock_irq(&wqe->lock);
+	raw_spin_unlock_irq(&wqe->lock);
 
 	if (index == IO_WQ_ACCT_UNBOUND)
 		atomic_inc(&wq->user->processes);
@@ -727,12 +727,12 @@ static int io_wq_manager(void *data)
 			if (!node_online(node))
 				continue;
 
-			spin_lock_irq(&wqe->lock);
+			raw_spin_lock_irq(&wqe->lock);
 			if (io_wqe_need_worker(wqe, IO_WQ_ACCT_BOUND))
 				fork_worker[IO_WQ_ACCT_BOUND] = true;
 			if (io_wqe_need_worker(wqe, IO_WQ_ACCT_UNBOUND))
 				fork_worker[IO_WQ_ACCT_UNBOUND] = true;
-			spin_unlock_irq(&wqe->lock);
+			raw_spin_unlock_irq(&wqe->lock);
 			if (fork_worker[IO_WQ_ACCT_BOUND])
 				create_io_worker(wq, wqe, IO_WQ_ACCT_BOUND);
 			if (fork_worker[IO_WQ_ACCT_UNBOUND])
@@ -829,10 +829,10 @@ static void io_wqe_enqueue(struct io_wqe
 	}
 
 	work_flags = work->flags;
-	spin_lock_irqsave(&wqe->lock, flags);
+	raw_spin_lock_irqsave(&wqe->lock, flags);
 	io_wqe_insert_work(wqe, work);
 	wqe->flags &= ~IO_WQE_FLAG_STALLED;
-	spin_unlock_irqrestore(&wqe->lock, flags);
+	raw_spin_unlock_irqrestore(&wqe->lock, flags);
 
 	if ((work_flags & IO_WQ_WORK_CONCURRENT) ||
 	    !atomic_read(&acct->nr_running))
@@ -959,13 +959,13 @@ static void io_wqe_cancel_pending_work(s
 	unsigned long flags;
 
 retry:
-	spin_lock_irqsave(&wqe->lock, flags);
+	raw_spin_lock_irqsave(&wqe->lock, flags);
 	wq_list_for_each(node, prev, &wqe->work_list) {
 		work = container_of(node, struct io_wq_work, list);
 		if (!match->fn(work, match->data))
 			continue;
 		io_wqe_remove_pending(wqe, work, prev);
-		spin_unlock_irqrestore(&wqe->lock, flags);
+		raw_spin_unlock_irqrestore(&wqe->lock, flags);
 		io_run_cancel(work, wqe);
 		match->nr_pending++;
 		if (!match->cancel_all)
@@ -974,7 +974,7 @@ retry:
 		/* not safe to continue after unlock */
 		goto retry;
 	}
-	spin_unlock_irqrestore(&wqe->lock, flags);
+	raw_spin_unlock_irqrestore(&wqe->lock, flags);
 }
 
 static void io_wqe_cancel_running_work(struct io_wqe *wqe,
@@ -1082,7 +1082,7 @@ struct io_wq *io_wq_create(unsigned boun
 		}
 		atomic_set(&wqe->acct[IO_WQ_ACCT_UNBOUND].nr_running, 0);
 		wqe->wq = wq;
-		spin_lock_init(&wqe->lock);
+		raw_spin_lock_init(&wqe->lock);
 		INIT_WQ_LIST(&wqe->work_list);
 		INIT_HLIST_NULLS_HEAD(&wqe->free_list, 0);
 		INIT_LIST_HEAD(&wqe->all_list);



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 12/70] io-wq: fix use-after-free in io_wq_worker_running
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (10 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 11/70] io_wq: Make io_wqe::lock a raw_spinlock_t Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 13/70] io_uring: no need to call xa_destroy() on empty xarray Greg Kroah-Hartman
                   ` (60 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, syzbot+45fa0a195b941764e0f0,
	syzbot+9af99580130003da82b1, Pavel Begunkov, Hillf Danton,
	Jens Axboe

From: Hillf Danton <hdanton@sina.com>

commit c4068bf898ddaef791049a366828d9b84b467bda upstream.

The smart syzbot has found a reproducer for the following issue:

 ==================================================================
 BUG: KASAN: use-after-free in instrument_atomic_write include/linux/instrumented.h:71 [inline]
 BUG: KASAN: use-after-free in atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline]
 BUG: KASAN: use-after-free in io_wqe_inc_running fs/io-wq.c:301 [inline]
 BUG: KASAN: use-after-free in io_wq_worker_running+0xde/0x110 fs/io-wq.c:613
 Write of size 4 at addr ffff8882183db08c by task io_wqe_worker-0/7771

 CPU: 0 PID: 7771 Comm: io_wqe_worker-0 Not tainted 5.9.0-rc4-syzkaller #0
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x198/0x1fd lib/dump_stack.c:118
  print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
  __kasan_report mm/kasan/report.c:513 [inline]
  kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
  check_memory_region_inline mm/kasan/generic.c:186 [inline]
  check_memory_region+0x13d/0x180 mm/kasan/generic.c:192
  instrument_atomic_write include/linux/instrumented.h:71 [inline]
  atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline]
  io_wqe_inc_running fs/io-wq.c:301 [inline]
  io_wq_worker_running+0xde/0x110 fs/io-wq.c:613
  schedule_timeout+0x148/0x250 kernel/time/timer.c:1879
  io_wqe_worker+0x517/0x10e0 fs/io-wq.c:580
  kthread+0x3b5/0x4a0 kernel/kthread.c:292
  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

 Allocated by task 7768:
  kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
  kasan_set_track mm/kasan/common.c:56 [inline]
  __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461
  kmem_cache_alloc_node_trace+0x17b/0x3f0 mm/slab.c:3594
  kmalloc_node include/linux/slab.h:572 [inline]
  kzalloc_node include/linux/slab.h:677 [inline]
  io_wq_create+0x57b/0xa10 fs/io-wq.c:1064
  io_init_wq_offload fs/io_uring.c:7432 [inline]
  io_sq_offload_start fs/io_uring.c:7504 [inline]
  io_uring_create fs/io_uring.c:8625 [inline]
  io_uring_setup+0x1836/0x28e0 fs/io_uring.c:8694
  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

 Freed by task 21:
  kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
  kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
  kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
  __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422
  __cache_free mm/slab.c:3418 [inline]
  kfree+0x10e/0x2b0 mm/slab.c:3756
  __io_wq_destroy fs/io-wq.c:1138 [inline]
  io_wq_destroy+0x2af/0x460 fs/io-wq.c:1146
  io_finish_async fs/io_uring.c:6836 [inline]
  io_ring_ctx_free fs/io_uring.c:7870 [inline]
  io_ring_exit_work+0x1e4/0x6d0 fs/io_uring.c:7954
  process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
  worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
  kthread+0x3b5/0x4a0 kernel/kthread.c:292
  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

 The buggy address belongs to the object at ffff8882183db000
  which belongs to the cache kmalloc-1k of size 1024
 The buggy address is located 140 bytes inside of
  1024-byte region [ffff8882183db000, ffff8882183db400)
 The buggy address belongs to the page:
 page:000000009bada22b refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2183db
 flags: 0x57ffe0000000200(slab)
 raw: 057ffe0000000200 ffffea0008604c48 ffffea00086a8648 ffff8880aa040700
 raw: 0000000000000000 ffff8882183db000 0000000100000002 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff8882183daf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  ffff8882183db000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 >ffff8882183db080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                       ^
  ffff8882183db100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8882183db180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ==================================================================

which is down to the comment below,

	/* all workers gone, wq exit can proceed */
	if (!nr_workers && refcount_dec_and_test(&wqe->wq->refs))
		complete(&wqe->wq->done);

because there might be multiple cases of wqe in a wq and we would wait
for every worker in every wqe to go home before releasing wq's resources
on destroying.

To that end, rework wq's refcount by making it independent of the tracking
of workers because after all they are two different things, and keeping
it balanced when workers come and go. Note the manager kthread, like
other workers, now holds a grab to wq during its lifetime.

Finally to help destroy wq, check IO_WQ_BIT_EXIT upon creating worker
and do nothing for exiting wq.

Cc: stable@vger.kernel.org # v5.5+
Reported-by: syzbot+45fa0a195b941764e0f0@syzkaller.appspotmail.com
Reported-by: syzbot+9af99580130003da82b1@syzkaller.appspotmail.com
Cc: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/io-wq.c |  116 ++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 58 insertions(+), 58 deletions(-)

--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -202,7 +202,6 @@ static void io_worker_exit(struct io_wor
 {
 	struct io_wqe *wqe = worker->wqe;
 	struct io_wqe_acct *acct = io_wqe_get_acct(wqe, worker);
-	unsigned nr_workers;
 
 	/*
 	 * If we're not at zero, someone else is holding a brief reference
@@ -230,15 +229,11 @@ static void io_worker_exit(struct io_wor
 		raw_spin_lock_irq(&wqe->lock);
 	}
 	acct->nr_workers--;
-	nr_workers = wqe->acct[IO_WQ_ACCT_BOUND].nr_workers +
-			wqe->acct[IO_WQ_ACCT_UNBOUND].nr_workers;
 	raw_spin_unlock_irq(&wqe->lock);
 
-	/* all workers gone, wq exit can proceed */
-	if (!nr_workers && refcount_dec_and_test(&wqe->wq->refs))
-		complete(&wqe->wq->done);
-
 	kfree_rcu(worker, rcu);
+	if (refcount_dec_and_test(&wqe->wq->refs))
+		complete(&wqe->wq->done);
 }
 
 static inline bool io_wqe_run_queue(struct io_wqe *wqe)
@@ -644,7 +639,7 @@ void io_wq_worker_sleeping(struct task_s
 
 static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index)
 {
-	struct io_wqe_acct *acct =&wqe->acct[index];
+	struct io_wqe_acct *acct = &wqe->acct[index];
 	struct io_worker *worker;
 
 	worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, wqe->node);
@@ -677,6 +672,7 @@ static bool create_io_worker(struct io_w
 	if (index == IO_WQ_ACCT_UNBOUND)
 		atomic_inc(&wq->user->processes);
 
+	refcount_inc(&wq->refs);
 	wake_up_process(worker->task);
 	return true;
 }
@@ -692,28 +688,63 @@ static inline bool io_wqe_need_worker(st
 	return acct->nr_workers < acct->max_workers;
 }
 
+static bool io_wqe_worker_send_sig(struct io_worker *worker, void *data)
+{
+	send_sig(SIGINT, worker->task, 1);
+	return false;
+}
+
+/*
+ * Iterate the passed in list and call the specific function for each
+ * worker that isn't exiting
+ */
+static bool io_wq_for_each_worker(struct io_wqe *wqe,
+				  bool (*func)(struct io_worker *, void *),
+				  void *data)
+{
+	struct io_worker *worker;
+	bool ret = false;
+
+	list_for_each_entry_rcu(worker, &wqe->all_list, all_list) {
+		if (io_worker_get(worker)) {
+			/* no task if node is/was offline */
+			if (worker->task)
+				ret = func(worker, data);
+			io_worker_release(worker);
+			if (ret)
+				break;
+		}
+	}
+
+	return ret;
+}
+
+static bool io_wq_worker_wake(struct io_worker *worker, void *data)
+{
+	wake_up_process(worker->task);
+	return false;
+}
+
 /*
  * Manager thread. Tasked with creating new workers, if we need them.
  */
 static int io_wq_manager(void *data)
 {
 	struct io_wq *wq = data;
-	int workers_to_create = num_possible_nodes();
 	int node;
 
 	/* create fixed workers */
-	refcount_set(&wq->refs, workers_to_create);
+	refcount_set(&wq->refs, 1);
 	for_each_node(node) {
 		if (!node_online(node))
 			continue;
-		if (!create_io_worker(wq, wq->wqes[node], IO_WQ_ACCT_BOUND))
-			goto err;
-		workers_to_create--;
+		if (create_io_worker(wq, wq->wqes[node], IO_WQ_ACCT_BOUND))
+			continue;
+		set_bit(IO_WQ_BIT_ERROR, &wq->state);
+		set_bit(IO_WQ_BIT_EXIT, &wq->state);
+		goto out;
 	}
 
-	while (workers_to_create--)
-		refcount_dec(&wq->refs);
-
 	complete(&wq->done);
 
 	while (!kthread_should_stop()) {
@@ -745,12 +776,18 @@ static int io_wq_manager(void *data)
 	if (current->task_works)
 		task_work_run();
 
-	return 0;
-err:
-	set_bit(IO_WQ_BIT_ERROR, &wq->state);
-	set_bit(IO_WQ_BIT_EXIT, &wq->state);
-	if (refcount_sub_and_test(workers_to_create, &wq->refs))
+out:
+	if (refcount_dec_and_test(&wq->refs)) {
 		complete(&wq->done);
+		return 0;
+	}
+	/* if ERROR is set and we get here, we have workers to wake */
+	if (test_bit(IO_WQ_BIT_ERROR, &wq->state)) {
+		rcu_read_lock();
+		for_each_node(node)
+			io_wq_for_each_worker(wq->wqes[node], io_wq_worker_wake, NULL);
+		rcu_read_unlock();
+	}
 	return 0;
 }
 
@@ -858,37 +895,6 @@ void io_wq_hash_work(struct io_wq_work *
 	work->flags |= (IO_WQ_WORK_HASHED | (bit << IO_WQ_HASH_SHIFT));
 }
 
-static bool io_wqe_worker_send_sig(struct io_worker *worker, void *data)
-{
-	send_sig(SIGINT, worker->task, 1);
-	return false;
-}
-
-/*
- * Iterate the passed in list and call the specific function for each
- * worker that isn't exiting
- */
-static bool io_wq_for_each_worker(struct io_wqe *wqe,
-				  bool (*func)(struct io_worker *, void *),
-				  void *data)
-{
-	struct io_worker *worker;
-	bool ret = false;
-
-	list_for_each_entry_rcu(worker, &wqe->all_list, all_list) {
-		if (io_worker_get(worker)) {
-			/* no task if node is/was offline */
-			if (worker->task)
-				ret = func(worker, data);
-			io_worker_release(worker);
-			if (ret)
-				break;
-		}
-	}
-
-	return ret;
-}
-
 void io_wq_cancel_all(struct io_wq *wq)
 {
 	int node;
@@ -1121,12 +1127,6 @@ bool io_wq_get(struct io_wq *wq, struct
 	return refcount_inc_not_zero(&wq->use_refs);
 }
 
-static bool io_wq_worker_wake(struct io_worker *worker, void *data)
-{
-	wake_up_process(worker->task);
-	return false;
-}
-
 static void __io_wq_destroy(struct io_wq *wq)
 {
 	int node;



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 13/70] io_uring: no need to call xa_destroy() on empty xarray
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (11 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 12/70] io-wq: fix use-after-free in io_wq_worker_running Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 14/70] io_uring: Fix use of XArray in __io_uring_files_cancel Greg Kroah-Hartman
                   ` (59 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, kernel test robot, Jens Axboe

From: Jens Axboe <axboe@kernel.dk>

commit ca6484cd308a671811bf39f3119e81966eb476e3 upstream.

The kernel test robot reports this lockdep issue:

[child1:659] mbind (274) returned ENOSYS, marking as inactive.
[child1:659] mq_timedsend (279) returned ENOSYS, marking as inactive.
[main] 10175 iterations. [F:7781 S:2344 HI:2397]
[   24.610601]
[   24.610743] ================================
[   24.611083] WARNING: inconsistent lock state
[   24.611437] 5.9.0-rc7-00017-g0f2122045b9462 #5 Not tainted
[   24.611861] --------------------------------
[   24.612193] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[   24.612660] ksoftirqd/0/7 [HC0[0]:SC1[3]:HE0:SE0] takes:
[   24.613086] f00ed998 (&xa->xa_lock#4){+.?.}-{2:2}, at: xa_destroy+0x43/0xc1
[   24.613642] {SOFTIRQ-ON-W} state was registered at:
[   24.614024]   lock_acquire+0x20c/0x29b
[   24.614341]   _raw_spin_lock+0x21/0x30
[   24.614636]   io_uring_add_task_file+0xe8/0x13a
[   24.614987]   io_uring_create+0x535/0x6bd
[   24.615297]   io_uring_setup+0x11d/0x136
[   24.615606]   __ia32_sys_io_uring_setup+0xd/0xf
[   24.615977]   do_int80_syscall_32+0x53/0x6c
[   24.616306]   restore_all_switch_stack+0x0/0xb1
[   24.616677] irq event stamp: 939881
[   24.616968] hardirqs last  enabled at (939880): [<8105592d>] __local_bh_enable_ip+0x13c/0x145
[   24.617642] hardirqs last disabled at (939881): [<81b6ace3>] _raw_spin_lock_irqsave+0x1b/0x4e
[   24.618321] softirqs last  enabled at (939738): [<81b6c7c8>] __do_softirq+0x3f0/0x45a
[   24.618924] softirqs last disabled at (939743): [<81055741>] run_ksoftirqd+0x35/0x61
[   24.619521]
[   24.619521] other info that might help us debug this:
[   24.620028]  Possible unsafe locking scenario:
[   24.620028]
[   24.620492]        CPU0
[   24.620685]        ----
[   24.620894]   lock(&xa->xa_lock#4);
[   24.621168]   <Interrupt>
[   24.621381]     lock(&xa->xa_lock#4);
[   24.621695]
[   24.621695]  *** DEADLOCK ***
[   24.621695]
[   24.622154] 1 lock held by ksoftirqd/0/7:
[   24.622468]  #0: 823bfb94 (rcu_callback){....}-{0:0}, at: rcu_process_callbacks+0xc0/0x155
[   24.623106]
[   24.623106] stack backtrace:
[   24.623454] CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 5.9.0-rc7-00017-g0f2122045b9462 #5
[   24.624090] Call Trace:
[   24.624284]  ? show_stack+0x40/0x46
[   24.624551]  dump_stack+0x1b/0x1d
[   24.624809]  print_usage_bug+0x17a/0x185
[   24.625142]  mark_lock+0x11d/0x1db
[   24.625474]  ? print_shortest_lock_dependencies+0x121/0x121
[   24.625905]  __lock_acquire+0x41e/0x7bf
[   24.626206]  lock_acquire+0x20c/0x29b
[   24.626517]  ? xa_destroy+0x43/0xc1
[   24.626810]  ? lock_acquire+0x20c/0x29b
[   24.627110]  _raw_spin_lock_irqsave+0x3e/0x4e
[   24.627450]  ? xa_destroy+0x43/0xc1
[   24.627725]  xa_destroy+0x43/0xc1
[   24.627989]  __io_uring_free+0x57/0x71
[   24.628286]  ? get_pid+0x22/0x22
[   24.628544]  __put_task_struct+0xf2/0x163
[   24.628865]  put_task_struct+0x1f/0x2a
[   24.629161]  delayed_put_task_struct+0xe2/0xe9
[   24.629509]  rcu_process_callbacks+0x128/0x155
[   24.629860]  __do_softirq+0x1a3/0x45a
[   24.630151]  run_ksoftirqd+0x35/0x61
[   24.630443]  smpboot_thread_fn+0x304/0x31a
[   24.630763]  kthread+0x124/0x139
[   24.631016]  ? sort_range+0x18/0x18
[   24.631290]  ? kthread_create_worker_on_cpu+0x17/0x17
[   24.631682]  ret_from_fork+0x1c/0x28

which is complaining about xa_destroy() grabbing the xa lock in an
IRQ disabling fashion, whereas the io_uring uses cases aren't interrupt
safe. This is really an xarray issue, since it should not assume the
lock type. But for our use case, since we know the xarray is empty at
this point, there's no need to actually call xa_destroy(). So just get
rid of it.

Fixes: 0f2122045b94 ("io_uring: don't rely on weak ->files references")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/io_uring.c |    1 -
 1 file changed, 1 deletion(-)

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7136,7 +7136,6 @@ void __io_uring_free(struct task_struct
 	struct io_uring_task *tctx = tsk->io_uring;
 
 	WARN_ON_ONCE(!xa_empty(&tctx->xa));
-	xa_destroy(&tctx->xa);
 	kfree(tctx);
 	tsk->io_uring = NULL;
 }



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 14/70] io_uring: Fix use of XArray in __io_uring_files_cancel
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (12 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 13/70] io_uring: no need to call xa_destroy() on empty xarray Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 15/70] io_uring: Fix XArray usage in io_uring_add_task_file Greg Kroah-Hartman
                   ` (58 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, syzbot+27c12725d8ff0bfe1a13,
	Matthew Wilcox (Oracle),
	Jens Axboe

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

commit ce765372bc443573d1d339a2bf4995de385dea3a upstream.

We have to drop the lock during each iteration, so there's no advantage
to using the advanced API.  Convert this to a standard xa_for_each() loop.

Reported-by: syzbot+27c12725d8ff0bfe1a13@syzkaller.appspotmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/io_uring.c |   19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8008,28 +8008,19 @@ static void io_uring_attempt_task_drop(s
 void __io_uring_files_cancel(struct files_struct *files)
 {
 	struct io_uring_task *tctx = current->io_uring;
-	XA_STATE(xas, &tctx->xa, 0);
+	struct file *file;
+	unsigned long index;
 
 	/* make sure overflow events are dropped */
 	tctx->in_idle = true;
 
-	do {
-		struct io_ring_ctx *ctx;
-		struct file *file;
-
-		xas_lock(&xas);
-		file = xas_next_entry(&xas, ULONG_MAX);
-		xas_unlock(&xas);
-
-		if (!file)
-			break;
-
-		ctx = file->private_data;
+	xa_for_each(&tctx->xa, index, file) {
+		struct io_ring_ctx *ctx = file->private_data;
 
 		io_uring_cancel_task_requests(ctx, files);
 		if (files)
 			io_uring_del_task_file(file);
-	} while (1);
+	}
 }
 
 static inline bool io_uring_task_idle(struct io_uring_task *tctx)



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 15/70] io_uring: Fix XArray usage in io_uring_add_task_file
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (13 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 14/70] io_uring: Fix use of XArray in __io_uring_files_cancel Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 16/70] io_uring: Convert advanced XArray uses to the normal API Greg Kroah-Hartman
                   ` (57 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Matthew Wilcox (Oracle), Jens Axboe

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

commit 236434c3438c4da3dfbd6aeeab807577b85e951a upstream.

The xas_store() wasn't paired with an xas_nomem() loop, so if it couldn't
allocate memory using GFP_NOWAIT, it would leak the reference to the file
descriptor.  Also the node pointed to by the xas could be freed between
the call to xas_load() under the rcu_read_lock() and the acquisition of
the xa_lock.

It's easier to just use the normal xa_load/xa_store interface here.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
[axboe: fix missing assign after alloc, cur_uring -> tctx rename]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/io_uring.c |   21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7929,27 +7929,24 @@ static void io_uring_cancel_task_request
  */
 static int io_uring_add_task_file(struct file *file)
 {
-	if (unlikely(!current->io_uring)) {
+	struct io_uring_task *tctx = current->io_uring;
+
+	if (unlikely(!tctx)) {
 		int ret;
 
 		ret = io_uring_alloc_task_context(current);
 		if (unlikely(ret))
 			return ret;
+		tctx = current->io_uring;
 	}
-	if (current->io_uring->last != file) {
-		XA_STATE(xas, &current->io_uring->xa, (unsigned long) file);
-		void *old;
+	if (tctx->last != file) {
+		void *old = xa_load(&tctx->xa, (unsigned long)file);
 
-		rcu_read_lock();
-		old = xas_load(&xas);
-		if (old != file) {
+		if (!old) {
 			get_file(file);
-			xas_lock(&xas);
-			xas_store(&xas, file);
-			xas_unlock(&xas);
+			xa_store(&tctx->xa, (unsigned long)file, file, GFP_KERNEL);
 		}
-		rcu_read_unlock();
-		current->io_uring->last = file;
+		tctx->last = file;
 	}
 
 	return 0;



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 16/70] io_uring: Convert advanced XArray uses to the normal API
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (14 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 15/70] io_uring: Fix XArray usage in io_uring_add_task_file Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 17/70] scripts/setlocalversion: make git describe output more reliable Greg Kroah-Hartman
                   ` (56 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Matthew Wilcox (Oracle), Jens Axboe

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

commit 5e2ed8c4f45093698855b1f45cdf43efbf6dd498 upstream.

There are no bugs here that I've spotted, it's just easier to use the
normal API and there are no performance advantages to using the more
verbose advanced API.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/io_uring.c |   14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7958,27 +7958,17 @@ static int io_uring_add_task_file(struct
 static void io_uring_del_task_file(struct file *file)
 {
 	struct io_uring_task *tctx = current->io_uring;
-	XA_STATE(xas, &tctx->xa, (unsigned long) file);
 
 	if (tctx->last == file)
 		tctx->last = NULL;
-
-	xas_lock(&xas);
-	file = xas_store(&xas, NULL);
-	xas_unlock(&xas);
-
+	file = xa_erase(&tctx->xa, (unsigned long)file);
 	if (file)
 		fput(file);
 }
 
 static void __io_uring_attempt_task_drop(struct file *file)
 {
-	XA_STATE(xas, &current->io_uring->xa, (unsigned long) file);
-	struct file *old;
-
-	rcu_read_lock();
-	old = xas_load(&xas);
-	rcu_read_unlock();
+	struct file *old = xa_load(&current->io_uring->xa, (unsigned long)file);
 
 	if (old == file)
 		io_uring_del_task_file(file);



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 17/70] scripts/setlocalversion: make git describe output more reliable
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (15 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 16/70] io_uring: Convert advanced XArray uses to the normal API Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 18/70] efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure Greg Kroah-Hartman
                   ` (55 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Rasmus Villemoes, Masahiro Yamada

From: Rasmus Villemoes <linux@rasmusvillemoes.dk>

commit 548b8b5168c90c42e88f70fcf041b4ce0b8e7aa8 upstream.

When building for an embedded target using Yocto, we're sometimes
observing that the version string that gets built into vmlinux (and
thus what uname -a reports) differs from the path under /lib/modules/
where modules get installed in the rootfs, but only in the length of
the -gabc123def suffix. Hence modprobe always fails.

The problem is that Yocto has the concept of "sstate" (shared state),
which allows different developers/buildbots/etc. to share build
artifacts, based on a hash of all the metadata that went into building
that artifact - and that metadata includes all dependencies (e.g. the
compiler used etc.). That normally works quite well; usually a clean
build (without using any sstate cache) done by one developer ends up
being binary identical to a build done on another host. However, one
thing that can cause two developers to end up with different builds
[and thus make one's vmlinux package incompatible with the other's
kernel-dev package], which is not captured by the metadata hashing, is
this `git describe`: The output of that can be affected by

(1) git version: before 2.11 git defaulted to a minimum of 7, since
2.11 (git.git commit e6c587) the default is dynamic based on the
number of objects in the repo
(2) hence even if both run the same git version, the output can differ
based on how many remotes are being tracked (or just lots of local
development branches or plain old garbage)
(3) and of course somebody could have a core.abbrev config setting in
~/.gitconfig

So in order to avoid `uname -a` output relying on such random details
of the build environment which are rather hard to ensure are
consistent between developers and buildbots, make sure the abbreviated
sha1 always consists of exactly 12 hex characters. That is consistent
with the current rule for -stable patches, and is almost always enough
to identify the head commit unambigously - in the few cases where it
does not, the v5.4.3-00021- prefix would certainly nail it down.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 scripts/setlocalversion |   19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

--- a/scripts/setlocalversion
+++ b/scripts/setlocalversion
@@ -45,7 +45,7 @@ scm_version()
 
 	# Check for git and a git repo.
 	if test -z "$(git rev-parse --show-cdup 2>/dev/null)" &&
-	   head=$(git rev-parse --verify --short HEAD 2>/dev/null); then
+	   head=$(git rev-parse --verify HEAD 2>/dev/null); then
 
 		# If we are at a tagged commit (like "v2.6.30-rc6"), we ignore
 		# it, because this version is defined in the top level Makefile.
@@ -59,11 +59,22 @@ scm_version()
 			fi
 			# If we are past a tagged commit (like
 			# "v2.6.30-rc5-302-g72357d5"), we pretty print it.
-			if atag="$(git describe 2>/dev/null)"; then
-				echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),$(NF))}'
+			#
+			# Ensure the abbreviated sha1 has exactly 12
+			# hex characters, to make the output
+			# independent of git version, local
+			# core.abbrev settings and/or total number of
+			# objects in the current repository - passing
+			# --abbrev=12 ensures a minimum of 12, and the
+			# awk substr() then picks the 'g' and first 12
+			# hex chars.
+			if atag="$(git describe --abbrev=12 2>/dev/null)"; then
+				echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),substr($(NF),0,13))}'
 
-			# If we don't have a tag at all we print -g{commitish}.
+			# If we don't have a tag at all we print -g{commitish},
+			# again using exactly 12 hex chars.
 			else
+				head="$(echo $head | cut -c1-12)"
 				printf '%s%s' -g $head
 			fi
 		fi



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 18/70] efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (16 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 17/70] scripts/setlocalversion: make git describe output more reliable Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 19/70] fs/kernel_read_file: Remove FIRMWARE_EFI_EMBEDDED enum Greg Kroah-Hartman
                   ` (54 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Heinrich Schuchardt, Ard Biesheuvel

From: Ard Biesheuvel <ardb@kernel.org>

commit d32de9130f6c79533508e2c7879f18997bfbe2a0 upstream.

Currently, on arm64, we abort on any failure from efi_get_random_bytes()
other than EFI_NOT_FOUND when it comes to setting the physical seed for
KASLR, but ignore such failures when obtaining the seed for virtual
KASLR or for early seeding of the kernel's entropy pool via the config
table. This is inconsistent, and may lead to unexpected boot failures.

So let's permit any failure for the physical seed, and simply report
the error code if it does not equal EFI_NOT_FOUND.

Cc: <stable@vger.kernel.org> # v5.8+
Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/firmware/efi/libstub/arm64-stub.c |    8 +++++---
 drivers/firmware/efi/libstub/fdt.c        |    4 +---
 2 files changed, 6 insertions(+), 6 deletions(-)

--- a/drivers/firmware/efi/libstub/arm64-stub.c
+++ b/drivers/firmware/efi/libstub/arm64-stub.c
@@ -62,10 +62,12 @@ efi_status_t handle_kernel_image(unsigne
 			status = efi_get_random_bytes(sizeof(phys_seed),
 						      (u8 *)&phys_seed);
 			if (status == EFI_NOT_FOUND) {
-				efi_info("EFI_RNG_PROTOCOL unavailable, no randomness supplied\n");
+				efi_info("EFI_RNG_PROTOCOL unavailable, KASLR will be disabled\n");
+				efi_nokaslr = true;
 			} else if (status != EFI_SUCCESS) {
-				efi_err("efi_get_random_bytes() failed\n");
-				return status;
+				efi_err("efi_get_random_bytes() failed (0x%lx), KASLR will be disabled\n",
+					status);
+				efi_nokaslr = true;
 			}
 		} else {
 			efi_info("KASLR disabled on kernel command line\n");
--- a/drivers/firmware/efi/libstub/fdt.c
+++ b/drivers/firmware/efi/libstub/fdt.c
@@ -136,7 +136,7 @@ static efi_status_t update_fdt(void *ori
 	if (status)
 		goto fdt_set_fail;
 
-	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
+	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && !efi_nokaslr) {
 		efi_status_t efi_status;
 
 		efi_status = efi_get_random_bytes(sizeof(fdt_val64),
@@ -145,8 +145,6 @@ static efi_status_t update_fdt(void *ori
 			status = fdt_setprop_var(fdt, node, "kaslr-seed", fdt_val64);
 			if (status)
 				goto fdt_set_fail;
-		} else if (efi_status != EFI_NOT_FOUND) {
-			return efi_status;
 		}
 	}
 



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 19/70] fs/kernel_read_file: Remove FIRMWARE_EFI_EMBEDDED enum
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (17 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 18/70] efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 20/70] arm64: Run ARCH_WORKAROUND_1 enabling code on all CPUs Greg Kroah-Hartman
                   ` (53 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Kees Cook, Luis Chamberlain, Scott Branden

From: Kees Cook <keescook@chromium.org>

commit 06e67b849ab910a49a629445f43edb074153d0eb upstream.

The "FIRMWARE_EFI_EMBEDDED" enum is a "where", not a "what". It
should not be distinguished separately from just "FIRMWARE", as this
confuses the LSMs about what is being loaded. Additionally, there was
no actual validation of the firmware contents happening.

Fixes: e4c2c0ff00ec ("firmware: Add new platform fallback mechanism and firmware_request_platform()")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Scott Branden <scott.branden@broadcom.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201002173828.2099543-3-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/base/firmware_loader/fallback_platform.c |    2 +-
 include/linux/fs.h                               |    1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

--- a/drivers/base/firmware_loader/fallback_platform.c
+++ b/drivers/base/firmware_loader/fallback_platform.c
@@ -17,7 +17,7 @@ int firmware_fallback_platform(struct fw
 	if (!(opt_flags & FW_OPT_FALLBACK_PLATFORM))
 		return -ENOENT;
 
-	rc = security_kernel_load_data(LOADING_FIRMWARE_EFI_EMBEDDED);
+	rc = security_kernel_load_data(LOADING_FIRMWARE);
 	if (rc)
 		return rc;
 
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3011,7 +3011,6 @@ extern int do_pipe_flags(int *, int);
 	id(UNKNOWN, unknown)		\
 	id(FIRMWARE, firmware)		\
 	id(FIRMWARE_PREALLOC_BUFFER, firmware)	\
-	id(FIRMWARE_EFI_EMBEDDED, firmware)	\
 	id(MODULE, kernel-module)		\
 	id(KEXEC_IMAGE, kexec-image)		\
 	id(KEXEC_INITRAMFS, kexec-initramfs)	\



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 20/70] arm64: Run ARCH_WORKAROUND_1 enabling code on all CPUs
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (18 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 19/70] fs/kernel_read_file: Remove FIRMWARE_EFI_EMBEDDED enum Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 21/70] arm64: Run ARCH_WORKAROUND_2 " Greg Kroah-Hartman
                   ` (52 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Suzuki K Poulose, Marc Zyngier, Will Deacon

From: Marc Zyngier <maz@kernel.org>

commit 18fce56134c987e5b4eceddafdbe4b00c07e2ae1 upstream.

Commit 73f381660959 ("arm64: Advertise mitigation of Spectre-v2, or lack
thereof") changed the way we deal with ARCH_WORKAROUND_1, by moving most
of the enabling code to the .matches() callback.

This has the unfortunate effect that the workaround gets only enabled on
the first affected CPU, and no other.

In order to address this, forcefully call the .matches() callback from a
.cpu_enable() callback, which brings us back to the original behaviour.

Fixes: 73f381660959 ("arm64: Advertise mitigation of Spectre-v2, or lack thereof")
Cc: <stable@vger.kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/arm64/kernel/cpu_errata.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -599,6 +599,12 @@ check_branch_predictor(const struct arm6
 	return (need_wa > 0);
 }
 
+static void
+cpu_enable_branch_predictor_hardening(const struct arm64_cpu_capabilities *cap)
+{
+	cap->matches(cap, SCOPE_LOCAL_CPU);
+}
+
 static const __maybe_unused struct midr_range tx2_family_cpus[] = {
 	MIDR_ALL_VERSIONS(MIDR_BRCM_VULCAN),
 	MIDR_ALL_VERSIONS(MIDR_CAVIUM_THUNDERX2),
@@ -890,9 +896,11 @@ const struct arm64_cpu_capabilities arm6
 	},
 #endif
 	{
+		.desc = "Branch predictor hardening",
 		.capability = ARM64_HARDEN_BRANCH_PREDICTOR,
 		.type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM,
 		.matches = check_branch_predictor,
+		.cpu_enable = cpu_enable_branch_predictor_hardening,
 	},
 #ifdef CONFIG_HARDEN_EL2_VECTORS
 	{



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 21/70] arm64: Run ARCH_WORKAROUND_2 enabling code on all CPUs
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (19 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 20/70] arm64: Run ARCH_WORKAROUND_1 enabling code on all CPUs Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 22/70] arm64: link with -z norelro regardless of CONFIG_RELOCATABLE Greg Kroah-Hartman
                   ` (51 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Suzuki K Poulose, Marc Zyngier, Will Deacon

From: Marc Zyngier <maz@kernel.org>

commit 39533e12063be7f55e3d6ae21ffe067799d542a4 upstream.

Commit 606f8e7b27bf ("arm64: capabilities: Use linear array for
detection and verification") changed the way we deal with per-CPU errata
by only calling the .matches() callback until one CPU is found to be
affected. At this point, .matches() stop being called, and .cpu_enable()
will be called on all CPUs.

This breaks the ARCH_WORKAROUND_2 handling, as only a single CPU will be
mitigated.

In order to address this, forcefully call the .matches() callback from a
.cpu_enable() callback, which brings us back to the original behaviour.

Fixes: 606f8e7b27bf ("arm64: capabilities: Use linear array for detection and verification")
Cc: <stable@vger.kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/arm64/kernel/cpu_errata.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -457,6 +457,12 @@ out_printmsg:
 	return required;
 }
 
+static void cpu_enable_ssbd_mitigation(const struct arm64_cpu_capabilities *cap)
+{
+	if (ssbd_state != ARM64_SSBD_FORCE_DISABLE)
+		cap->matches(cap, SCOPE_LOCAL_CPU);
+}
+
 /* known invulnerable cores */
 static const struct midr_range arm64_ssb_cpus[] = {
 	MIDR_ALL_VERSIONS(MIDR_CORTEX_A35),
@@ -914,6 +920,7 @@ const struct arm64_cpu_capabilities arm6
 		.capability = ARM64_SSBD,
 		.type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM,
 		.matches = has_ssbd_mitigation,
+		.cpu_enable = cpu_enable_ssbd_mitigation,
 		.midr_range_list = arm64_ssb_cpus,
 	},
 #ifdef CONFIG_ARM64_ERRATUM_1418040



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 22/70] arm64: link with -z norelro regardless of CONFIG_RELOCATABLE
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (20 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 21/70] arm64: Run ARCH_WORKAROUND_2 " Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 23/70] x86/PCI: Fix intel_mid_pci.c build error when ACPI is not enabled Greg Kroah-Hartman
                   ` (50 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Nick Desaulniers, Will Deacon

From: Nick Desaulniers <ndesaulniers@google.com>

commit 3b92fa7485eba16b05166fddf38ab42f2ff6ab95 upstream.

With CONFIG_EXPERT=y, CONFIG_KASAN=y, CONFIG_RANDOMIZE_BASE=n,
CONFIG_RELOCATABLE=n, we observe the following failure when trying to
link the kernel image with LD=ld.lld:

error: section: .exit.data is not contiguous with other relro sections

ld.lld defaults to -z relro while ld.bfd defaults to -z norelro. This
was previously fixed, but only for CONFIG_RELOCATABLE=y.

Fixes: 3bbd3db86470 ("arm64: relocatable: fix inconsistencies in linker script and options")
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201016175339.2429280-1-ndesaulniers@google.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/arm64/Makefile |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -10,14 +10,14 @@
 #
 # Copyright (C) 1995-2001 by Russell King
 
-LDFLAGS_vmlinux	:=--no-undefined -X
+LDFLAGS_vmlinux	:=--no-undefined -X -z norelro
 CPPFLAGS_vmlinux.lds = -DTEXT_OFFSET=$(TEXT_OFFSET)
 
 ifeq ($(CONFIG_RELOCATABLE), y)
 # Pass --no-apply-dynamic-relocs to restore pre-binutils-2.27 behaviour
 # for relative relocs, since this leads to better Image compression
 # with the relocation offsets always being zero.
-LDFLAGS_vmlinux		+= -shared -Bsymbolic -z notext -z norelro \
+LDFLAGS_vmlinux		+= -shared -Bsymbolic -z notext \
 			$(call ld-option, --no-apply-dynamic-relocs)
 endif
 



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 23/70] x86/PCI: Fix intel_mid_pci.c build error when ACPI is not enabled
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (21 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 22/70] arm64: link with -z norelro regardless of CONFIG_RELOCATABLE Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 24/70] x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}() Greg Kroah-Hartman
                   ` (49 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Randy Dunlap, Bjorn Helgaas,
	Andy Shevchenko, Jesse Barnes, Thomas Gleixner, Jacob Pan,
	Len Brown, Arjan van de Ven

From: Randy Dunlap <rdunlap@infradead.org>

commit 035fff1f7aab43e420e0098f0854470a5286fb83 upstream.

Fix build error when CONFIG_ACPI is not set/enabled by adding the header
file <asm/acpi.h> which contains a stub for the function in the build
error.

    ../arch/x86/pci/intel_mid_pci.c: In function ‘intel_mid_pci_init’:
    ../arch/x86/pci/intel_mid_pci.c:303:2: error: implicit declaration of function ‘acpi_noirq_set’; did you mean ‘acpi_irq_get’? [-Werror=implicit-function-declaration]
      acpi_noirq_set();

Fixes: a912a7584ec3 ("x86/platform/intel-mid: Move PCI initialization to arch_init()")
Link: https://lore.kernel.org/r/ea903917-e51b-4cc9-2680-bc1e36efa026@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Jesse Barnes <jsbarnes@google.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org	# v4.16+
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Jesse Barnes <jsbarnes@google.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/pci/intel_mid_pci.c |    1 +
 1 file changed, 1 insertion(+)

--- a/arch/x86/pci/intel_mid_pci.c
+++ b/arch/x86/pci/intel_mid_pci.c
@@ -33,6 +33,7 @@
 #include <asm/hw_irq.h>
 #include <asm/io_apic.h>
 #include <asm/intel-mid.h>
+#include <asm/acpi.h>
 
 #define PCIE_CAP_OFFSET	0x100
 



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 24/70] x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (22 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 23/70] x86/PCI: Fix intel_mid_pci.c build error when ACPI is not enabled Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 25/70] x86/copy_mc: Introduce copy_mc_enhanced_fast_string() Greg Kroah-Hartman
                   ` (48 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dan Williams, Borislav Petkov,
	Tony Luck, Michael Ellerman

From: Dan Williams <dan.j.williams@intel.com>

commit ec6347bb43395cb92126788a1a5b25302543f815 upstream.

In reaction to a proposal to introduce a memcpy_mcsafe_fast()
implementation Linus points out that memcpy_mcsafe() is poorly named
relative to communicating the scope of the interface. Specifically what
addresses are valid to pass as source, destination, and what faults /
exceptions are handled.

Of particular concern is that even though x86 might be able to handle
the semantics of copy_mc_to_user() with its common copy_user_generic()
implementation other archs likely need / want an explicit path for this
case:

  On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
  >
  > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote:
  > >
  > > However now I see that copy_user_generic() works for the wrong reason.
  > > It works because the exception on the source address due to poison
  > > looks no different than a write fault on the user address to the
  > > caller, it's still just a short copy. So it makes copy_to_user() work
  > > for the wrong reason relative to the name.
  >
  > Right.
  >
  > And it won't work that way on other architectures. On x86, we have a
  > generic function that can take faults on either side, and we use it
  > for both cases (and for the "in_user" case too), but that's an
  > artifact of the architecture oddity.
  >
  > In fact, it's probably wrong even on x86 - because it can hide bugs -
  > but writing those things is painful enough that everybody prefers
  > having just one function.

Replace a single top-level memcpy_mcsafe() with either
copy_mc_to_user(), or copy_mc_to_kernel().

Introduce an x86 copy_mc_fragile() name as the rename for the
low-level x86 implementation formerly named memcpy_mcsafe(). It is used
as the slow / careful backend that is supplanted by a fast
copy_mc_generic() in a follow-on patch.

One side-effect of this reorganization is that separating copy_mc_64.S
to its own file means that perf no longer needs to track dependencies
for its memcpy_64.S benchmarks.

 [ bp: Massage a bit. ]

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: <stable@vger.kernel.org>
Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/powerpc/Kconfig                                   |    2 
 arch/powerpc/include/asm/string.h                      |    2 
 arch/powerpc/include/asm/uaccess.h                     |   40 +-
 arch/powerpc/lib/Makefile                              |    2 
 arch/powerpc/lib/copy_mc_64.S                          |  242 +++++++++++++++++
 arch/powerpc/lib/memcpy_mcsafe_64.S                    |  242 -----------------
 arch/x86/Kconfig                                       |    2 
 arch/x86/Kconfig.debug                                 |    2 
 arch/x86/include/asm/copy_mc_test.h                    |   75 +++++
 arch/x86/include/asm/mce.h                             |    9 
 arch/x86/include/asm/mcsafe_test.h                     |   75 -----
 arch/x86/include/asm/string_64.h                       |   32 --
 arch/x86/include/asm/uaccess.h                         |    9 
 arch/x86/include/asm/uaccess_64.h                      |   20 -
 arch/x86/kernel/cpu/mce/core.c                         |    8 
 arch/x86/kernel/quirks.c                               |   10 
 arch/x86/lib/Makefile                                  |    1 
 arch/x86/lib/copy_mc.c                                 |   82 +++++
 arch/x86/lib/copy_mc_64.S                              |  127 ++++++++
 arch/x86/lib/memcpy_64.S                               |  115 --------
 arch/x86/lib/usercopy_64.c                             |   21 -
 drivers/md/dm-writecache.c                             |   15 -
 drivers/nvdimm/claim.c                                 |    2 
 drivers/nvdimm/pmem.c                                  |    6 
 include/linux/string.h                                 |    9 
 include/linux/uaccess.h                                |   13 
 include/linux/uio.h                                    |   10 
 lib/Kconfig                                            |    7 
 lib/iov_iter.c                                         |   48 +--
 tools/arch/x86/include/asm/mcsafe_test.h               |   13 
 tools/arch/x86/lib/memcpy_64.S                         |  115 --------
 tools/objtool/check.c                                  |    4 
 tools/perf/bench/Build                                 |    1 
 tools/perf/bench/mem-memcpy-x86-64-lib.c               |   24 -
 tools/testing/nvdimm/test/nfit.c                       |   49 +--
 tools/testing/selftests/powerpc/copyloops/.gitignore   |    2 
 tools/testing/selftests/powerpc/copyloops/Makefile     |    6 
 tools/testing/selftests/powerpc/copyloops/copy_mc_64.S |  242 +++++++++++++++++
 38 files changed, 914 insertions(+), 770 deletions(-)

--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -135,7 +135,7 @@ config PPC
 	select ARCH_HAS_STRICT_KERNEL_RWX	if (PPC32 && !HIBERNATION)
 	select ARCH_HAS_TICK_BROADCAST		if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAS_UACCESS_FLUSHCACHE
-	select ARCH_HAS_UACCESS_MCSAFE		if PPC64
+	select ARCH_HAS_COPY_MC			if PPC64
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select ARCH_KEEP_MEMBLOCK
--- a/arch/powerpc/include/asm/string.h
+++ b/arch/powerpc/include/asm/string.h
@@ -53,9 +53,7 @@ void *__memmove(void *to, const void *fr
 #ifndef CONFIG_KASAN
 #define __HAVE_ARCH_MEMSET32
 #define __HAVE_ARCH_MEMSET64
-#define __HAVE_ARCH_MEMCPY_MCSAFE
 
-extern int memcpy_mcsafe(void *dst, const void *src, __kernel_size_t sz);
 extern void *__memset16(uint16_t *, uint16_t v, __kernel_size_t);
 extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t);
 extern void *__memset64(uint64_t *, uint64_t v, __kernel_size_t);
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -436,6 +436,32 @@ do {								\
 extern unsigned long __copy_tofrom_user(void __user *to,
 		const void __user *from, unsigned long size);
 
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+unsigned long __must_check
+copy_mc_generic(void *to, const void *from, unsigned long size);
+
+static inline unsigned long __must_check
+copy_mc_to_kernel(void *to, const void *from, unsigned long size)
+{
+	return copy_mc_generic(to, from, size);
+}
+#define copy_mc_to_kernel copy_mc_to_kernel
+
+static inline unsigned long __must_check
+copy_mc_to_user(void __user *to, const void *from, unsigned long n)
+{
+	if (likely(check_copy_size(from, n, true))) {
+		if (access_ok(to, n)) {
+			allow_write_to_user(to, n);
+			n = copy_mc_generic((void *)to, from, n);
+			prevent_write_to_user(to, n);
+		}
+	}
+
+	return n;
+}
+#endif
+
 #ifdef __powerpc64__
 static inline unsigned long
 raw_copy_in_user(void __user *to, const void __user *from, unsigned long n)
@@ -524,20 +550,6 @@ raw_copy_to_user(void __user *to, const
 	return ret;
 }
 
-static __always_inline unsigned long __must_check
-copy_to_user_mcsafe(void __user *to, const void *from, unsigned long n)
-{
-	if (likely(check_copy_size(from, n, true))) {
-		if (access_ok(to, n)) {
-			allow_write_to_user(to, n);
-			n = memcpy_mcsafe((void *)to, from, n);
-			prevent_write_to_user(to, n);
-		}
-	}
-
-	return n;
-}
-
 unsigned long __arch_clear_user(void __user *addr, unsigned long size);
 
 static inline unsigned long clear_user(void __user *addr, unsigned long size)
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -39,7 +39,7 @@ obj-$(CONFIG_PPC_BOOK3S_64) += copyuser_
 			       memcpy_power7.o
 
 obj64-y	+= copypage_64.o copyuser_64.o mem_64.o hweight_64.o \
-	   memcpy_64.o memcpy_mcsafe_64.o
+	   memcpy_64.o copy_mc_64.o
 
 obj64-$(CONFIG_SMP)	+= locks.o
 obj64-$(CONFIG_ALTIVEC)	+= vmx-helper.o
--- /dev/null
+++ b/arch/powerpc/lib/copy_mc_64.S
@@ -0,0 +1,242 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) IBM Corporation, 2011
+ * Derived from copyuser_power7.s by Anton Blanchard <anton@au.ibm.com>
+ * Author - Balbir Singh <bsingharora@gmail.com>
+ */
+#include <asm/ppc_asm.h>
+#include <asm/errno.h>
+#include <asm/export.h>
+
+	.macro err1
+100:
+	EX_TABLE(100b,.Ldo_err1)
+	.endm
+
+	.macro err2
+200:
+	EX_TABLE(200b,.Ldo_err2)
+	.endm
+
+	.macro err3
+300:	EX_TABLE(300b,.Ldone)
+	.endm
+
+.Ldo_err2:
+	ld	r22,STK_REG(R22)(r1)
+	ld	r21,STK_REG(R21)(r1)
+	ld	r20,STK_REG(R20)(r1)
+	ld	r19,STK_REG(R19)(r1)
+	ld	r18,STK_REG(R18)(r1)
+	ld	r17,STK_REG(R17)(r1)
+	ld	r16,STK_REG(R16)(r1)
+	ld	r15,STK_REG(R15)(r1)
+	ld	r14,STK_REG(R14)(r1)
+	addi	r1,r1,STACKFRAMESIZE
+.Ldo_err1:
+	/* Do a byte by byte copy to get the exact remaining size */
+	mtctr	r7
+46:
+err3;	lbz	r0,0(r4)
+	addi	r4,r4,1
+err3;	stb	r0,0(r3)
+	addi	r3,r3,1
+	bdnz	46b
+	li	r3,0
+	blr
+
+.Ldone:
+	mfctr	r3
+	blr
+
+
+_GLOBAL(copy_mc_generic)
+	mr	r7,r5
+	cmpldi	r5,16
+	blt	.Lshort_copy
+
+.Lcopy:
+	/* Get the source 8B aligned */
+	neg	r6,r4
+	mtocrf	0x01,r6
+	clrldi	r6,r6,(64-3)
+
+	bf	cr7*4+3,1f
+err1;	lbz	r0,0(r4)
+	addi	r4,r4,1
+err1;	stb	r0,0(r3)
+	addi	r3,r3,1
+	subi	r7,r7,1
+
+1:	bf	cr7*4+2,2f
+err1;	lhz	r0,0(r4)
+	addi	r4,r4,2
+err1;	sth	r0,0(r3)
+	addi	r3,r3,2
+	subi	r7,r7,2
+
+2:	bf	cr7*4+1,3f
+err1;	lwz	r0,0(r4)
+	addi	r4,r4,4
+err1;	stw	r0,0(r3)
+	addi	r3,r3,4
+	subi	r7,r7,4
+
+3:	sub	r5,r5,r6
+	cmpldi	r5,128
+
+	mflr	r0
+	stdu	r1,-STACKFRAMESIZE(r1)
+	std	r14,STK_REG(R14)(r1)
+	std	r15,STK_REG(R15)(r1)
+	std	r16,STK_REG(R16)(r1)
+	std	r17,STK_REG(R17)(r1)
+	std	r18,STK_REG(R18)(r1)
+	std	r19,STK_REG(R19)(r1)
+	std	r20,STK_REG(R20)(r1)
+	std	r21,STK_REG(R21)(r1)
+	std	r22,STK_REG(R22)(r1)
+	std	r0,STACKFRAMESIZE+16(r1)
+
+	blt	5f
+	srdi	r6,r5,7
+	mtctr	r6
+
+	/* Now do cacheline (128B) sized loads and stores. */
+	.align	5
+4:
+err2;	ld	r0,0(r4)
+err2;	ld	r6,8(r4)
+err2;	ld	r8,16(r4)
+err2;	ld	r9,24(r4)
+err2;	ld	r10,32(r4)
+err2;	ld	r11,40(r4)
+err2;	ld	r12,48(r4)
+err2;	ld	r14,56(r4)
+err2;	ld	r15,64(r4)
+err2;	ld	r16,72(r4)
+err2;	ld	r17,80(r4)
+err2;	ld	r18,88(r4)
+err2;	ld	r19,96(r4)
+err2;	ld	r20,104(r4)
+err2;	ld	r21,112(r4)
+err2;	ld	r22,120(r4)
+	addi	r4,r4,128
+err2;	std	r0,0(r3)
+err2;	std	r6,8(r3)
+err2;	std	r8,16(r3)
+err2;	std	r9,24(r3)
+err2;	std	r10,32(r3)
+err2;	std	r11,40(r3)
+err2;	std	r12,48(r3)
+err2;	std	r14,56(r3)
+err2;	std	r15,64(r3)
+err2;	std	r16,72(r3)
+err2;	std	r17,80(r3)
+err2;	std	r18,88(r3)
+err2;	std	r19,96(r3)
+err2;	std	r20,104(r3)
+err2;	std	r21,112(r3)
+err2;	std	r22,120(r3)
+	addi	r3,r3,128
+	subi	r7,r7,128
+	bdnz	4b
+
+	clrldi	r5,r5,(64-7)
+
+	/* Up to 127B to go */
+5:	srdi	r6,r5,4
+	mtocrf	0x01,r6
+
+6:	bf	cr7*4+1,7f
+err2;	ld	r0,0(r4)
+err2;	ld	r6,8(r4)
+err2;	ld	r8,16(r4)
+err2;	ld	r9,24(r4)
+err2;	ld	r10,32(r4)
+err2;	ld	r11,40(r4)
+err2;	ld	r12,48(r4)
+err2;	ld	r14,56(r4)
+	addi	r4,r4,64
+err2;	std	r0,0(r3)
+err2;	std	r6,8(r3)
+err2;	std	r8,16(r3)
+err2;	std	r9,24(r3)
+err2;	std	r10,32(r3)
+err2;	std	r11,40(r3)
+err2;	std	r12,48(r3)
+err2;	std	r14,56(r3)
+	addi	r3,r3,64
+	subi	r7,r7,64
+
+7:	ld	r14,STK_REG(R14)(r1)
+	ld	r15,STK_REG(R15)(r1)
+	ld	r16,STK_REG(R16)(r1)
+	ld	r17,STK_REG(R17)(r1)
+	ld	r18,STK_REG(R18)(r1)
+	ld	r19,STK_REG(R19)(r1)
+	ld	r20,STK_REG(R20)(r1)
+	ld	r21,STK_REG(R21)(r1)
+	ld	r22,STK_REG(R22)(r1)
+	addi	r1,r1,STACKFRAMESIZE
+
+	/* Up to 63B to go */
+	bf	cr7*4+2,8f
+err1;	ld	r0,0(r4)
+err1;	ld	r6,8(r4)
+err1;	ld	r8,16(r4)
+err1;	ld	r9,24(r4)
+	addi	r4,r4,32
+err1;	std	r0,0(r3)
+err1;	std	r6,8(r3)
+err1;	std	r8,16(r3)
+err1;	std	r9,24(r3)
+	addi	r3,r3,32
+	subi	r7,r7,32
+
+	/* Up to 31B to go */
+8:	bf	cr7*4+3,9f
+err1;	ld	r0,0(r4)
+err1;	ld	r6,8(r4)
+	addi	r4,r4,16
+err1;	std	r0,0(r3)
+err1;	std	r6,8(r3)
+	addi	r3,r3,16
+	subi	r7,r7,16
+
+9:	clrldi	r5,r5,(64-4)
+
+	/* Up to 15B to go */
+.Lshort_copy:
+	mtocrf	0x01,r5
+	bf	cr7*4+0,12f
+err1;	lwz	r0,0(r4)	/* Less chance of a reject with word ops */
+err1;	lwz	r6,4(r4)
+	addi	r4,r4,8
+err1;	stw	r0,0(r3)
+err1;	stw	r6,4(r3)
+	addi	r3,r3,8
+	subi	r7,r7,8
+
+12:	bf	cr7*4+1,13f
+err1;	lwz	r0,0(r4)
+	addi	r4,r4,4
+err1;	stw	r0,0(r3)
+	addi	r3,r3,4
+	subi	r7,r7,4
+
+13:	bf	cr7*4+2,14f
+err1;	lhz	r0,0(r4)
+	addi	r4,r4,2
+err1;	sth	r0,0(r3)
+	addi	r3,r3,2
+	subi	r7,r7,2
+
+14:	bf	cr7*4+3,15f
+err1;	lbz	r0,0(r4)
+err1;	stb	r0,0(r3)
+
+15:	li	r3,0
+	blr
+
+EXPORT_SYMBOL_GPL(copy_mc_generic);
--- a/arch/powerpc/lib/memcpy_mcsafe_64.S
+++ /dev/null
@@ -1,242 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Copyright (C) IBM Corporation, 2011
- * Derived from copyuser_power7.s by Anton Blanchard <anton@au.ibm.com>
- * Author - Balbir Singh <bsingharora@gmail.com>
- */
-#include <asm/ppc_asm.h>
-#include <asm/errno.h>
-#include <asm/export.h>
-
-	.macro err1
-100:
-	EX_TABLE(100b,.Ldo_err1)
-	.endm
-
-	.macro err2
-200:
-	EX_TABLE(200b,.Ldo_err2)
-	.endm
-
-	.macro err3
-300:	EX_TABLE(300b,.Ldone)
-	.endm
-
-.Ldo_err2:
-	ld	r22,STK_REG(R22)(r1)
-	ld	r21,STK_REG(R21)(r1)
-	ld	r20,STK_REG(R20)(r1)
-	ld	r19,STK_REG(R19)(r1)
-	ld	r18,STK_REG(R18)(r1)
-	ld	r17,STK_REG(R17)(r1)
-	ld	r16,STK_REG(R16)(r1)
-	ld	r15,STK_REG(R15)(r1)
-	ld	r14,STK_REG(R14)(r1)
-	addi	r1,r1,STACKFRAMESIZE
-.Ldo_err1:
-	/* Do a byte by byte copy to get the exact remaining size */
-	mtctr	r7
-46:
-err3;	lbz	r0,0(r4)
-	addi	r4,r4,1
-err3;	stb	r0,0(r3)
-	addi	r3,r3,1
-	bdnz	46b
-	li	r3,0
-	blr
-
-.Ldone:
-	mfctr	r3
-	blr
-
-
-_GLOBAL(memcpy_mcsafe)
-	mr	r7,r5
-	cmpldi	r5,16
-	blt	.Lshort_copy
-
-.Lcopy:
-	/* Get the source 8B aligned */
-	neg	r6,r4
-	mtocrf	0x01,r6
-	clrldi	r6,r6,(64-3)
-
-	bf	cr7*4+3,1f
-err1;	lbz	r0,0(r4)
-	addi	r4,r4,1
-err1;	stb	r0,0(r3)
-	addi	r3,r3,1
-	subi	r7,r7,1
-
-1:	bf	cr7*4+2,2f
-err1;	lhz	r0,0(r4)
-	addi	r4,r4,2
-err1;	sth	r0,0(r3)
-	addi	r3,r3,2
-	subi	r7,r7,2
-
-2:	bf	cr7*4+1,3f
-err1;	lwz	r0,0(r4)
-	addi	r4,r4,4
-err1;	stw	r0,0(r3)
-	addi	r3,r3,4
-	subi	r7,r7,4
-
-3:	sub	r5,r5,r6
-	cmpldi	r5,128
-
-	mflr	r0
-	stdu	r1,-STACKFRAMESIZE(r1)
-	std	r14,STK_REG(R14)(r1)
-	std	r15,STK_REG(R15)(r1)
-	std	r16,STK_REG(R16)(r1)
-	std	r17,STK_REG(R17)(r1)
-	std	r18,STK_REG(R18)(r1)
-	std	r19,STK_REG(R19)(r1)
-	std	r20,STK_REG(R20)(r1)
-	std	r21,STK_REG(R21)(r1)
-	std	r22,STK_REG(R22)(r1)
-	std	r0,STACKFRAMESIZE+16(r1)
-
-	blt	5f
-	srdi	r6,r5,7
-	mtctr	r6
-
-	/* Now do cacheline (128B) sized loads and stores. */
-	.align	5
-4:
-err2;	ld	r0,0(r4)
-err2;	ld	r6,8(r4)
-err2;	ld	r8,16(r4)
-err2;	ld	r9,24(r4)
-err2;	ld	r10,32(r4)
-err2;	ld	r11,40(r4)
-err2;	ld	r12,48(r4)
-err2;	ld	r14,56(r4)
-err2;	ld	r15,64(r4)
-err2;	ld	r16,72(r4)
-err2;	ld	r17,80(r4)
-err2;	ld	r18,88(r4)
-err2;	ld	r19,96(r4)
-err2;	ld	r20,104(r4)
-err2;	ld	r21,112(r4)
-err2;	ld	r22,120(r4)
-	addi	r4,r4,128
-err2;	std	r0,0(r3)
-err2;	std	r6,8(r3)
-err2;	std	r8,16(r3)
-err2;	std	r9,24(r3)
-err2;	std	r10,32(r3)
-err2;	std	r11,40(r3)
-err2;	std	r12,48(r3)
-err2;	std	r14,56(r3)
-err2;	std	r15,64(r3)
-err2;	std	r16,72(r3)
-err2;	std	r17,80(r3)
-err2;	std	r18,88(r3)
-err2;	std	r19,96(r3)
-err2;	std	r20,104(r3)
-err2;	std	r21,112(r3)
-err2;	std	r22,120(r3)
-	addi	r3,r3,128
-	subi	r7,r7,128
-	bdnz	4b
-
-	clrldi	r5,r5,(64-7)
-
-	/* Up to 127B to go */
-5:	srdi	r6,r5,4
-	mtocrf	0x01,r6
-
-6:	bf	cr7*4+1,7f
-err2;	ld	r0,0(r4)
-err2;	ld	r6,8(r4)
-err2;	ld	r8,16(r4)
-err2;	ld	r9,24(r4)
-err2;	ld	r10,32(r4)
-err2;	ld	r11,40(r4)
-err2;	ld	r12,48(r4)
-err2;	ld	r14,56(r4)
-	addi	r4,r4,64
-err2;	std	r0,0(r3)
-err2;	std	r6,8(r3)
-err2;	std	r8,16(r3)
-err2;	std	r9,24(r3)
-err2;	std	r10,32(r3)
-err2;	std	r11,40(r3)
-err2;	std	r12,48(r3)
-err2;	std	r14,56(r3)
-	addi	r3,r3,64
-	subi	r7,r7,64
-
-7:	ld	r14,STK_REG(R14)(r1)
-	ld	r15,STK_REG(R15)(r1)
-	ld	r16,STK_REG(R16)(r1)
-	ld	r17,STK_REG(R17)(r1)
-	ld	r18,STK_REG(R18)(r1)
-	ld	r19,STK_REG(R19)(r1)
-	ld	r20,STK_REG(R20)(r1)
-	ld	r21,STK_REG(R21)(r1)
-	ld	r22,STK_REG(R22)(r1)
-	addi	r1,r1,STACKFRAMESIZE
-
-	/* Up to 63B to go */
-	bf	cr7*4+2,8f
-err1;	ld	r0,0(r4)
-err1;	ld	r6,8(r4)
-err1;	ld	r8,16(r4)
-err1;	ld	r9,24(r4)
-	addi	r4,r4,32
-err1;	std	r0,0(r3)
-err1;	std	r6,8(r3)
-err1;	std	r8,16(r3)
-err1;	std	r9,24(r3)
-	addi	r3,r3,32
-	subi	r7,r7,32
-
-	/* Up to 31B to go */
-8:	bf	cr7*4+3,9f
-err1;	ld	r0,0(r4)
-err1;	ld	r6,8(r4)
-	addi	r4,r4,16
-err1;	std	r0,0(r3)
-err1;	std	r6,8(r3)
-	addi	r3,r3,16
-	subi	r7,r7,16
-
-9:	clrldi	r5,r5,(64-4)
-
-	/* Up to 15B to go */
-.Lshort_copy:
-	mtocrf	0x01,r5
-	bf	cr7*4+0,12f
-err1;	lwz	r0,0(r4)	/* Less chance of a reject with word ops */
-err1;	lwz	r6,4(r4)
-	addi	r4,r4,8
-err1;	stw	r0,0(r3)
-err1;	stw	r6,4(r3)
-	addi	r3,r3,8
-	subi	r7,r7,8
-
-12:	bf	cr7*4+1,13f
-err1;	lwz	r0,0(r4)
-	addi	r4,r4,4
-err1;	stw	r0,0(r3)
-	addi	r3,r3,4
-	subi	r7,r7,4
-
-13:	bf	cr7*4+2,14f
-err1;	lhz	r0,0(r4)
-	addi	r4,r4,2
-err1;	sth	r0,0(r3)
-	addi	r3,r3,2
-	subi	r7,r7,2
-
-14:	bf	cr7*4+3,15f
-err1;	lbz	r0,0(r4)
-err1;	stb	r0,0(r3)
-
-15:	li	r3,0
-	blr
-
-EXPORT_SYMBOL_GPL(memcpy_mcsafe);
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -75,7 +75,7 @@ config X86
 	select ARCH_HAS_PTE_DEVMAP		if X86_64
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_HAS_UACCESS_FLUSHCACHE	if X86_64
-	select ARCH_HAS_UACCESS_MCSAFE		if X86_64 && X86_MCE
+	select ARCH_HAS_COPY_MC			if X86_64
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_SET_DIRECT_MAP
 	select ARCH_HAS_STRICT_KERNEL_RWX
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -59,7 +59,7 @@ config EARLY_PRINTK_USB_XDBC
 	  You should normally say N here, unless you want to debug early
 	  crashes or need a very simple printk logging facility.
 
-config MCSAFE_TEST
+config COPY_MC_TEST
 	def_bool n
 
 config EFI_PGT_DUMP
--- /dev/null
+++ b/arch/x86/include/asm/copy_mc_test.h
@@ -0,0 +1,75 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _COPY_MC_TEST_H_
+#define _COPY_MC_TEST_H_
+
+#ifndef __ASSEMBLY__
+#ifdef CONFIG_COPY_MC_TEST
+extern unsigned long copy_mc_test_src;
+extern unsigned long copy_mc_test_dst;
+
+static inline void copy_mc_inject_src(void *addr)
+{
+	if (addr)
+		copy_mc_test_src = (unsigned long) addr;
+	else
+		copy_mc_test_src = ~0UL;
+}
+
+static inline void copy_mc_inject_dst(void *addr)
+{
+	if (addr)
+		copy_mc_test_dst = (unsigned long) addr;
+	else
+		copy_mc_test_dst = ~0UL;
+}
+#else /* CONFIG_COPY_MC_TEST */
+static inline void copy_mc_inject_src(void *addr)
+{
+}
+
+static inline void copy_mc_inject_dst(void *addr)
+{
+}
+#endif /* CONFIG_COPY_MC_TEST */
+
+#else /* __ASSEMBLY__ */
+#include <asm/export.h>
+
+#ifdef CONFIG_COPY_MC_TEST
+.macro COPY_MC_TEST_CTL
+	.pushsection .data
+	.align 8
+	.globl copy_mc_test_src
+	copy_mc_test_src:
+		.quad 0
+	EXPORT_SYMBOL_GPL(copy_mc_test_src)
+	.globl copy_mc_test_dst
+	copy_mc_test_dst:
+		.quad 0
+	EXPORT_SYMBOL_GPL(copy_mc_test_dst)
+	.popsection
+.endm
+
+.macro COPY_MC_TEST_SRC reg count target
+	leaq \count(\reg), %r9
+	cmp copy_mc_test_src, %r9
+	ja \target
+.endm
+
+.macro COPY_MC_TEST_DST reg count target
+	leaq \count(\reg), %r9
+	cmp copy_mc_test_dst, %r9
+	ja \target
+.endm
+#else
+.macro COPY_MC_TEST_CTL
+.endm
+
+.macro COPY_MC_TEST_SRC reg count target
+.endm
+
+.macro COPY_MC_TEST_DST reg count target
+.endm
+#endif /* CONFIG_COPY_MC_TEST */
+#endif /* __ASSEMBLY__ */
+#endif /* _COPY_MC_TEST_H_ */
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -174,6 +174,15 @@ extern void mce_unregister_decode_chain(
 
 extern int mce_p5_enabled;
 
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+extern void enable_copy_mc_fragile(void);
+unsigned long __must_check copy_mc_fragile(void *dst, const void *src, unsigned cnt);
+#else
+static inline void enable_copy_mc_fragile(void)
+{
+}
+#endif
+
 #ifdef CONFIG_X86_MCE
 int mcheck_init(void);
 void mcheck_cpu_init(struct cpuinfo_x86 *c);
--- a/arch/x86/include/asm/mcsafe_test.h
+++ /dev/null
@@ -1,75 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _MCSAFE_TEST_H_
-#define _MCSAFE_TEST_H_
-
-#ifndef __ASSEMBLY__
-#ifdef CONFIG_MCSAFE_TEST
-extern unsigned long mcsafe_test_src;
-extern unsigned long mcsafe_test_dst;
-
-static inline void mcsafe_inject_src(void *addr)
-{
-	if (addr)
-		mcsafe_test_src = (unsigned long) addr;
-	else
-		mcsafe_test_src = ~0UL;
-}
-
-static inline void mcsafe_inject_dst(void *addr)
-{
-	if (addr)
-		mcsafe_test_dst = (unsigned long) addr;
-	else
-		mcsafe_test_dst = ~0UL;
-}
-#else /* CONFIG_MCSAFE_TEST */
-static inline void mcsafe_inject_src(void *addr)
-{
-}
-
-static inline void mcsafe_inject_dst(void *addr)
-{
-}
-#endif /* CONFIG_MCSAFE_TEST */
-
-#else /* __ASSEMBLY__ */
-#include <asm/export.h>
-
-#ifdef CONFIG_MCSAFE_TEST
-.macro MCSAFE_TEST_CTL
-	.pushsection .data
-	.align 8
-	.globl mcsafe_test_src
-	mcsafe_test_src:
-		.quad 0
-	EXPORT_SYMBOL_GPL(mcsafe_test_src)
-	.globl mcsafe_test_dst
-	mcsafe_test_dst:
-		.quad 0
-	EXPORT_SYMBOL_GPL(mcsafe_test_dst)
-	.popsection
-.endm
-
-.macro MCSAFE_TEST_SRC reg count target
-	leaq \count(\reg), %r9
-	cmp mcsafe_test_src, %r9
-	ja \target
-.endm
-
-.macro MCSAFE_TEST_DST reg count target
-	leaq \count(\reg), %r9
-	cmp mcsafe_test_dst, %r9
-	ja \target
-.endm
-#else
-.macro MCSAFE_TEST_CTL
-.endm
-
-.macro MCSAFE_TEST_SRC reg count target
-.endm
-
-.macro MCSAFE_TEST_DST reg count target
-.endm
-#endif /* CONFIG_MCSAFE_TEST */
-#endif /* __ASSEMBLY__ */
-#endif /* _MCSAFE_TEST_H_ */
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -82,38 +82,6 @@ int strcmp(const char *cs, const char *c
 
 #endif
 
-#define __HAVE_ARCH_MEMCPY_MCSAFE 1
-__must_check unsigned long __memcpy_mcsafe(void *dst, const void *src,
-		size_t cnt);
-DECLARE_STATIC_KEY_FALSE(mcsafe_key);
-
-/**
- * memcpy_mcsafe - copy memory with indication if a machine check happened
- *
- * @dst:	destination address
- * @src:	source address
- * @cnt:	number of bytes to copy
- *
- * Low level memory copy function that catches machine checks
- * We only call into the "safe" function on systems that can
- * actually do machine check recovery. Everyone else can just
- * use memcpy().
- *
- * Return 0 for success, or number of bytes not copied if there was an
- * exception.
- */
-static __always_inline __must_check unsigned long
-memcpy_mcsafe(void *dst, const void *src, size_t cnt)
-{
-#ifdef CONFIG_X86_MCE
-	if (static_branch_unlikely(&mcsafe_key))
-		return __memcpy_mcsafe(dst, src, cnt);
-	else
-#endif
-		memcpy(dst, src, cnt);
-	return 0;
-}
-
 #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
 #define __HAVE_ARCH_MEMCPY_FLUSHCACHE 1
 void __memcpy_flushcache(void *dst, const void *src, size_t cnt);
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -455,6 +455,15 @@ extern __must_check long strnlen_user(co
 unsigned long __must_check clear_user(void __user *mem, unsigned long len);
 unsigned long __must_check __clear_user(void __user *mem, unsigned long len);
 
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+unsigned long __must_check
+copy_mc_to_kernel(void *to, const void *from, unsigned len);
+#define copy_mc_to_kernel copy_mc_to_kernel
+
+unsigned long __must_check
+copy_mc_to_user(void *to, const void *from, unsigned len);
+#endif
+
 /*
  * movsl can be slow when source and dest are not both 8-byte aligned
  */
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -47,22 +47,6 @@ copy_user_generic(void *to, const void *
 }
 
 static __always_inline __must_check unsigned long
-copy_to_user_mcsafe(void *to, const void *from, unsigned len)
-{
-	unsigned long ret;
-
-	__uaccess_begin();
-	/*
-	 * Note, __memcpy_mcsafe() is explicitly used since it can
-	 * handle exceptions / faults.  memcpy_mcsafe() may fall back to
-	 * memcpy() which lacks this handling.
-	 */
-	ret = __memcpy_mcsafe(to, from, len);
-	__uaccess_end();
-	return ret;
-}
-
-static __always_inline __must_check unsigned long
 raw_copy_from_user(void *dst, const void __user *src, unsigned long size)
 {
 	return copy_user_generic(dst, (__force void *)src, size);
@@ -102,8 +86,4 @@ __copy_from_user_flushcache(void *dst, c
 	kasan_check_write(dst, size);
 	return __copy_user_flushcache(dst, src, size);
 }
-
-unsigned long
-mcsafe_handle_tail(char *to, char *from, unsigned len);
-
 #endif /* _ASM_X86_UACCESS_64_H */
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -40,7 +40,6 @@
 #include <linux/debugfs.h>
 #include <linux/irq_work.h>
 #include <linux/export.h>
-#include <linux/jump_label.h>
 #include <linux/set_memory.h>
 #include <linux/task_work.h>
 #include <linux/hardirq.h>
@@ -2122,7 +2121,7 @@ void mce_disable_bank(int bank)
 	and older.
  * mce=nobootlog Don't log MCEs from before booting.
  * mce=bios_cmci_threshold Don't program the CMCI threshold
- * mce=recovery force enable memcpy_mcsafe()
+ * mce=recovery force enable copy_mc_fragile()
  */
 static int __init mcheck_enable(char *str)
 {
@@ -2730,13 +2729,10 @@ static void __init mcheck_debugfs_init(v
 static void __init mcheck_debugfs_init(void) { }
 #endif
 
-DEFINE_STATIC_KEY_FALSE(mcsafe_key);
-EXPORT_SYMBOL_GPL(mcsafe_key);
-
 static int __init mcheck_late_init(void)
 {
 	if (mca_cfg.recovery)
-		static_branch_inc(&mcsafe_key);
+		enable_copy_mc_fragile();
 
 	mcheck_debugfs_init();
 
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -8,6 +8,7 @@
 
 #include <asm/hpet.h>
 #include <asm/setup.h>
+#include <asm/mce.h>
 
 #if defined(CONFIG_X86_IO_APIC) && defined(CONFIG_SMP) && defined(CONFIG_PCI)
 
@@ -624,10 +625,6 @@ static void amd_disable_seq_and_redirect
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_NB_F3,
 			amd_disable_seq_and_redirect_scrub);
 
-#if defined(CONFIG_X86_64) && defined(CONFIG_X86_MCE)
-#include <linux/jump_label.h>
-#include <asm/string_64.h>
-
 /* Ivy Bridge, Haswell, Broadwell */
 static void quirk_intel_brickland_xeon_ras_cap(struct pci_dev *pdev)
 {
@@ -636,7 +633,7 @@ static void quirk_intel_brickland_xeon_r
 	pci_read_config_dword(pdev, 0x84, &capid0);
 
 	if (capid0 & 0x10)
-		static_branch_inc(&mcsafe_key);
+		enable_copy_mc_fragile();
 }
 
 /* Skylake */
@@ -653,7 +650,7 @@ static void quirk_intel_purley_xeon_ras_
 	 * enabled, so memory machine check recovery is also enabled.
 	 */
 	if ((capid0 & 0xc0) == 0xc0 || (capid5 & 0x1e0))
-		static_branch_inc(&mcsafe_key);
+		enable_copy_mc_fragile();
 
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0ec3, quirk_intel_brickland_xeon_ras_cap);
@@ -661,7 +658,6 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IN
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6fc0, quirk_intel_brickland_xeon_ras_cap);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2083, quirk_intel_purley_xeon_ras_cap);
 #endif
-#endif
 
 bool x86_apple_machine;
 EXPORT_SYMBOL(x86_apple_machine);
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -44,6 +44,7 @@ obj-$(CONFIG_SMP) += msr-smp.o cache-smp
 lib-y := delay.o misc.o cmdline.o cpu.o
 lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
 lib-y += memcpy_$(BITS).o
+lib-$(CONFIG_ARCH_HAS_COPY_MC) += copy_mc.o copy_mc_64.o
 lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o
 lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
 lib-$(CONFIG_FUNCTION_ERROR_INJECTION)	+= error-inject.o
--- /dev/null
+++ b/arch/x86/lib/copy_mc.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2016-2020 Intel Corporation. All rights reserved. */
+
+#include <linux/jump_label.h>
+#include <linux/uaccess.h>
+#include <linux/export.h>
+#include <linux/string.h>
+#include <linux/types.h>
+
+#include <asm/mce.h>
+
+#ifdef CONFIG_X86_MCE
+/*
+ * See COPY_MC_TEST for self-test of the copy_mc_fragile()
+ * implementation.
+ */
+static DEFINE_STATIC_KEY_FALSE(copy_mc_fragile_key);
+
+void enable_copy_mc_fragile(void)
+{
+	static_branch_inc(&copy_mc_fragile_key);
+}
+#define copy_mc_fragile_enabled (static_branch_unlikely(&copy_mc_fragile_key))
+
+/*
+ * Similar to copy_user_handle_tail, probe for the write fault point, or
+ * source exception point.
+ */
+__visible notrace unsigned long
+copy_mc_fragile_handle_tail(char *to, char *from, unsigned len)
+{
+	for (; len; --len, to++, from++)
+		if (copy_mc_fragile(to, from, 1))
+			break;
+	return len;
+}
+#else
+/*
+ * No point in doing careful copying, or consulting a static key when
+ * there is no #MC handler in the CONFIG_X86_MCE=n case.
+ */
+void enable_copy_mc_fragile(void)
+{
+}
+#define copy_mc_fragile_enabled (0)
+#endif
+
+/**
+ * copy_mc_to_kernel - memory copy that handles source exceptions
+ *
+ * @dst:	destination address
+ * @src:	source address
+ * @len:	number of bytes to copy
+ *
+ * Call into the 'fragile' version on systems that have trouble
+ * actually do machine check recovery. Everyone else can just
+ * use memcpy().
+ *
+ * Return 0 for success, or number of bytes not copied if there was an
+ * exception.
+ */
+unsigned long __must_check copy_mc_to_kernel(void *dst, const void *src, unsigned len)
+{
+	if (copy_mc_fragile_enabled)
+		return copy_mc_fragile(dst, src, len);
+	memcpy(dst, src, len);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(copy_mc_to_kernel);
+
+unsigned long __must_check copy_mc_to_user(void *dst, const void *src, unsigned len)
+{
+	unsigned long ret;
+
+	if (!copy_mc_fragile_enabled)
+		return copy_user_generic(dst, src, len);
+
+	__uaccess_begin();
+	ret = copy_mc_fragile(dst, src, len);
+	__uaccess_end();
+	return ret;
+}
--- /dev/null
+++ b/arch/x86/lib/copy_mc_64.S
@@ -0,0 +1,127 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2016-2020 Intel Corporation. All rights reserved. */
+
+#include <linux/linkage.h>
+#include <asm/copy_mc_test.h>
+#include <asm/export.h>
+#include <asm/asm.h>
+
+#ifndef CONFIG_UML
+
+#ifdef CONFIG_X86_MCE
+COPY_MC_TEST_CTL
+
+/*
+ * copy_mc_fragile - copy memory with indication if an exception / fault happened
+ *
+ * The 'fragile' version is opted into by platform quirks and takes
+ * pains to avoid unrecoverable corner cases like 'fast-string'
+ * instruction sequences, and consuming poison across a cacheline
+ * boundary. The non-fragile version is equivalent to memcpy()
+ * regardless of CPU machine-check-recovery capability.
+ */
+SYM_FUNC_START(copy_mc_fragile)
+	cmpl $8, %edx
+	/* Less than 8 bytes? Go to byte copy loop */
+	jb .L_no_whole_words
+
+	/* Check for bad alignment of source */
+	testl $7, %esi
+	/* Already aligned */
+	jz .L_8byte_aligned
+
+	/* Copy one byte at a time until source is 8-byte aligned */
+	movl %esi, %ecx
+	andl $7, %ecx
+	subl $8, %ecx
+	negl %ecx
+	subl %ecx, %edx
+.L_read_leading_bytes:
+	movb (%rsi), %al
+	COPY_MC_TEST_SRC %rsi 1 .E_leading_bytes
+	COPY_MC_TEST_DST %rdi 1 .E_leading_bytes
+.L_write_leading_bytes:
+	movb %al, (%rdi)
+	incq %rsi
+	incq %rdi
+	decl %ecx
+	jnz .L_read_leading_bytes
+
+.L_8byte_aligned:
+	movl %edx, %ecx
+	andl $7, %edx
+	shrl $3, %ecx
+	jz .L_no_whole_words
+
+.L_read_words:
+	movq (%rsi), %r8
+	COPY_MC_TEST_SRC %rsi 8 .E_read_words
+	COPY_MC_TEST_DST %rdi 8 .E_write_words
+.L_write_words:
+	movq %r8, (%rdi)
+	addq $8, %rsi
+	addq $8, %rdi
+	decl %ecx
+	jnz .L_read_words
+
+	/* Any trailing bytes? */
+.L_no_whole_words:
+	andl %edx, %edx
+	jz .L_done_memcpy_trap
+
+	/* Copy trailing bytes */
+	movl %edx, %ecx
+.L_read_trailing_bytes:
+	movb (%rsi), %al
+	COPY_MC_TEST_SRC %rsi 1 .E_trailing_bytes
+	COPY_MC_TEST_DST %rdi 1 .E_trailing_bytes
+.L_write_trailing_bytes:
+	movb %al, (%rdi)
+	incq %rsi
+	incq %rdi
+	decl %ecx
+	jnz .L_read_trailing_bytes
+
+	/* Copy successful. Return zero */
+.L_done_memcpy_trap:
+	xorl %eax, %eax
+.L_done:
+	ret
+SYM_FUNC_END(copy_mc_fragile)
+EXPORT_SYMBOL_GPL(copy_mc_fragile)
+
+	.section .fixup, "ax"
+	/*
+	 * Return number of bytes not copied for any failure. Note that
+	 * there is no "tail" handling since the source buffer is 8-byte
+	 * aligned and poison is cacheline aligned.
+	 */
+.E_read_words:
+	shll	$3, %ecx
+.E_leading_bytes:
+	addl	%edx, %ecx
+.E_trailing_bytes:
+	mov	%ecx, %eax
+	jmp	.L_done
+
+	/*
+	 * For write fault handling, given the destination is unaligned,
+	 * we handle faults on multi-byte writes with a byte-by-byte
+	 * copy up to the write-protected page.
+	 */
+.E_write_words:
+	shll	$3, %ecx
+	addl	%edx, %ecx
+	movl	%ecx, %edx
+	jmp copy_mc_fragile_handle_tail
+
+	.previous
+
+	_ASM_EXTABLE_FAULT(.L_read_leading_bytes, .E_leading_bytes)
+	_ASM_EXTABLE_FAULT(.L_read_words, .E_read_words)
+	_ASM_EXTABLE_FAULT(.L_read_trailing_bytes, .E_trailing_bytes)
+	_ASM_EXTABLE(.L_write_leading_bytes, .E_leading_bytes)
+	_ASM_EXTABLE(.L_write_words, .E_write_words)
+	_ASM_EXTABLE(.L_write_trailing_bytes, .E_trailing_bytes)
+#endif /* CONFIG_X86_MCE */
+#endif /* !CONFIG_UML */
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -4,7 +4,6 @@
 #include <linux/linkage.h>
 #include <asm/errno.h>
 #include <asm/cpufeatures.h>
-#include <asm/mcsafe_test.h>
 #include <asm/alternative-asm.h>
 #include <asm/export.h>
 
@@ -187,117 +186,3 @@ SYM_FUNC_START_LOCAL(memcpy_orig)
 SYM_FUNC_END(memcpy_orig)
 
 .popsection
-
-#ifndef CONFIG_UML
-
-MCSAFE_TEST_CTL
-
-/*
- * __memcpy_mcsafe - memory copy with machine check exception handling
- * Note that we only catch machine checks when reading the source addresses.
- * Writes to target are posted and don't generate machine checks.
- */
-SYM_FUNC_START(__memcpy_mcsafe)
-	cmpl $8, %edx
-	/* Less than 8 bytes? Go to byte copy loop */
-	jb .L_no_whole_words
-
-	/* Check for bad alignment of source */
-	testl $7, %esi
-	/* Already aligned */
-	jz .L_8byte_aligned
-
-	/* Copy one byte at a time until source is 8-byte aligned */
-	movl %esi, %ecx
-	andl $7, %ecx
-	subl $8, %ecx
-	negl %ecx
-	subl %ecx, %edx
-.L_read_leading_bytes:
-	movb (%rsi), %al
-	MCSAFE_TEST_SRC %rsi 1 .E_leading_bytes
-	MCSAFE_TEST_DST %rdi 1 .E_leading_bytes
-.L_write_leading_bytes:
-	movb %al, (%rdi)
-	incq %rsi
-	incq %rdi
-	decl %ecx
-	jnz .L_read_leading_bytes
-
-.L_8byte_aligned:
-	movl %edx, %ecx
-	andl $7, %edx
-	shrl $3, %ecx
-	jz .L_no_whole_words
-
-.L_read_words:
-	movq (%rsi), %r8
-	MCSAFE_TEST_SRC %rsi 8 .E_read_words
-	MCSAFE_TEST_DST %rdi 8 .E_write_words
-.L_write_words:
-	movq %r8, (%rdi)
-	addq $8, %rsi
-	addq $8, %rdi
-	decl %ecx
-	jnz .L_read_words
-
-	/* Any trailing bytes? */
-.L_no_whole_words:
-	andl %edx, %edx
-	jz .L_done_memcpy_trap
-
-	/* Copy trailing bytes */
-	movl %edx, %ecx
-.L_read_trailing_bytes:
-	movb (%rsi), %al
-	MCSAFE_TEST_SRC %rsi 1 .E_trailing_bytes
-	MCSAFE_TEST_DST %rdi 1 .E_trailing_bytes
-.L_write_trailing_bytes:
-	movb %al, (%rdi)
-	incq %rsi
-	incq %rdi
-	decl %ecx
-	jnz .L_read_trailing_bytes
-
-	/* Copy successful. Return zero */
-.L_done_memcpy_trap:
-	xorl %eax, %eax
-.L_done:
-	ret
-SYM_FUNC_END(__memcpy_mcsafe)
-EXPORT_SYMBOL_GPL(__memcpy_mcsafe)
-
-	.section .fixup, "ax"
-	/*
-	 * Return number of bytes not copied for any failure. Note that
-	 * there is no "tail" handling since the source buffer is 8-byte
-	 * aligned and poison is cacheline aligned.
-	 */
-.E_read_words:
-	shll	$3, %ecx
-.E_leading_bytes:
-	addl	%edx, %ecx
-.E_trailing_bytes:
-	mov	%ecx, %eax
-	jmp	.L_done
-
-	/*
-	 * For write fault handling, given the destination is unaligned,
-	 * we handle faults on multi-byte writes with a byte-by-byte
-	 * copy up to the write-protected page.
-	 */
-.E_write_words:
-	shll	$3, %ecx
-	addl	%edx, %ecx
-	movl	%ecx, %edx
-	jmp mcsafe_handle_tail
-
-	.previous
-
-	_ASM_EXTABLE_FAULT(.L_read_leading_bytes, .E_leading_bytes)
-	_ASM_EXTABLE_FAULT(.L_read_words, .E_read_words)
-	_ASM_EXTABLE_FAULT(.L_read_trailing_bytes, .E_trailing_bytes)
-	_ASM_EXTABLE(.L_write_leading_bytes, .E_leading_bytes)
-	_ASM_EXTABLE(.L_write_words, .E_write_words)
-	_ASM_EXTABLE(.L_write_trailing_bytes, .E_trailing_bytes)
-#endif
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -56,27 +56,6 @@ unsigned long clear_user(void __user *to
 }
 EXPORT_SYMBOL(clear_user);
 
-/*
- * Similar to copy_user_handle_tail, probe for the write fault point,
- * but reuse __memcpy_mcsafe in case a new read error is encountered.
- * clac() is handled in _copy_to_iter_mcsafe().
- */
-__visible notrace unsigned long
-mcsafe_handle_tail(char *to, char *from, unsigned len)
-{
-	for (; len; --len, to++, from++) {
-		/*
-		 * Call the assembly routine back directly since
-		 * memcpy_mcsafe() may silently fallback to memcpy.
-		 */
-		unsigned long rem = __memcpy_mcsafe(to, from, 1);
-
-		if (rem)
-			break;
-	}
-	return len;
-}
-
 #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
 /**
  * clean_cache_range - write back a cache range with CLWB
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -49,7 +49,7 @@ do {								\
 #define pmem_assign(dest, src)	((dest) = (src))
 #endif
 
-#if defined(__HAVE_ARCH_MEMCPY_MCSAFE) && defined(DM_WRITECACHE_HAS_PMEM)
+#if IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC) && defined(DM_WRITECACHE_HAS_PMEM)
 #define DM_WRITECACHE_HANDLE_HARDWARE_ERRORS
 #endif
 
@@ -992,7 +992,8 @@ static void writecache_resume(struct dm_
 	}
 	wc->freelist_size = 0;
 
-	r = memcpy_mcsafe(&sb_seq_count, &sb(wc)->seq_count, sizeof(uint64_t));
+	r = copy_mc_to_kernel(&sb_seq_count, &sb(wc)->seq_count,
+			      sizeof(uint64_t));
 	if (r) {
 		writecache_error(wc, r, "hardware memory error when reading superblock: %d", r);
 		sb_seq_count = cpu_to_le64(0);
@@ -1008,7 +1009,8 @@ static void writecache_resume(struct dm_
 			e->seq_count = -1;
 			continue;
 		}
-		r = memcpy_mcsafe(&wme, memory_entry(wc, e), sizeof(struct wc_memory_entry));
+		r = copy_mc_to_kernel(&wme, memory_entry(wc, e),
+				      sizeof(struct wc_memory_entry));
 		if (r) {
 			writecache_error(wc, r, "hardware memory error when reading metadata entry %lu: %d",
 					 (unsigned long)b, r);
@@ -1206,7 +1208,7 @@ static void bio_copy_block(struct dm_wri
 
 		if (rw == READ) {
 			int r;
-			r = memcpy_mcsafe(buf, data, size);
+			r = copy_mc_to_kernel(buf, data, size);
 			flush_dcache_page(bio_page(bio));
 			if (unlikely(r)) {
 				writecache_error(wc, r, "hardware memory error when reading data: %d", r);
@@ -2349,7 +2351,7 @@ invalid_optional:
 		}
 	}
 
-	r = memcpy_mcsafe(&s, sb(wc), sizeof(struct wc_memory_superblock));
+	r = copy_mc_to_kernel(&s, sb(wc), sizeof(struct wc_memory_superblock));
 	if (r) {
 		ti->error = "Hardware memory error when reading superblock";
 		goto bad;
@@ -2360,7 +2362,8 @@ invalid_optional:
 			ti->error = "Unable to initialize device";
 			goto bad;
 		}
-		r = memcpy_mcsafe(&s, sb(wc), sizeof(struct wc_memory_superblock));
+		r = copy_mc_to_kernel(&s, sb(wc),
+				      sizeof(struct wc_memory_superblock));
 		if (r) {
 			ti->error = "Hardware memory error when reading superblock";
 			goto bad;
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -268,7 +268,7 @@ static int nsio_rw_bytes(struct nd_names
 	if (rw == READ) {
 		if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align)))
 			return -EIO;
-		if (memcpy_mcsafe(buf, nsio->addr + offset, size) != 0)
+		if (copy_mc_to_kernel(buf, nsio->addr + offset, size) != 0)
 			return -EIO;
 		return 0;
 	}
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -125,7 +125,7 @@ static blk_status_t read_pmem(struct pag
 	while (len) {
 		mem = kmap_atomic(page);
 		chunk = min_t(unsigned int, len, PAGE_SIZE - off);
-		rem = memcpy_mcsafe(mem + off, pmem_addr, chunk);
+		rem = copy_mc_to_kernel(mem + off, pmem_addr, chunk);
 		kunmap_atomic(mem);
 		if (rem)
 			return BLK_STS_IOERR;
@@ -305,7 +305,7 @@ static long pmem_dax_direct_access(struc
 
 /*
  * Use the 'no check' versions of copy_from_iter_flushcache() and
- * copy_to_iter_mcsafe() to bypass HARDENED_USERCOPY overhead. Bounds
+ * copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
  * checking, both file offset and device offset, is handled by
  * dax_iomap_actor()
  */
@@ -318,7 +318,7 @@ static size_t pmem_copy_from_iter(struct
 static size_t pmem_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff,
 		void *addr, size_t bytes, struct iov_iter *i)
 {
-	return _copy_to_iter_mcsafe(addr, bytes, i);
+	return _copy_mc_to_iter(addr, bytes, i);
 }
 
 static const struct dax_operations pmem_dax_ops = {
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -161,20 +161,13 @@ extern int bcmp(const void *,const void
 #ifndef __HAVE_ARCH_MEMCHR
 extern void * memchr(const void *,int,__kernel_size_t);
 #endif
-#ifndef __HAVE_ARCH_MEMCPY_MCSAFE
-static inline __must_check unsigned long memcpy_mcsafe(void *dst,
-		const void *src, size_t cnt)
-{
-	memcpy(dst, src, cnt);
-	return 0;
-}
-#endif
 #ifndef __HAVE_ARCH_MEMCPY_FLUSHCACHE
 static inline void memcpy_flushcache(void *dst, const void *src, size_t cnt)
 {
 	memcpy(dst, src, cnt);
 }
 #endif
+
 void *memchr_inv(const void *s, int c, size_t n);
 char *strreplace(char *s, char old, char new);
 
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -163,6 +163,19 @@ copy_in_user(void __user *to, const void
 }
 #endif
 
+#ifndef copy_mc_to_kernel
+/*
+ * Without arch opt-in this generic copy_mc_to_kernel() will not handle
+ * #MC (or arch equivalent) during source read.
+ */
+static inline unsigned long __must_check
+copy_mc_to_kernel(void *dst, const void *src, size_t cnt)
+{
+	memcpy(dst, src, cnt);
+	return 0;
+}
+#endif
+
 static __always_inline void pagefault_disabled_inc(void)
 {
 	current->pagefault_disabled++;
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -186,10 +186,10 @@ size_t _copy_from_iter_flushcache(void *
 #define _copy_from_iter_flushcache _copy_from_iter_nocache
 #endif
 
-#ifdef CONFIG_ARCH_HAS_UACCESS_MCSAFE
-size_t _copy_to_iter_mcsafe(const void *addr, size_t bytes, struct iov_iter *i);
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
 #else
-#define _copy_to_iter_mcsafe _copy_to_iter
+#define _copy_mc_to_iter _copy_to_iter
 #endif
 
 static __always_inline __must_check
@@ -202,12 +202,12 @@ size_t copy_from_iter_flushcache(void *a
 }
 
 static __always_inline __must_check
-size_t copy_to_iter_mcsafe(void *addr, size_t bytes, struct iov_iter *i)
+size_t copy_mc_to_iter(void *addr, size_t bytes, struct iov_iter *i)
 {
 	if (unlikely(!check_copy_size(addr, bytes, true)))
 		return 0;
 	else
-		return _copy_to_iter_mcsafe(addr, bytes, i);
+		return _copy_mc_to_iter(addr, bytes, i);
 }
 
 size_t iov_iter_zero(size_t bytes, struct iov_iter *);
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -631,7 +631,12 @@ config UACCESS_MEMCPY
 config ARCH_HAS_UACCESS_FLUSHCACHE
 	bool
 
-config ARCH_HAS_UACCESS_MCSAFE
+# arch has a concept of a recoverable synchronous exception due to a
+# memory-read error like x86 machine-check or ARM data-abort, and
+# implements copy_mc_to_{user,kernel} to abort and report
+# 'bytes-transferred' if that exception fires when accessing the source
+# buffer.
+config ARCH_HAS_COPY_MC
 	bool
 
 # Temporary. Goes away when all archs are cleaned up
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -636,30 +636,30 @@ size_t _copy_to_iter(const void *addr, s
 }
 EXPORT_SYMBOL(_copy_to_iter);
 
-#ifdef CONFIG_ARCH_HAS_UACCESS_MCSAFE
-static int copyout_mcsafe(void __user *to, const void *from, size_t n)
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+static int copyout_mc(void __user *to, const void *from, size_t n)
 {
 	if (access_ok(to, n)) {
 		instrument_copy_to_user(to, from, n);
-		n = copy_to_user_mcsafe((__force void *) to, from, n);
+		n = copy_mc_to_user((__force void *) to, from, n);
 	}
 	return n;
 }
 
-static unsigned long memcpy_mcsafe_to_page(struct page *page, size_t offset,
+static unsigned long copy_mc_to_page(struct page *page, size_t offset,
 		const char *from, size_t len)
 {
 	unsigned long ret;
 	char *to;
 
 	to = kmap_atomic(page);
-	ret = memcpy_mcsafe(to + offset, from, len);
+	ret = copy_mc_to_kernel(to + offset, from, len);
 	kunmap_atomic(to);
 
 	return ret;
 }
 
-static size_t copy_pipe_to_iter_mcsafe(const void *addr, size_t bytes,
+static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
 				struct iov_iter *i)
 {
 	struct pipe_inode_info *pipe = i->pipe;
@@ -677,7 +677,7 @@ static size_t copy_pipe_to_iter_mcsafe(c
 		size_t chunk = min_t(size_t, n, PAGE_SIZE - off);
 		unsigned long rem;
 
-		rem = memcpy_mcsafe_to_page(pipe->bufs[i_head & p_mask].page,
+		rem = copy_mc_to_page(pipe->bufs[i_head & p_mask].page,
 					    off, addr, chunk);
 		i->head = i_head;
 		i->iov_offset = off + chunk - rem;
@@ -694,18 +694,17 @@ static size_t copy_pipe_to_iter_mcsafe(c
 }
 
 /**
- * _copy_to_iter_mcsafe - copy to user with source-read error exception handling
+ * _copy_mc_to_iter - copy to iter with source memory error exception handling
  * @addr: source kernel address
  * @bytes: total transfer length
  * @iter: destination iterator
  *
- * The pmem driver arranges for filesystem-dax to use this facility via
- * dax_copy_to_iter() for protecting read/write to persistent memory.
- * Unless / until an architecture can guarantee identical performance
- * between _copy_to_iter_mcsafe() and _copy_to_iter() it would be a
- * performance regression to switch more users to the mcsafe version.
+ * The pmem driver deploys this for the dax operation
+ * (dax_copy_to_iter()) for dax reads (bypass page-cache and the
+ * block-layer). Upon #MC read(2) aborts and returns EIO or the bytes
+ * successfully copied.
  *
- * Otherwise, the main differences between this and typical _copy_to_iter().
+ * The main differences between this and typical _copy_to_iter().
  *
  * * Typical tail/residue handling after a fault retries the copy
  *   byte-by-byte until the fault happens again. Re-triggering machine
@@ -716,23 +715,22 @@ static size_t copy_pipe_to_iter_mcsafe(c
  * * ITER_KVEC, ITER_PIPE, and ITER_BVEC can return short copies.
  *   Compare to copy_to_iter() where only ITER_IOVEC attempts might return
  *   a short copy.
- *
- * See MCSAFE_TEST for self-test.
  */
-size_t _copy_to_iter_mcsafe(const void *addr, size_t bytes, struct iov_iter *i)
+size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 {
 	const char *from = addr;
 	unsigned long rem, curr_addr, s_addr = (unsigned long) addr;
 
 	if (unlikely(iov_iter_is_pipe(i)))
-		return copy_pipe_to_iter_mcsafe(addr, bytes, i);
+		return copy_mc_pipe_to_iter(addr, bytes, i);
 	if (iter_is_iovec(i))
 		might_fault();
 	iterate_and_advance(i, bytes, v,
-		copyout_mcsafe(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len),
+		copyout_mc(v.iov_base, (from += v.iov_len) - v.iov_len,
+			   v.iov_len),
 		({
-		rem = memcpy_mcsafe_to_page(v.bv_page, v.bv_offset,
-                               (from += v.bv_len) - v.bv_len, v.bv_len);
+		rem = copy_mc_to_page(v.bv_page, v.bv_offset,
+				      (from += v.bv_len) - v.bv_len, v.bv_len);
 		if (rem) {
 			curr_addr = (unsigned long) from;
 			bytes = curr_addr - s_addr - rem;
@@ -740,8 +738,8 @@ size_t _copy_to_iter_mcsafe(const void *
 		}
 		}),
 		({
-		rem = memcpy_mcsafe(v.iov_base, (from += v.iov_len) - v.iov_len,
-				v.iov_len);
+		rem = copy_mc_to_kernel(v.iov_base, (from += v.iov_len)
+					- v.iov_len, v.iov_len);
 		if (rem) {
 			curr_addr = (unsigned long) from;
 			bytes = curr_addr - s_addr - rem;
@@ -752,8 +750,8 @@ size_t _copy_to_iter_mcsafe(const void *
 
 	return bytes;
 }
-EXPORT_SYMBOL_GPL(_copy_to_iter_mcsafe);
-#endif /* CONFIG_ARCH_HAS_UACCESS_MCSAFE */
+EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
+#endif /* CONFIG_ARCH_HAS_COPY_MC */
 
 size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
 {
--- a/tools/arch/x86/include/asm/mcsafe_test.h
+++ /dev/null
@@ -1,13 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _MCSAFE_TEST_H_
-#define _MCSAFE_TEST_H_
-
-.macro MCSAFE_TEST_CTL
-.endm
-
-.macro MCSAFE_TEST_SRC reg count target
-.endm
-
-.macro MCSAFE_TEST_DST reg count target
-.endm
-#endif /* _MCSAFE_TEST_H_ */
--- a/tools/arch/x86/lib/memcpy_64.S
+++ b/tools/arch/x86/lib/memcpy_64.S
@@ -4,7 +4,6 @@
 #include <linux/linkage.h>
 #include <asm/errno.h>
 #include <asm/cpufeatures.h>
-#include <asm/mcsafe_test.h>
 #include <asm/alternative-asm.h>
 #include <asm/export.h>
 
@@ -187,117 +186,3 @@ SYM_FUNC_START(memcpy_orig)
 SYM_FUNC_END(memcpy_orig)
 
 .popsection
-
-#ifndef CONFIG_UML
-
-MCSAFE_TEST_CTL
-
-/*
- * __memcpy_mcsafe - memory copy with machine check exception handling
- * Note that we only catch machine checks when reading the source addresses.
- * Writes to target are posted and don't generate machine checks.
- */
-SYM_FUNC_START(__memcpy_mcsafe)
-	cmpl $8, %edx
-	/* Less than 8 bytes? Go to byte copy loop */
-	jb .L_no_whole_words
-
-	/* Check for bad alignment of source */
-	testl $7, %esi
-	/* Already aligned */
-	jz .L_8byte_aligned
-
-	/* Copy one byte at a time until source is 8-byte aligned */
-	movl %esi, %ecx
-	andl $7, %ecx
-	subl $8, %ecx
-	negl %ecx
-	subl %ecx, %edx
-.L_read_leading_bytes:
-	movb (%rsi), %al
-	MCSAFE_TEST_SRC %rsi 1 .E_leading_bytes
-	MCSAFE_TEST_DST %rdi 1 .E_leading_bytes
-.L_write_leading_bytes:
-	movb %al, (%rdi)
-	incq %rsi
-	incq %rdi
-	decl %ecx
-	jnz .L_read_leading_bytes
-
-.L_8byte_aligned:
-	movl %edx, %ecx
-	andl $7, %edx
-	shrl $3, %ecx
-	jz .L_no_whole_words
-
-.L_read_words:
-	movq (%rsi), %r8
-	MCSAFE_TEST_SRC %rsi 8 .E_read_words
-	MCSAFE_TEST_DST %rdi 8 .E_write_words
-.L_write_words:
-	movq %r8, (%rdi)
-	addq $8, %rsi
-	addq $8, %rdi
-	decl %ecx
-	jnz .L_read_words
-
-	/* Any trailing bytes? */
-.L_no_whole_words:
-	andl %edx, %edx
-	jz .L_done_memcpy_trap
-
-	/* Copy trailing bytes */
-	movl %edx, %ecx
-.L_read_trailing_bytes:
-	movb (%rsi), %al
-	MCSAFE_TEST_SRC %rsi 1 .E_trailing_bytes
-	MCSAFE_TEST_DST %rdi 1 .E_trailing_bytes
-.L_write_trailing_bytes:
-	movb %al, (%rdi)
-	incq %rsi
-	incq %rdi
-	decl %ecx
-	jnz .L_read_trailing_bytes
-
-	/* Copy successful. Return zero */
-.L_done_memcpy_trap:
-	xorl %eax, %eax
-.L_done:
-	ret
-SYM_FUNC_END(__memcpy_mcsafe)
-EXPORT_SYMBOL_GPL(__memcpy_mcsafe)
-
-	.section .fixup, "ax"
-	/*
-	 * Return number of bytes not copied for any failure. Note that
-	 * there is no "tail" handling since the source buffer is 8-byte
-	 * aligned and poison is cacheline aligned.
-	 */
-.E_read_words:
-	shll	$3, %ecx
-.E_leading_bytes:
-	addl	%edx, %ecx
-.E_trailing_bytes:
-	mov	%ecx, %eax
-	jmp	.L_done
-
-	/*
-	 * For write fault handling, given the destination is unaligned,
-	 * we handle faults on multi-byte writes with a byte-by-byte
-	 * copy up to the write-protected page.
-	 */
-.E_write_words:
-	shll	$3, %ecx
-	addl	%edx, %ecx
-	movl	%ecx, %edx
-	jmp mcsafe_handle_tail
-
-	.previous
-
-	_ASM_EXTABLE_FAULT(.L_read_leading_bytes, .E_leading_bytes)
-	_ASM_EXTABLE_FAULT(.L_read_words, .E_read_words)
-	_ASM_EXTABLE_FAULT(.L_read_trailing_bytes, .E_trailing_bytes)
-	_ASM_EXTABLE(.L_write_leading_bytes, .E_leading_bytes)
-	_ASM_EXTABLE(.L_write_words, .E_write_words)
-	_ASM_EXTABLE(.L_write_trailing_bytes, .E_trailing_bytes)
-#endif
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -548,8 +548,8 @@ static const char *uaccess_safe_builtin[
 	"__ubsan_handle_shift_out_of_bounds",
 	/* misc */
 	"csum_partial_copy_generic",
-	"__memcpy_mcsafe",
-	"mcsafe_handle_tail",
+	"copy_mc_fragile",
+	"copy_mc_fragile_handle_tail",
 	"ftrace_likely_update", /* CONFIG_TRACE_BRANCH_PROFILING */
 	NULL
 };
--- a/tools/perf/bench/Build
+++ b/tools/perf/bench/Build
@@ -11,7 +11,6 @@ perf-y += epoll-ctl.o
 perf-y += synthesize.o
 perf-y += kallsyms-parse.o
 
-perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-lib.o
 perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-asm.o
 perf-$(CONFIG_X86_64) += mem-memset-x86-64-asm.o
 
--- a/tools/perf/bench/mem-memcpy-x86-64-lib.c
+++ /dev/null
@@ -1,24 +0,0 @@
-/*
- * From code in arch/x86/lib/usercopy_64.c, copied to keep tools/ copy
- * of the kernel's arch/x86/lib/memcpy_64.s used in 'perf bench mem memcpy'
- * happy.
- */
-#include <linux/types.h>
-
-unsigned long __memcpy_mcsafe(void *dst, const void *src, size_t cnt);
-unsigned long mcsafe_handle_tail(char *to, char *from, unsigned len);
-
-unsigned long mcsafe_handle_tail(char *to, char *from, unsigned len)
-{
-	for (; len; --len, to++, from++) {
-		/*
-		 * Call the assembly routine back directly since
-		 * memcpy_mcsafe() may silently fallback to memcpy.
-		 */
-		unsigned long rem = __memcpy_mcsafe(to, from, 1);
-
-		if (rem)
-			break;
-	}
-	return len;
-}
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -23,7 +23,8 @@
 #include "nfit_test.h"
 #include "../watermark.h"
 
-#include <asm/mcsafe_test.h>
+#include <asm/copy_mc_test.h>
+#include <asm/mce.h>
 
 /*
  * Generate an NFIT table to describe the following topology:
@@ -3052,7 +3053,7 @@ static struct platform_driver nfit_test_
 	.id_table = nfit_test_id,
 };
 
-static char mcsafe_buf[PAGE_SIZE] __attribute__((__aligned__(PAGE_SIZE)));
+static char copy_mc_buf[PAGE_SIZE] __attribute__((__aligned__(PAGE_SIZE)));
 
 enum INJECT {
 	INJECT_NONE,
@@ -3060,7 +3061,7 @@ enum INJECT {
 	INJECT_DST,
 };
 
-static void mcsafe_test_init(char *dst, char *src, size_t size)
+static void copy_mc_test_init(char *dst, char *src, size_t size)
 {
 	size_t i;
 
@@ -3069,7 +3070,7 @@ static void mcsafe_test_init(char *dst,
 		src[i] = (char) i;
 }
 
-static bool mcsafe_test_validate(unsigned char *dst, unsigned char *src,
+static bool copy_mc_test_validate(unsigned char *dst, unsigned char *src,
 		size_t size, unsigned long rem)
 {
 	size_t i;
@@ -3090,12 +3091,12 @@ static bool mcsafe_test_validate(unsigne
 	return true;
 }
 
-void mcsafe_test(void)
+void copy_mc_test(void)
 {
 	char *inject_desc[] = { "none", "source", "destination" };
 	enum INJECT inj;
 
-	if (IS_ENABLED(CONFIG_MCSAFE_TEST)) {
+	if (IS_ENABLED(CONFIG_COPY_MC_TEST)) {
 		pr_info("%s: run...\n", __func__);
 	} else {
 		pr_info("%s: disabled, skip.\n", __func__);
@@ -3113,31 +3114,31 @@ void mcsafe_test(void)
 
 			switch (inj) {
 			case INJECT_NONE:
-				mcsafe_inject_src(NULL);
-				mcsafe_inject_dst(NULL);
-				dst = &mcsafe_buf[2048];
-				src = &mcsafe_buf[1024 - i];
+				copy_mc_inject_src(NULL);
+				copy_mc_inject_dst(NULL);
+				dst = &copy_mc_buf[2048];
+				src = &copy_mc_buf[1024 - i];
 				expect = 0;
 				break;
 			case INJECT_SRC:
-				mcsafe_inject_src(&mcsafe_buf[1024]);
-				mcsafe_inject_dst(NULL);
-				dst = &mcsafe_buf[2048];
-				src = &mcsafe_buf[1024 - i];
+				copy_mc_inject_src(&copy_mc_buf[1024]);
+				copy_mc_inject_dst(NULL);
+				dst = &copy_mc_buf[2048];
+				src = &copy_mc_buf[1024 - i];
 				expect = 512 - i;
 				break;
 			case INJECT_DST:
-				mcsafe_inject_src(NULL);
-				mcsafe_inject_dst(&mcsafe_buf[2048]);
-				dst = &mcsafe_buf[2048 - i];
-				src = &mcsafe_buf[1024];
+				copy_mc_inject_src(NULL);
+				copy_mc_inject_dst(&copy_mc_buf[2048]);
+				dst = &copy_mc_buf[2048 - i];
+				src = &copy_mc_buf[1024];
 				expect = 512 - i;
 				break;
 			}
 
-			mcsafe_test_init(dst, src, 512);
-			rem = __memcpy_mcsafe(dst, src, 512);
-			valid = mcsafe_test_validate(dst, src, 512, expect);
+			copy_mc_test_init(dst, src, 512);
+			rem = copy_mc_fragile(dst, src, 512);
+			valid = copy_mc_test_validate(dst, src, 512, expect);
 			if (rem == expect && valid)
 				continue;
 			pr_info("%s: copy(%#lx, %#lx, %d) off: %d rem: %ld %s expect: %ld\n",
@@ -3149,8 +3150,8 @@ void mcsafe_test(void)
 		}
 	}
 
-	mcsafe_inject_src(NULL);
-	mcsafe_inject_dst(NULL);
+	copy_mc_inject_src(NULL);
+	copy_mc_inject_dst(NULL);
 }
 
 static __init int nfit_test_init(void)
@@ -3161,7 +3162,7 @@ static __init int nfit_test_init(void)
 	libnvdimm_test();
 	acpi_nfit_test();
 	device_dax_test();
-	mcsafe_test();
+	copy_mc_test();
 	dax_pmem_test();
 	dax_pmem_core_test();
 #ifdef CONFIG_DEV_DAX_PMEM_COMPAT
--- a/tools/testing/selftests/powerpc/copyloops/.gitignore
+++ b/tools/testing/selftests/powerpc/copyloops/.gitignore
@@ -12,4 +12,4 @@ memcpy_p7_t1
 copyuser_64_exc_t0
 copyuser_64_exc_t1
 copyuser_64_exc_t2
-memcpy_mcsafe_64
+copy_mc_64
--- a/tools/testing/selftests/powerpc/copyloops/Makefile
+++ b/tools/testing/selftests/powerpc/copyloops/Makefile
@@ -12,7 +12,7 @@ ASFLAGS = $(CFLAGS) -Wa,-mpower4
 TEST_GEN_PROGS := copyuser_64_t0 copyuser_64_t1 copyuser_64_t2 \
 		copyuser_p7_t0 copyuser_p7_t1 \
 		memcpy_64_t0 memcpy_64_t1 memcpy_64_t2 \
-		memcpy_p7_t0 memcpy_p7_t1 memcpy_mcsafe_64 \
+		memcpy_p7_t0 memcpy_p7_t1 copy_mc_64 \
 		copyuser_64_exc_t0 copyuser_64_exc_t1 copyuser_64_exc_t2
 
 EXTRA_SOURCES := validate.c ../harness.c stubs.S
@@ -45,9 +45,9 @@ $(OUTPUT)/memcpy_p7_t%:	memcpy_power7.S
 		-D SELFTEST_CASE=$(subst memcpy_p7_t,,$(notdir $@)) \
 		-o $@ $^
 
-$(OUTPUT)/memcpy_mcsafe_64: memcpy_mcsafe_64.S $(EXTRA_SOURCES)
+$(OUTPUT)/copy_mc_64: copy_mc_64.S $(EXTRA_SOURCES)
 	$(CC) $(CPPFLAGS) $(CFLAGS) \
-		-D COPY_LOOP=test_memcpy_mcsafe \
+		-D COPY_LOOP=test_copy_mc_generic \
 		-o $@ $^
 
 $(OUTPUT)/copyuser_64_exc_t%: copyuser_64.S exc_validate.c ../harness.c \
--- /dev/null
+++ b/tools/testing/selftests/powerpc/copyloops/copy_mc_64.S
@@ -0,0 +1,242 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) IBM Corporation, 2011
+ * Derived from copyuser_power7.s by Anton Blanchard <anton@au.ibm.com>
+ * Author - Balbir Singh <bsingharora@gmail.com>
+ */
+#include <asm/ppc_asm.h>
+#include <asm/errno.h>
+#include <asm/export.h>
+
+	.macro err1
+100:
+	EX_TABLE(100b,.Ldo_err1)
+	.endm
+
+	.macro err2
+200:
+	EX_TABLE(200b,.Ldo_err2)
+	.endm
+
+	.macro err3
+300:	EX_TABLE(300b,.Ldone)
+	.endm
+
+.Ldo_err2:
+	ld	r22,STK_REG(R22)(r1)
+	ld	r21,STK_REG(R21)(r1)
+	ld	r20,STK_REG(R20)(r1)
+	ld	r19,STK_REG(R19)(r1)
+	ld	r18,STK_REG(R18)(r1)
+	ld	r17,STK_REG(R17)(r1)
+	ld	r16,STK_REG(R16)(r1)
+	ld	r15,STK_REG(R15)(r1)
+	ld	r14,STK_REG(R14)(r1)
+	addi	r1,r1,STACKFRAMESIZE
+.Ldo_err1:
+	/* Do a byte by byte copy to get the exact remaining size */
+	mtctr	r7
+46:
+err3;	lbz	r0,0(r4)
+	addi	r4,r4,1
+err3;	stb	r0,0(r3)
+	addi	r3,r3,1
+	bdnz	46b
+	li	r3,0
+	blr
+
+.Ldone:
+	mfctr	r3
+	blr
+
+
+_GLOBAL(copy_mc_generic)
+	mr	r7,r5
+	cmpldi	r5,16
+	blt	.Lshort_copy
+
+.Lcopy:
+	/* Get the source 8B aligned */
+	neg	r6,r4
+	mtocrf	0x01,r6
+	clrldi	r6,r6,(64-3)
+
+	bf	cr7*4+3,1f
+err1;	lbz	r0,0(r4)
+	addi	r4,r4,1
+err1;	stb	r0,0(r3)
+	addi	r3,r3,1
+	subi	r7,r7,1
+
+1:	bf	cr7*4+2,2f
+err1;	lhz	r0,0(r4)
+	addi	r4,r4,2
+err1;	sth	r0,0(r3)
+	addi	r3,r3,2
+	subi	r7,r7,2
+
+2:	bf	cr7*4+1,3f
+err1;	lwz	r0,0(r4)
+	addi	r4,r4,4
+err1;	stw	r0,0(r3)
+	addi	r3,r3,4
+	subi	r7,r7,4
+
+3:	sub	r5,r5,r6
+	cmpldi	r5,128
+
+	mflr	r0
+	stdu	r1,-STACKFRAMESIZE(r1)
+	std	r14,STK_REG(R14)(r1)
+	std	r15,STK_REG(R15)(r1)
+	std	r16,STK_REG(R16)(r1)
+	std	r17,STK_REG(R17)(r1)
+	std	r18,STK_REG(R18)(r1)
+	std	r19,STK_REG(R19)(r1)
+	std	r20,STK_REG(R20)(r1)
+	std	r21,STK_REG(R21)(r1)
+	std	r22,STK_REG(R22)(r1)
+	std	r0,STACKFRAMESIZE+16(r1)
+
+	blt	5f
+	srdi	r6,r5,7
+	mtctr	r6
+
+	/* Now do cacheline (128B) sized loads and stores. */
+	.align	5
+4:
+err2;	ld	r0,0(r4)
+err2;	ld	r6,8(r4)
+err2;	ld	r8,16(r4)
+err2;	ld	r9,24(r4)
+err2;	ld	r10,32(r4)
+err2;	ld	r11,40(r4)
+err2;	ld	r12,48(r4)
+err2;	ld	r14,56(r4)
+err2;	ld	r15,64(r4)
+err2;	ld	r16,72(r4)
+err2;	ld	r17,80(r4)
+err2;	ld	r18,88(r4)
+err2;	ld	r19,96(r4)
+err2;	ld	r20,104(r4)
+err2;	ld	r21,112(r4)
+err2;	ld	r22,120(r4)
+	addi	r4,r4,128
+err2;	std	r0,0(r3)
+err2;	std	r6,8(r3)
+err2;	std	r8,16(r3)
+err2;	std	r9,24(r3)
+err2;	std	r10,32(r3)
+err2;	std	r11,40(r3)
+err2;	std	r12,48(r3)
+err2;	std	r14,56(r3)
+err2;	std	r15,64(r3)
+err2;	std	r16,72(r3)
+err2;	std	r17,80(r3)
+err2;	std	r18,88(r3)
+err2;	std	r19,96(r3)
+err2;	std	r20,104(r3)
+err2;	std	r21,112(r3)
+err2;	std	r22,120(r3)
+	addi	r3,r3,128
+	subi	r7,r7,128
+	bdnz	4b
+
+	clrldi	r5,r5,(64-7)
+
+	/* Up to 127B to go */
+5:	srdi	r6,r5,4
+	mtocrf	0x01,r6
+
+6:	bf	cr7*4+1,7f
+err2;	ld	r0,0(r4)
+err2;	ld	r6,8(r4)
+err2;	ld	r8,16(r4)
+err2;	ld	r9,24(r4)
+err2;	ld	r10,32(r4)
+err2;	ld	r11,40(r4)
+err2;	ld	r12,48(r4)
+err2;	ld	r14,56(r4)
+	addi	r4,r4,64
+err2;	std	r0,0(r3)
+err2;	std	r6,8(r3)
+err2;	std	r8,16(r3)
+err2;	std	r9,24(r3)
+err2;	std	r10,32(r3)
+err2;	std	r11,40(r3)
+err2;	std	r12,48(r3)
+err2;	std	r14,56(r3)
+	addi	r3,r3,64
+	subi	r7,r7,64
+
+7:	ld	r14,STK_REG(R14)(r1)
+	ld	r15,STK_REG(R15)(r1)
+	ld	r16,STK_REG(R16)(r1)
+	ld	r17,STK_REG(R17)(r1)
+	ld	r18,STK_REG(R18)(r1)
+	ld	r19,STK_REG(R19)(r1)
+	ld	r20,STK_REG(R20)(r1)
+	ld	r21,STK_REG(R21)(r1)
+	ld	r22,STK_REG(R22)(r1)
+	addi	r1,r1,STACKFRAMESIZE
+
+	/* Up to 63B to go */
+	bf	cr7*4+2,8f
+err1;	ld	r0,0(r4)
+err1;	ld	r6,8(r4)
+err1;	ld	r8,16(r4)
+err1;	ld	r9,24(r4)
+	addi	r4,r4,32
+err1;	std	r0,0(r3)
+err1;	std	r6,8(r3)
+err1;	std	r8,16(r3)
+err1;	std	r9,24(r3)
+	addi	r3,r3,32
+	subi	r7,r7,32
+
+	/* Up to 31B to go */
+8:	bf	cr7*4+3,9f
+err1;	ld	r0,0(r4)
+err1;	ld	r6,8(r4)
+	addi	r4,r4,16
+err1;	std	r0,0(r3)
+err1;	std	r6,8(r3)
+	addi	r3,r3,16
+	subi	r7,r7,16
+
+9:	clrldi	r5,r5,(64-4)
+
+	/* Up to 15B to go */
+.Lshort_copy:
+	mtocrf	0x01,r5
+	bf	cr7*4+0,12f
+err1;	lwz	r0,0(r4)	/* Less chance of a reject with word ops */
+err1;	lwz	r6,4(r4)
+	addi	r4,r4,8
+err1;	stw	r0,0(r3)
+err1;	stw	r6,4(r3)
+	addi	r3,r3,8
+	subi	r7,r7,8
+
+12:	bf	cr7*4+1,13f
+err1;	lwz	r0,0(r4)
+	addi	r4,r4,4
+err1;	stw	r0,0(r3)
+	addi	r3,r3,4
+	subi	r7,r7,4
+
+13:	bf	cr7*4+2,14f
+err1;	lhz	r0,0(r4)
+	addi	r4,r4,2
+err1;	sth	r0,0(r3)
+	addi	r3,r3,2
+	subi	r7,r7,2
+
+14:	bf	cr7*4+3,15f
+err1;	lbz	r0,0(r4)
+err1;	stb	r0,0(r3)
+
+15:	li	r3,0
+	blr
+
+EXPORT_SYMBOL_GPL(copy_mc_generic);



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 25/70] x86/copy_mc: Introduce copy_mc_enhanced_fast_string()
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (23 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 24/70] x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}() Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 26/70] efivarfs: Replace invalid slashes with exclamation marks in dentries Greg Kroah-Hartman
                   ` (47 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Erwin Tsaur, 0day robot,
	Dan Williams, Borislav Petkov, Tony Luck

From: Dan Williams <dan.j.williams@intel.com>

commit 5da8e4a658109e3b7e1f45ae672b7c06ac3e7158 upstream.

The motivations to go rework memcpy_mcsafe() are that the benefit of
doing slow and careful copies is obviated on newer CPUs, and that the
current opt-in list of CPUs to instrument recovery is broken relative to
those CPUs.  There is no need to keep an opt-in list up to date on an
ongoing basis if pmem/dax operations are instrumented for recovery by
default. With recovery enabled by default the old "mcsafe_key" opt-in to
careful copying can be made a "fragile" opt-out. Where the "fragile"
list takes steps to not consume poison across cachelines.

The discussion with Linus made clear that the current "_mcsafe" suffix
was imprecise to a fault. The operations that are needed by pmem/dax are
to copy from a source address that might throw #MC to a destination that
may write-fault, if it is a user page.

So copy_to_user_mcsafe() becomes copy_mc_to_user() to indicate
the separate precautions taken on source and destination.
copy_mc_to_kernel() is introduced as a non-SMAP version that does not
expect write-faults on the destination, but is still prepared to abort
with an error code upon taking #MC.

The original copy_mc_fragile() implementation had negative performance
implications since it did not use the fast-string instruction sequence
to perform copies. For this reason copy_mc_to_kernel() fell back to
plain memcpy() to preserve performance on platforms that did not indicate
the capability to recover from machine check exceptions. However, that
capability detection was not architectural and now that some platforms
can recover from fast-string consumption of memory errors the memcpy()
fallback now causes these more capable platforms to fail.

Introduce copy_mc_enhanced_fast_string() as the fast default
implementation of copy_mc_to_kernel() and finalize the transition of
copy_mc_fragile() to be a platform quirk to indicate 'copy-carefully'.
With this in place, copy_mc_to_kernel() is fast and recovery-ready by
default regardless of hardware capability.

Thanks to Vivek for identifying that copy_user_generic() is not suitable
as the copy_mc_to_user() backend since the #MC handler explicitly checks
ex_has_fault_handler(). Thanks to the 0day robot for catching a
performance bug in the x86/copy_mc_to_user implementation.

 [ bp: Add the "why" for this change from the 0/2th message, massage. ]

Fixes: 92b0729c34ca ("x86/mm, x86/mce: Add memcpy_mcsafe()")
Reported-by: Erwin Tsaur <erwin.tsaur@intel.com>
Reported-by: 0day robot <lkp@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Erwin Tsaur <erwin.tsaur@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/160195562556.2163339.18063423034951948973.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/lib/copy_mc.c    |   32 +++++++++++++++++++++++---------
 arch/x86/lib/copy_mc_64.S |   36 ++++++++++++++++++++++++++++++++++++
 tools/objtool/check.c     |    1 +
 3 files changed, 60 insertions(+), 9 deletions(-)

--- a/arch/x86/lib/copy_mc.c
+++ b/arch/x86/lib/copy_mc.c
@@ -45,6 +45,8 @@ void enable_copy_mc_fragile(void)
 #define copy_mc_fragile_enabled (0)
 #endif
 
+unsigned long copy_mc_enhanced_fast_string(void *dst, const void *src, unsigned len);
+
 /**
  * copy_mc_to_kernel - memory copy that handles source exceptions
  *
@@ -52,9 +54,11 @@ void enable_copy_mc_fragile(void)
  * @src:	source address
  * @len:	number of bytes to copy
  *
- * Call into the 'fragile' version on systems that have trouble
- * actually do machine check recovery. Everyone else can just
- * use memcpy().
+ * Call into the 'fragile' version on systems that benefit from avoiding
+ * corner case poison consumption scenarios, For example, accessing
+ * poison across 2 cachelines with a single instruction. Almost all
+ * other uses case can use copy_mc_enhanced_fast_string() for a fast
+ * recoverable copy, or fallback to plain memcpy.
  *
  * Return 0 for success, or number of bytes not copied if there was an
  * exception.
@@ -63,6 +67,8 @@ unsigned long __must_check copy_mc_to_ke
 {
 	if (copy_mc_fragile_enabled)
 		return copy_mc_fragile(dst, src, len);
+	if (static_cpu_has(X86_FEATURE_ERMS))
+		return copy_mc_enhanced_fast_string(dst, src, len);
 	memcpy(dst, src, len);
 	return 0;
 }
@@ -72,11 +78,19 @@ unsigned long __must_check copy_mc_to_us
 {
 	unsigned long ret;
 
-	if (!copy_mc_fragile_enabled)
-		return copy_user_generic(dst, src, len);
+	if (copy_mc_fragile_enabled) {
+		__uaccess_begin();
+		ret = copy_mc_fragile(dst, src, len);
+		__uaccess_end();
+		return ret;
+	}
+
+	if (static_cpu_has(X86_FEATURE_ERMS)) {
+		__uaccess_begin();
+		ret = copy_mc_enhanced_fast_string(dst, src, len);
+		__uaccess_end();
+		return ret;
+	}
 
-	__uaccess_begin();
-	ret = copy_mc_fragile(dst, src, len);
-	__uaccess_end();
-	return ret;
+	return copy_user_generic(dst, src, len);
 }
--- a/arch/x86/lib/copy_mc_64.S
+++ b/arch/x86/lib/copy_mc_64.S
@@ -124,4 +124,40 @@ EXPORT_SYMBOL_GPL(copy_mc_fragile)
 	_ASM_EXTABLE(.L_write_words, .E_write_words)
 	_ASM_EXTABLE(.L_write_trailing_bytes, .E_trailing_bytes)
 #endif /* CONFIG_X86_MCE */
+
+/*
+ * copy_mc_enhanced_fast_string - memory copy with exception handling
+ *
+ * Fast string copy + fault / exception handling. If the CPU does
+ * support machine check exception recovery, but does not support
+ * recovering from fast-string exceptions then this CPU needs to be
+ * added to the copy_mc_fragile_key set of quirks. Otherwise, absent any
+ * machine check recovery support this version should be no slower than
+ * standard memcpy.
+ */
+SYM_FUNC_START(copy_mc_enhanced_fast_string)
+	movq %rdi, %rax
+	movq %rdx, %rcx
+.L_copy:
+	rep movsb
+	/* Copy successful. Return zero */
+	xorl %eax, %eax
+	ret
+SYM_FUNC_END(copy_mc_enhanced_fast_string)
+
+	.section .fixup, "ax"
+.E_copy:
+	/*
+	 * On fault %rcx is updated such that the copy instruction could
+	 * optionally be restarted at the fault position, i.e. it
+	 * contains 'bytes remaining'. A non-zero return indicates error
+	 * to copy_mc_generic() users, or indicate short transfers to
+	 * user-copy routines.
+	 */
+	movq %rcx, %rax
+	ret
+
+	.previous
+
+	_ASM_EXTABLE_FAULT(.L_copy, .E_copy)
 #endif /* !CONFIG_UML */
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -550,6 +550,7 @@ static const char *uaccess_safe_builtin[
 	"csum_partial_copy_generic",
 	"copy_mc_fragile",
 	"copy_mc_fragile_handle_tail",
+	"copy_mc_enhanced_fast_string",
 	"ftrace_likely_update", /* CONFIG_TRACE_BRANCH_PROFILING */
 	NULL
 };



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 26/70] efivarfs: Replace invalid slashes with exclamation marks in dentries.
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (24 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 25/70] x86/copy_mc: Introduce copy_mc_enhanced_fast_string() Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:35 ` [PATCH 5.8 27/70] bnxt_en: Check abort error state in bnxt_open_nic() Greg Kroah-Hartman
                   ` (46 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Michael Schaller, Ard Biesheuvel,
	dann frazier

From: Michael Schaller <misch@google.com>

commit 336af6a4686d885a067ecea8c3c3dd129ba4fc75 upstream.

Without this patch efivarfs_alloc_dentry creates dentries with slashes in
their name if the respective EFI variable has slashes in its name. This in
turn causes EIO on getdents64, which prevents a complete directory listing
of /sys/firmware/efi/efivars/.

This patch replaces the invalid shlashes with exclamation marks like
kobject_set_name_vargs does for /sys/firmware/efi/vars/ to have consistently
named dentries under /sys/firmware/efi/vars/ and /sys/firmware/efi/efivars/.

Signed-off-by: Michael Schaller <misch@google.com>
Link: https://lore.kernel.org/r/20200925074502.150448-1-misch@google.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/efivarfs/super.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/fs/efivarfs/super.c
+++ b/fs/efivarfs/super.c
@@ -141,6 +141,9 @@ static int efivarfs_callback(efi_char16_
 
 	name[len + EFI_VARIABLE_GUID_LEN+1] = '\0';
 
+	/* replace invalid slashes like kobject_set_name_vargs does for /sys/firmware/efi/vars. */
+	strreplace(name, '/', '!');
+
 	inode = efivarfs_get_inode(sb, d_inode(root), S_IFREG | 0644, 0,
 				   is_removable);
 	if (!inode)



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 27/70] bnxt_en: Check abort error state in bnxt_open_nic().
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (25 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 26/70] efivarfs: Replace invalid slashes with exclamation marks in dentries Greg Kroah-Hartman
@ 2020-10-31 11:35 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 28/70] bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one() Greg Kroah-Hartman
                   ` (45 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Vasundhara Volam, Pavan Chebbi,
	Michael Chan, Jakub Kicinski

From: Michael Chan <michael.chan@broadcom.com>

[ Upstream commit a1301f08c5acf992d9c1fafddc84c3a822844b04 ]

bnxt_open_nic() is called during configuration changes that require
the NIC to be closed and then opened.  This call is protected by
rtnl_lock.  Firmware reset can be happening at the same time.  Only
critical portions of the entire firmware reset sequence are protected
by the rtnl_lock.  It is possible that bnxt_open_nic() can be called
when the firmware reset sequence is aborting.  In that case,
bnxt_open_nic() needs to check if the ABORT_ERR flag is set and
abort if it is.  The configuration change that resulted in the
bnxt_open_nic() call will fail but the NIC will be brought to a
consistent IF_DOWN state.

Without this patch, if bnxt_open_nic() were to continue in this error
state, it may crash like this:

[ 1648.659736] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1648.659768] IP: [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en]
[ 1648.659796] PGD 101e1b3067 PUD 101e1b2067 PMD 0
[ 1648.659813] Oops: 0000 [#1] SMP
[ 1648.659825] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc dell_smbios dell_wmi_descriptor dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper vfat cryptd fat pcspkr ipmi_ssif sg k10temp i2c_piix4 wmi ipmi_si ipmi_devintf ipmi_msghandler tpm_crb acpi_power_meter sch_fq_codel ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm libahci megaraid_sas crct10dif_pclmul crct10dif_common
[ 1648.660063]  tg3 libata crc32c_intel bnxt_en(OE) drm_panel_orientation_quirks devlink ptp pps_core dm_mirror dm_region_hash dm_log dm_mod fuse
[ 1648.660105] CPU: 13 PID: 3867 Comm: ethtool Kdump: loaded Tainted: G           OE  ------------   3.10.0-1152.el7.x86_64 #1
[ 1648.660911] Hardware name: Dell Inc. PowerEdge R7515/0R4CNN, BIOS 1.2.14 01/28/2020
[ 1648.661662] task: ffff94e64cbc9080 ti: ffff94f55df1c000 task.ti: ffff94f55df1c000
[ 1648.662409] RIP: 0010:[<ffffffffc01e9b3a>]  [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en]
[ 1648.663171] RSP: 0018:ffff94f55df1fba8  EFLAGS: 00010202
[ 1648.663927] RAX: 0000000000000000 RBX: ffff94e6827e0000 RCX: 0000000000000000
[ 1648.664684] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94e6827e08c0
[ 1648.665433] RBP: ffff94f55df1fc20 R08: 00000000000001ff R09: 0000000000000008
[ 1648.666184] R10: 0000000000000d53 R11: ffff94f55df1f7ce R12: ffff94e6827e08c0
[ 1648.666940] R13: ffff94e6827e08c0 R14: ffff94e6827e08c0 R15: ffffffffb9115e40
[ 1648.667695] FS:  00007f8aadba5740(0000) GS:ffff94f57eb40000(0000) knlGS:0000000000000000
[ 1648.668447] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1648.669202] CR2: 0000000000000000 CR3: 0000001022772000 CR4: 0000000000340fe0
[ 1648.669966] Call Trace:
[ 1648.670730]  [<ffffffffc01f1d5d>] ? bnxt_need_reserve_rings+0x9d/0x170 [bnxt_en]
[ 1648.671496]  [<ffffffffc01fa7ea>] __bnxt_open_nic+0x8a/0x9a0 [bnxt_en]
[ 1648.672263]  [<ffffffffc01f7479>] ? bnxt_close_nic+0x59/0x1b0 [bnxt_en]
[ 1648.673031]  [<ffffffffc01fb11b>] bnxt_open_nic+0x1b/0x50 [bnxt_en]
[ 1648.673793]  [<ffffffffc020037c>] bnxt_set_ringparam+0x6c/0xa0 [bnxt_en]
[ 1648.674550]  [<ffffffffb8a5f564>] dev_ethtool+0x1334/0x21a0
[ 1648.675306]  [<ffffffffb8a719ff>] dev_ioctl+0x1ef/0x5f0
[ 1648.676061]  [<ffffffffb8a324bd>] sock_do_ioctl+0x4d/0x60
[ 1648.676810]  [<ffffffffb8a326bb>] sock_ioctl+0x1eb/0x2d0
[ 1648.677548]  [<ffffffffb8663230>] do_vfs_ioctl+0x3a0/0x5b0
[ 1648.678282]  [<ffffffffb8b8e678>] ? __do_page_fault+0x238/0x500
[ 1648.679016]  [<ffffffffb86634e1>] SyS_ioctl+0xa1/0xc0
[ 1648.679745]  [<ffffffffb8b93f92>] system_call_fastpath+0x25/0x2a
[ 1648.680461] Code: 9e 60 01 00 00 0f 1f 40 00 45 8b 8e 48 01 00 00 31 c9 45 85 c9 0f 8e 73 01 00 00 66 0f 1f 44 00 00 49 8b 86 a8 00 00 00 48 63 d1 <48> 8b 14 d0 48 85 d2 0f 84 46 01 00 00 41 8b 86 44 01 00 00 c7
[ 1648.681986] RIP  [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en]
[ 1648.682724]  RSP <ffff94f55df1fba8>
[ 1648.683451] CR2: 0000000000000000

Fixes: ec5d31e3c15d ("bnxt_en: Handle firmware reset status during IF_UP.")
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -9247,7 +9247,10 @@ int bnxt_open_nic(struct bnxt *bp, bool
 {
 	int rc = 0;
 
-	rc = __bnxt_open_nic(bp, irq_re_init, link_re_init);
+	if (test_bit(BNXT_STATE_ABORT_ERR, &bp->state))
+		rc = -EIO;
+	if (!rc)
+		rc = __bnxt_open_nic(bp, irq_re_init, link_re_init);
 	if (rc) {
 		netdev_err(bp->dev, "nic open fail (rc: %x)\n", rc);
 		dev_close(bp->dev);



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 28/70] bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one().
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (26 preceding siblings ...)
  2020-10-31 11:35 ` [PATCH 5.8 27/70] bnxt_en: Check abort error state in bnxt_open_nic() Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 29/70] bnxt_en: Invoke cancel_delayed_work_sync() for PFs also Greg Kroah-Hartman
                   ` (44 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Vasundhara Volam, Michael Chan,
	Jakub Kicinski

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

[ Upstream commit 21d6a11e2cadfb8446265a3efff0e2aad206e15e ]

A recent patch has moved the workqueue cleanup logic before
calling unregister_netdev() in bnxt_remove_one().  This caused a
regression because the workqueue can be restarted if the device is
still open.  Workqueue cleanup must be done after unregister_netdev().
The workqueue will not restart itself after the device is closed.

Call bnxt_cancel_sp_work() after unregister_netdev() and
call bnxt_dl_fw_reporters_destroy() after that.  This fixes the
regession and the original NULL ptr dereference issue.

Fixes: b16939b59cc0 ("bnxt_en: Fix NULL ptr dereference crash in bnxt_fw_reset_task()")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -11508,15 +11508,16 @@ static void bnxt_remove_one(struct pci_d
 	if (BNXT_PF(bp))
 		bnxt_sriov_disable(bp);
 
+	if (BNXT_PF(bp))
+		devlink_port_type_clear(&bp->dl_port);
+	pci_disable_pcie_error_reporting(pdev);
+	unregister_netdev(dev);
 	clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
+	/* Flush any pending tasks */
 	bnxt_cancel_sp_work(bp);
 	bp->sp_event = 0;
 
 	bnxt_dl_fw_reporters_destroy(bp, true);
-	if (BNXT_PF(bp))
-		devlink_port_type_clear(&bp->dl_port);
-	pci_disable_pcie_error_reporting(pdev);
-	unregister_netdev(dev);
 	bnxt_dl_unregister(bp);
 	bnxt_shutdown_tc(bp);
 



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 29/70] bnxt_en: Invoke cancel_delayed_work_sync() for PFs also.
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (27 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 28/70] bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one() Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 30/70] bnxt_en: Re-write PCI BARs after PCI fatal error Greg Kroah-Hartman
                   ` (43 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Pavan Chebbi, Andy Gospodarek,
	Vasundhara Volam, Michael Chan, Jakub Kicinski

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

[ Upstream commit 631ce27a3006fc0b732bfd589c6df505f62eadd9 ]

As part of the commit b148bb238c02
("bnxt_en: Fix possible crash in bnxt_fw_reset_task()."),
cancel_delayed_work_sync() is called only for VFs to fix a possible
crash by cancelling any pending delayed work items. It was assumed
by mistake that the flush_workqueue() call on the PF would flush
delayed work items as well.

As flush_workqueue() does not cancel the delayed workqueue, extend
the fix for PFs. This fix will avoid the system crash, if there are
any pending delayed work items in fw_reset_task() during driver's
.remove() call.

Unify the workqueue cleanup logic for both PF and VF by calling
cancel_work_sync() and cancel_delayed_work_sync() directly in
bnxt_remove_one().

Fixes: b148bb238c02 ("bnxt_en: Fix possible crash in bnxt_fw_reset_task().")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |   13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1158,16 +1158,6 @@ static void bnxt_queue_sp_work(struct bn
 		schedule_work(&bp->sp_task);
 }
 
-static void bnxt_cancel_sp_work(struct bnxt *bp)
-{
-	if (BNXT_PF(bp)) {
-		flush_workqueue(bnxt_pf_wq);
-	} else {
-		cancel_work_sync(&bp->sp_task);
-		cancel_delayed_work_sync(&bp->fw_reset_task);
-	}
-}
-
 static void bnxt_sched_reset(struct bnxt *bp, struct bnxt_rx_ring_info *rxr)
 {
 	if (!rxr->bnapi->in_reset) {
@@ -11514,7 +11504,8 @@ static void bnxt_remove_one(struct pci_d
 	unregister_netdev(dev);
 	clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
 	/* Flush any pending tasks */
-	bnxt_cancel_sp_work(bp);
+	cancel_work_sync(&bp->sp_task);
+	cancel_delayed_work_sync(&bp->fw_reset_task);
 	bp->sp_event = 0;
 
 	bnxt_dl_fw_reporters_destroy(bp, true);



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 30/70] bnxt_en: Re-write PCI BARs after PCI fatal error.
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (28 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 29/70] bnxt_en: Invoke cancel_delayed_work_sync() for PFs also Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 31/70] bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally Greg Kroah-Hartman
                   ` (42 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Vasundhara Volam, Michael Chan,
	Jakub Kicinski

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

[ Upstream commit f75d9a0aa96721d20011cd5f8c7a24eb32728589 ]

When a PCIe fatal error occurs, the internal latched BAR addresses
in the chip get reset even though the BAR register values in config
space are retained.

pci_restore_state() will not rewrite the BAR addresses if the
BAR address values are valid, causing the chip's internal BAR addresses
to stay invalid.  So we need to zero the BAR registers during PCIe fatal
error to force pci_restore_state() to restore the BAR addresses.  These
write cycles to the BAR registers will cause the proper BAR addresses to
latch internally.

Fixes: 6316ea6db93d ("bnxt_en: Enable AER support.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |   19 ++++++++++++++++++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |    1 +
 2 files changed, 19 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -12233,6 +12233,9 @@ static pci_ers_result_t bnxt_io_error_de
 		return PCI_ERS_RESULT_DISCONNECT;
 	}
 
+	if (state == pci_channel_io_frozen)
+		set_bit(BNXT_STATE_PCI_CHANNEL_IO_FROZEN, &bp->state);
+
 	if (netif_running(netdev))
 		bnxt_close(netdev);
 
@@ -12259,7 +12262,7 @@ static pci_ers_result_t bnxt_io_slot_res
 {
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct bnxt *bp = netdev_priv(netdev);
-	int err = 0;
+	int err = 0, off;
 	pci_ers_result_t result = PCI_ERS_RESULT_DISCONNECT;
 
 	netdev_info(bp->dev, "PCI Slot Reset\n");
@@ -12271,6 +12274,20 @@ static pci_ers_result_t bnxt_io_slot_res
 			"Cannot re-enable PCI device after reset.\n");
 	} else {
 		pci_set_master(pdev);
+		/* Upon fatal error, our device internal logic that latches to
+		 * BAR value is getting reset and will restore only upon
+		 * rewritting the BARs.
+		 *
+		 * As pci_restore_state() does not re-write the BARs if the
+		 * value is same as saved value earlier, driver needs to
+		 * write the BARs to 0 to force restore, in case of fatal error.
+		 */
+		if (test_and_clear_bit(BNXT_STATE_PCI_CHANNEL_IO_FROZEN,
+				       &bp->state)) {
+			for (off = PCI_BASE_ADDRESS_0;
+			     off <= PCI_BASE_ADDRESS_5; off += 4)
+				pci_write_config_dword(bp->pdev, off, 0);
+		}
 		pci_restore_state(pdev);
 		pci_save_state(pdev);
 
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1672,6 +1672,7 @@ struct bnxt {
 #define BNXT_STATE_ABORT_ERR	5
 #define BNXT_STATE_FW_FATAL_COND	6
 #define BNXT_STATE_DRV_REGISTERED	7
+#define BNXT_STATE_PCI_CHANNEL_IO_FROZEN	8
 
 #define BNXT_NO_FW_ACCESS(bp)					\
 	(test_bit(BNXT_STATE_FW_FATAL_COND, &(bp)->state) ||	\



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 31/70] bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally.
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (29 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 30/70] bnxt_en: Re-write PCI BARs after PCI fatal error Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 32/70] chelsio/chtls: fix deadlock issue Greg Kroah-Hartman
                   ` (41 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Edwin Peer, Vasundhara Volam,
	Michael Chan, Jakub Kicinski

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

[ Upstream commit 825741b071722f1c8ad692cead562c4b5f5eaa93 ]

In the AER or firmware reset flow, if we are in fatal error state or
if pci_channel_offline() is true, we don't send any commands to the
firmware because the commands will likely not reach the firmware and
most commands don't matter much because the firmware is likely to be
reset imminently.

However, the HWRM_FUNC_RESET command is different and we should always
attempt to send it.  In the AER flow for example, the .slot_reset()
call will trigger this fw command and we need to try to send it to
effect the proper reset.

Fixes: b340dc680ed4 ("bnxt_en: Avoid sending firmware messages when AER error is detected.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4188,7 +4188,8 @@ static int bnxt_hwrm_do_send_msg(struct
 	u32 bar_offset = BNXT_GRCPF_REG_CHIMP_COMM;
 	u16 dst = BNXT_HWRM_CHNL_CHIMP;
 
-	if (BNXT_NO_FW_ACCESS(bp))
+	if (BNXT_NO_FW_ACCESS(bp) &&
+	    le16_to_cpu(req->req_type) != HWRM_FUNC_RESET)
 		return -EBUSY;
 
 	if (msg_len > BNXT_HWRM_MAX_REQ_LEN) {



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 32/70] chelsio/chtls: fix deadlock issue
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (30 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 31/70] bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 33/70] chelsio/chtls: fix memory leaks in CPL handlers Greg Kroah-Hartman
                   ` (40 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Vinay Kumar Yadav, Jakub Kicinski

From: Vinay Kumar Yadav <vinay.yadav@chelsio.com>

[ Upstream commit 28e9dcd9172028263c8225c15c4e329e08475e89 ]

In chtls_pass_establish() we hold child socket lock using bh_lock_sock
and we are again trying bh_lock_sock in add_to_reap_list, causing deadlock.
Remove bh_lock_sock in add_to_reap_list() as lock is already held.

Fixes: cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Link: https://lore.kernel.org/r/20201025193538.31112-1-vinay.yadav@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/crypto/chelsio/chtls/chtls_cm.c |    2 --
 1 file changed, 2 deletions(-)

--- a/drivers/crypto/chelsio/chtls/chtls_cm.c
+++ b/drivers/crypto/chelsio/chtls/chtls_cm.c
@@ -1513,7 +1513,6 @@ static void add_to_reap_list(struct sock
 	struct chtls_sock *csk = sk->sk_user_data;
 
 	local_bh_disable();
-	bh_lock_sock(sk);
 	release_tcp_port(sk); /* release the port immediately */
 
 	spin_lock(&reap_list_lock);
@@ -1522,7 +1521,6 @@ static void add_to_reap_list(struct sock
 	if (!csk->passive_reap_next)
 		schedule_work(&reap_task);
 	spin_unlock(&reap_list_lock);
-	bh_unlock_sock(sk);
 	local_bh_enable();
 }
 



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 33/70] chelsio/chtls: fix memory leaks in CPL handlers
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (31 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 32/70] chelsio/chtls: fix deadlock issue Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 34/70] chelsio/chtls: fix tls record info to user Greg Kroah-Hartman
                   ` (39 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Vinay Kumar Yadav, Jakub Kicinski

From: Vinay Kumar Yadav <vinay.yadav@chelsio.com>

[ Upstream commit 6daa1da4e262b0cd52ef0acc1989ff22b5540264 ]

CPL handler functions chtls_pass_open_rpl() and
chtls_close_listsrv_rpl() should return CPL_RET_BUF_DONE
so that caller function will do skb free to avoid leak.

Fixes: cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Link: https://lore.kernel.org/r/20201025194228.31271-1-vinay.yadav@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/crypto/chelsio/chtls/chtls_cm.c |   27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

--- a/drivers/crypto/chelsio/chtls/chtls_cm.c
+++ b/drivers/crypto/chelsio/chtls/chtls_cm.c
@@ -772,14 +772,13 @@ static int chtls_pass_open_rpl(struct ch
 	if (rpl->status != CPL_ERR_NONE) {
 		pr_info("Unexpected PASS_OPEN_RPL status %u for STID %u\n",
 			rpl->status, stid);
-		return CPL_RET_BUF_DONE;
+	} else {
+		cxgb4_free_stid(cdev->tids, stid, listen_ctx->lsk->sk_family);
+		sock_put(listen_ctx->lsk);
+		kfree(listen_ctx);
+		module_put(THIS_MODULE);
 	}
-	cxgb4_free_stid(cdev->tids, stid, listen_ctx->lsk->sk_family);
-	sock_put(listen_ctx->lsk);
-	kfree(listen_ctx);
-	module_put(THIS_MODULE);
-
-	return 0;
+	return CPL_RET_BUF_DONE;
 }
 
 static int chtls_close_listsrv_rpl(struct chtls_dev *cdev, struct sk_buff *skb)
@@ -796,15 +795,13 @@ static int chtls_close_listsrv_rpl(struc
 	if (rpl->status != CPL_ERR_NONE) {
 		pr_info("Unexpected CLOSE_LISTSRV_RPL status %u for STID %u\n",
 			rpl->status, stid);
-		return CPL_RET_BUF_DONE;
+	} else {
+		cxgb4_free_stid(cdev->tids, stid, listen_ctx->lsk->sk_family);
+		sock_put(listen_ctx->lsk);
+		kfree(listen_ctx);
+		module_put(THIS_MODULE);
 	}
-
-	cxgb4_free_stid(cdev->tids, stid, listen_ctx->lsk->sk_family);
-	sock_put(listen_ctx->lsk);
-	kfree(listen_ctx);
-	module_put(THIS_MODULE);
-
-	return 0;
+	return CPL_RET_BUF_DONE;
 }
 
 static void chtls_purge_wr_queue(struct sock *sk)



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 34/70] chelsio/chtls: fix tls record info to user
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (32 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 33/70] chelsio/chtls: fix memory leaks in CPL handlers Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 35/70] cxgb4: set up filter action after rewrites Greg Kroah-Hartman
                   ` (38 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Vinay Kumar Yadav, Jakub Kicinski

From: Vinay Kumar Yadav <vinay.yadav@chelsio.com>

[ Upstream commit 4f3391ce8f5a69e7e6d66d0a3fc654eb6dbdc919 ]

chtls_pt_recvmsg() receives a skb with tls header and subsequent
skb with data, need to finalize the data copy whenever next skb
with tls header is available. but here current tls header is
overwritten by next available tls header, ends up corrupting
user buffer data. fixing it by finalizing current record whenever
next skb contains tls header.

v1->v2:
- Improved commit message.

Fixes: 17a7d24aa89d ("crypto: chtls - generic handling of data and hdr")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Link: https://lore.kernel.org/r/20201022190556.21308-1-vinay.yadav@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/crypto/chelsio/chtls/chtls_io.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/drivers/crypto/chelsio/chtls/chtls_io.c
+++ b/drivers/crypto/chelsio/chtls/chtls_io.c
@@ -1585,6 +1585,7 @@ skip_copy:
 			tp->urg_data = 0;
 
 		if ((avail + offset) >= skb->len) {
+			struct sk_buff *next_skb;
 			if (ULP_SKB_CB(skb)->flags & ULPCB_FLAG_TLS_HDR) {
 				tp->copied_seq += skb->len;
 				hws->rcvpld = skb->hdr_len;
@@ -1595,8 +1596,10 @@ skip_copy:
 			chtls_free_skb(sk, skb);
 			buffers_freed++;
 			hws->copied_seq = 0;
-			if (copied >= target &&
-			    !skb_peek(&sk->sk_receive_queue))
+			next_skb = skb_peek(&sk->sk_receive_queue);
+			if (copied >= target && !next_skb)
+				break;
+			if (ULP_SKB_CB(next_skb)->flags & ULPCB_FLAG_TLS_HDR)
 				break;
 		}
 	} while (len > 0);



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 35/70] cxgb4: set up filter action after rewrites
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (33 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 34/70] chelsio/chtls: fix tls record info to user Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 36/70] gtp: fix an use-before-init in gtp_newlink() Greg Kroah-Hartman
                   ` (37 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Raju Rangoju, Jakub Kicinski

From: Raju Rangoju <rajur@chelsio.com>

[ Upstream commit 937d8420588421eaa5c7aa5c79b26b42abb288ef ]

The current code sets up the filter action field before
rewrites are set up. When the action 'switch' is used
with rewrites, this may result in initial few packets
that get switched out don't have rewrites applied
on them.

So, make sure filter action is set up along with rewrites
or only after everything else is set up for rewrites.

Fixes: 12b276fbf6e0 ("cxgb4: add support to create hash filters")
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Link: https://lore.kernel.org/r/20201023115852.18262-1-rajur@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c |   56 ++++++++++------------
 drivers/net/ethernet/chelsio/cxgb4/t4_tcb.h       |    4 +
 2 files changed, 31 insertions(+), 29 deletions(-)

--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
@@ -145,13 +145,13 @@ static int configure_filter_smac(struct
 	int err;
 
 	/* do a set-tcb for smac-sel and CWR bit.. */
-	err = set_tcb_tflag(adap, f, f->tid, TF_CCTRL_CWR_S, 1, 1);
-	if (err)
-		goto smac_err;
-
 	err = set_tcb_field(adap, f, f->tid, TCB_SMAC_SEL_W,
 			    TCB_SMAC_SEL_V(TCB_SMAC_SEL_M),
 			    TCB_SMAC_SEL_V(f->smt->idx), 1);
+	if (err)
+		goto smac_err;
+
+	err = set_tcb_tflag(adap, f, f->tid, TF_CCTRL_CWR_S, 1, 1);
 	if (!err)
 		return 0;
 
@@ -865,6 +865,7 @@ int set_filter_wr(struct adapter *adapte
 		      FW_FILTER_WR_DIRSTEERHASH_V(f->fs.dirsteerhash) |
 		      FW_FILTER_WR_LPBK_V(f->fs.action == FILTER_SWITCH) |
 		      FW_FILTER_WR_DMAC_V(f->fs.newdmac) |
+		      FW_FILTER_WR_SMAC_V(f->fs.newsmac) |
 		      FW_FILTER_WR_INSVLAN_V(f->fs.newvlan == VLAN_INSERT ||
 					     f->fs.newvlan == VLAN_REWRITE) |
 		      FW_FILTER_WR_RMVLAN_V(f->fs.newvlan == VLAN_REMOVE ||
@@ -882,7 +883,7 @@ int set_filter_wr(struct adapter *adapte
 		 FW_FILTER_WR_OVLAN_VLD_V(f->fs.val.ovlan_vld) |
 		 FW_FILTER_WR_IVLAN_VLDM_V(f->fs.mask.ivlan_vld) |
 		 FW_FILTER_WR_OVLAN_VLDM_V(f->fs.mask.ovlan_vld));
-	fwr->smac_sel = 0;
+	fwr->smac_sel = f->smt->idx;
 	fwr->rx_chan_rx_rpl_iq =
 		htons(FW_FILTER_WR_RX_CHAN_V(0) |
 		      FW_FILTER_WR_RX_RPL_IQ_V(adapter->sge.fw_evtq.abs_id));
@@ -1321,11 +1322,8 @@ static void mk_act_open_req6(struct filt
 			    TX_QUEUE_V(f->fs.nat_mode) |
 			    T5_OPT_2_VALID_F |
 			    RX_CHANNEL_V(cxgb4_port_e2cchan(f->dev)) |
-			    CONG_CNTRL_V((f->fs.action == FILTER_DROP) |
-					 (f->fs.dirsteer << 1)) |
 			    PACE_V((f->fs.maskhash) |
-				   ((f->fs.dirsteerhash) << 1)) |
-			    CCTRL_ECN_V(f->fs.action == FILTER_SWITCH));
+				   ((f->fs.dirsteerhash) << 1)));
 }
 
 static void mk_act_open_req(struct filter_entry *f, struct sk_buff *skb,
@@ -1361,11 +1359,8 @@ static void mk_act_open_req(struct filte
 			    TX_QUEUE_V(f->fs.nat_mode) |
 			    T5_OPT_2_VALID_F |
 			    RX_CHANNEL_V(cxgb4_port_e2cchan(f->dev)) |
-			    CONG_CNTRL_V((f->fs.action == FILTER_DROP) |
-					 (f->fs.dirsteer << 1)) |
 			    PACE_V((f->fs.maskhash) |
-				   ((f->fs.dirsteerhash) << 1)) |
-			    CCTRL_ECN_V(f->fs.action == FILTER_SWITCH));
+				   ((f->fs.dirsteerhash) << 1)));
 }
 
 static int cxgb4_set_hash_filter(struct net_device *dev,
@@ -2037,6 +2032,20 @@ void hash_filter_rpl(struct adapter *ada
 			}
 			return;
 		}
+		switch (f->fs.action) {
+		case FILTER_PASS:
+			if (f->fs.dirsteer)
+				set_tcb_tflag(adap, f, tid,
+					      TF_DIRECT_STEER_S, 1, 1);
+			break;
+		case FILTER_DROP:
+			set_tcb_tflag(adap, f, tid, TF_DROP_S, 1, 1);
+			break;
+		case FILTER_SWITCH:
+			set_tcb_tflag(adap, f, tid, TF_LPBK_S, 1, 1);
+			break;
+		}
+
 		break;
 
 	default:
@@ -2104,22 +2113,11 @@ void filter_rpl(struct adapter *adap, co
 			if (ctx)
 				ctx->result = 0;
 		} else if (ret == FW_FILTER_WR_FLT_ADDED) {
-			int err = 0;
-
-			if (f->fs.newsmac)
-				err = configure_filter_smac(adap, f);
-
-			if (!err) {
-				f->pending = 0;  /* async setup completed */
-				f->valid = 1;
-				if (ctx) {
-					ctx->result = 0;
-					ctx->tid = idx;
-				}
-			} else {
-				clear_filter(adap, f);
-				if (ctx)
-					ctx->result = err;
+			f->pending = 0;  /* async setup completed */
+			f->valid = 1;
+			if (ctx) {
+				ctx->result = 0;
+				ctx->tid = idx;
 			}
 		} else {
 			/* Something went wrong.  Issue a warning about the
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_tcb.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_tcb.h
@@ -50,6 +50,10 @@
 #define TCB_T_FLAGS_M		0xffffffffffffffffULL
 #define TCB_T_FLAGS_V(x)	((__u64)(x) << TCB_T_FLAGS_S)
 
+#define TF_DROP_S		22
+#define TF_DIRECT_STEER_S	23
+#define TF_LPBK_S		59
+
 #define TF_CCTRL_ECE_S		60
 #define TF_CCTRL_CWR_S		61
 #define TF_CCTRL_RFR_S		62



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 36/70] gtp: fix an use-before-init in gtp_newlink()
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (34 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 35/70] cxgb4: set up filter action after rewrites Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 37/70] ibmveth: Fix use of ibmveth in a bridge Greg Kroah-Hartman
                   ` (36 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Masahiro Fujiwara, Jakub Kicinski

From: Masahiro Fujiwara <fujiwara.masahiro@gmail.com>

[ Upstream commit 51467431200b91682b89d31317e35dcbca1469ce ]

*_pdp_find() from gtp_encap_recv() would trigger a crash when a peer
sends GTP packets while creating new GTP device.

RIP: 0010:gtp1_pdp_find.isra.0+0x68/0x90 [gtp]
<SNIP>
Call Trace:
 <IRQ>
 gtp_encap_recv+0xc2/0x2e0 [gtp]
 ? gtp1_pdp_find.isra.0+0x90/0x90 [gtp]
 udp_queue_rcv_one_skb+0x1fe/0x530
 udp_queue_rcv_skb+0x40/0x1b0
 udp_unicast_rcv_skb.isra.0+0x78/0x90
 __udp4_lib_rcv+0x5af/0xc70
 udp_rcv+0x1a/0x20
 ip_protocol_deliver_rcu+0xc5/0x1b0
 ip_local_deliver_finish+0x48/0x50
 ip_local_deliver+0xe5/0xf0
 ? ip_protocol_deliver_rcu+0x1b0/0x1b0

gtp_encap_enable() should be called after gtp_hastable_new() otherwise
*_pdp_find() will access the uninitialized hash table.

Fixes: 1e3a3abd8b28 ("gtp: make GTP sockets in gtp_newlink optional")
Signed-off-by: Masahiro Fujiwara <fujiwara.masahiro@gmail.com>
Link: https://lore.kernel.org/r/20201027114846.3924-1-fujiwara.masahiro@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/gtp.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -663,10 +663,6 @@ static int gtp_newlink(struct net *src_n
 
 	gtp = netdev_priv(dev);
 
-	err = gtp_encap_enable(gtp, data);
-	if (err < 0)
-		return err;
-
 	if (!data[IFLA_GTP_PDP_HASHSIZE]) {
 		hashsize = 1024;
 	} else {
@@ -677,12 +673,16 @@ static int gtp_newlink(struct net *src_n
 
 	err = gtp_hashtable_new(gtp, hashsize);
 	if (err < 0)
-		goto out_encap;
+		return err;
+
+	err = gtp_encap_enable(gtp, data);
+	if (err < 0)
+		goto out_hashtable;
 
 	err = register_netdevice(dev);
 	if (err < 0) {
 		netdev_dbg(dev, "failed to register new netdev %d\n", err);
-		goto out_hashtable;
+		goto out_encap;
 	}
 
 	gn = net_generic(dev_net(dev), gtp_net_id);
@@ -693,11 +693,11 @@ static int gtp_newlink(struct net *src_n
 
 	return 0;
 
+out_encap:
+	gtp_encap_disable(gtp);
 out_hashtable:
 	kfree(gtp->addr_hash);
 	kfree(gtp->tid_hash);
-out_encap:
-	gtp_encap_disable(gtp);
 	return err;
 }
 



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 37/70] ibmveth: Fix use of ibmveth in a bridge.
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (35 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 36/70] gtp: fix an use-before-init in gtp_newlink() Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 38/70] ibmvnic: fix ibmvnic_set_mac Greg Kroah-Hartman
                   ` (35 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Michal Suchanek, Thomas Bogendoerfer,
	Jakub Kicinski

From: Thomas Bogendoerfer <tbogendoerfer@suse.de>

[ Upstream commit 2ac8af0967aaa2b67cb382727e784900d2f4d0da ]

The check for src mac address in ibmveth_is_packet_unsupported is wrong.
Commit 6f2275433a2f wanted to shut down messages for loopback packets,
but now suppresses bridged frames, which are accepted by the hypervisor
otherwise bridging won't work at all.

Fixes: 6f2275433a2f ("ibmveth: Detect unsupported packets before sending to the hypervisor")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Link: https://lore.kernel.org/r/20201026104221.26570-1-msuchanek@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/ibm/ibmveth.c |    6 ------
 1 file changed, 6 deletions(-)

--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1031,12 +1031,6 @@ static int ibmveth_is_packet_unsupported
 		ret = -EOPNOTSUPP;
 	}
 
-	if (!ether_addr_equal(ether_header->h_source, netdev->dev_addr)) {
-		netdev_dbg(netdev, "source packet MAC address does not match veth device's, dropping packet.\n");
-		netdev->stats.tx_dropped++;
-		ret = -EOPNOTSUPP;
-	}
-
 	return ret;
 }
 



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 38/70] ibmvnic: fix ibmvnic_set_mac
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (36 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 37/70] ibmveth: Fix use of ibmveth in a bridge Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 39/70] mlxsw: core: Fix memory leak on module removal Greg Kroah-Hartman
                   ` (34 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Lijun Pan, Jakub Kicinski

From: Lijun Pan <ljp@linux.ibm.com>

[ Upstream commit 8fc3672a8ad3e782bac80e979bc2a2c10960cbe9 ]

Jakub Kicinski brought up a concern in ibmvnic_set_mac().
ibmvnic_set_mac() does this:

	ether_addr_copy(adapter->mac_addr, addr->sa_data);
	if (adapter->state != VNIC_PROBED)
		rc = __ibmvnic_set_mac(netdev, addr->sa_data);

So if state == VNIC_PROBED, the user can assign an invalid address to
adapter->mac_addr, and ibmvnic_set_mac() will still return 0.

The fix is to validate ethernet address at the beginning of
ibmvnic_set_mac(), and move the ether_addr_copy to
the case of "adapter->state != VNIC_PROBED".

Fixes: c26eba03e407 ("ibmvnic: Update reset infrastructure to support tunable parameters")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/20201027220456.71450-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/ibm/ibmvnic.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1828,9 +1828,13 @@ static int ibmvnic_set_mac(struct net_de
 	int rc;
 
 	rc = 0;
-	ether_addr_copy(adapter->mac_addr, addr->sa_data);
-	if (adapter->state != VNIC_PROBED)
+	if (!is_valid_ether_addr(addr->sa_data))
+		return -EADDRNOTAVAIL;
+
+	if (adapter->state != VNIC_PROBED) {
+		ether_addr_copy(adapter->mac_addr, addr->sa_data);
 		rc = __ibmvnic_set_mac(netdev, addr->sa_data);
+	}
 
 	return rc;
 }



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 39/70] mlxsw: core: Fix memory leak on module removal
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (37 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 38/70] ibmvnic: fix ibmvnic_set_mac Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 40/70] netem: fix zero division in tabledist Greg Kroah-Hartman
                   ` (33 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Ido Schimmel, Vadim Pasternak,
	Oleksandr Shamray, Jakub Kicinski

From: Ido Schimmel <idosch@nvidia.com>

[ Upstream commit adc80b6cfedff6dad8b93d46a5ea2775fd5af9ec ]

Free the devlink instance during the teardown sequence in the non-reload
case to avoid the following memory leak.

unreferenced object 0xffff888232895000 (size 2048):
  comm "modprobe", pid 1073, jiffies 4295568857 (age 164.871s)
  hex dump (first 32 bytes):
    00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de  ........".......
    10 50 89 32 82 88 ff ff 10 50 89 32 82 88 ff ff  .P.2.....P.2....
  backtrace:
    [<00000000c704e9a6>] __kmalloc+0x13a/0x2a0
    [<00000000ee30129d>] devlink_alloc+0xff/0x760
    [<0000000092ab3e5d>] 0xffffffffa042e5b0
    [<000000004f3f8a31>] 0xffffffffa042f6ad
    [<0000000092800b4b>] 0xffffffffa0491df3
    [<00000000c4843903>] local_pci_probe+0xcb/0x170
    [<000000006993ded7>] pci_device_probe+0x2c2/0x4e0
    [<00000000a8e0de75>] really_probe+0x2c5/0xf90
    [<00000000d42ba75d>] driver_probe_device+0x1eb/0x340
    [<00000000bcc95e05>] device_driver_attach+0x294/0x300
    [<000000000e2bc177>] __driver_attach+0x167/0x2f0
    [<000000007d44cd6e>] bus_for_each_dev+0x148/0x1f0
    [<000000003cd5a91e>] driver_attach+0x45/0x60
    [<000000000041ce51>] bus_add_driver+0x3b8/0x720
    [<00000000f5215476>] driver_register+0x230/0x4e0
    [<00000000d79356f5>] __pci_register_driver+0x190/0x200

Fixes: a22712a96291 ("mlxsw: core: Fix devlink unregister flow")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: Vadim Pasternak <vadimp@nvidia.com>
Tested-by: Oleksandr Shamray <oleksandrs@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/mellanox/mlxsw/core.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1483,6 +1483,8 @@ void mlxsw_core_bus_device_unregister(st
 	if (!reload)
 		devlink_resources_unregister(devlink, NULL);
 	mlxsw_core->bus->fini(mlxsw_core->bus_priv);
+	if (!reload)
+		devlink_free(devlink);
 
 	return;
 



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 40/70] netem: fix zero division in tabledist
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (38 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 39/70] mlxsw: core: Fix memory leak on module removal Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 41/70] net: hns3: Clear the CMDQ registers before unmapping BAR region Greg Kroah-Hartman
                   ` (32 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Aleksandr Nogikh,
	syzbot+ec762a6342ad0d3c0d8f, Stephen Hemminger, Jakub Kicinski

From: Aleksandr Nogikh <nogikh@google.com>

[ Upstream commit eadd1befdd778a1eca57fad058782bd22b4db804 ]

Currently it is possible to craft a special netlink RTM_NEWQDISC
command that can result in jitter being equal to 0x80000000. It is
enough to set the 32 bit jitter to 0x02000000 (it will later be
multiplied by 2^6) or just set the 64 bit jitter via
TCA_NETEM_JITTER64. This causes an overflow during the generation of
uniformly distributed numbers in tabledist(), which in turn leads to
division by zero (sigma != 0, but sigma * 2 is 0).

The related fragment of code needs 32-bit division - see commit
9b0ed89 ("netem: remove unnecessary 64 bit modulus"), so switching to
64 bit is not an option.

Fix the issue by keeping the value of jitter within the range that can
be adequately handled by tabledist() - [0;INT_MAX]. As negative std
deviation makes no sense, take the absolute value of the passed value
and cap it at INT_MAX. Inside tabledist(), switch to unsigned 32 bit
arithmetic in order to prevent overflows.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Reported-by: syzbot+ec762a6342ad0d3c0d8f@syzkaller.appspotmail.com
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://lore.kernel.org/r/20201028170731.1383332-1-aleksandrnogikh@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sched/sch_netem.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -330,7 +330,7 @@ static s64 tabledist(s64 mu, s32 sigma,
 
 	/* default uniform distribution */
 	if (dist == NULL)
-		return ((rnd % (2 * sigma)) + mu) - sigma;
+		return ((rnd % (2 * (u32)sigma)) + mu) - sigma;
 
 	t = dist->table[rnd % dist->size];
 	x = (sigma % NETEM_DIST_SCALE) * t;
@@ -812,6 +812,10 @@ static void get_slot(struct netem_sched_
 		q->slot_config.max_packets = INT_MAX;
 	if (q->slot_config.max_bytes == 0)
 		q->slot_config.max_bytes = INT_MAX;
+
+	/* capping dist_jitter to the range acceptable by tabledist() */
+	q->slot_config.dist_jitter = min_t(__s64, INT_MAX, abs(q->slot_config.dist_jitter));
+
 	q->slot.packets_left = q->slot_config.max_packets;
 	q->slot.bytes_left = q->slot_config.max_bytes;
 	if (q->slot_config.min_delay | q->slot_config.max_delay |
@@ -1037,6 +1041,9 @@ static int netem_change(struct Qdisc *sc
 	if (tb[TCA_NETEM_SLOT])
 		get_slot(q, tb[TCA_NETEM_SLOT]);
 
+	/* capping jitter to the range acceptable by tabledist() */
+	q->jitter = min_t(s64, abs(q->jitter), INT_MAX);
+
 	return ret;
 
 get_table_failure:



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 41/70] net: hns3: Clear the CMDQ registers before unmapping BAR region
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (39 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 40/70] netem: fix zero division in tabledist Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 42/70] net: ipa: command payloads already mapped Greg Kroah-Hartman
                   ` (31 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Zenghui Yu, Jakub Kicinski

From: Zenghui Yu <yuzenghui@huawei.com>

[ Upstream commit e3364c5ff3ff975b943a7bf47e21a2a4bf20f3fe ]

When unbinding the hns3 driver with the HNS3 VF, I got the following
kernel panic:

[  265.709989] Unable to handle kernel paging request at virtual address ffff800054627000
[  265.717928] Mem abort info:
[  265.720740]   ESR = 0x96000047
[  265.723810]   EC = 0x25: DABT (current EL), IL = 32 bits
[  265.729126]   SET = 0, FnV = 0
[  265.732195]   EA = 0, S1PTW = 0
[  265.735351] Data abort info:
[  265.738227]   ISV = 0, ISS = 0x00000047
[  265.742071]   CM = 0, WnR = 1
[  265.745055] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000009b54000
[  265.751753] [ffff800054627000] pgd=0000202ffffff003, p4d=0000202ffffff003, pud=00002020020eb003, pmd=00000020a0dfc003, pte=0000000000000000
[  265.764314] Internal error: Oops: 96000047 [#1] SMP
[  265.830357] CPU: 61 PID: 20319 Comm: bash Not tainted 5.9.0+ #206
[  265.836423] Hardware name: Huawei TaiShan 2280 V2/BC82AMDDA, BIOS 1.05 09/18/2019
[  265.843873] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
[  265.843890] pc : hclgevf_cmd_uninit+0xbc/0x300
[  265.861988] lr : hclgevf_cmd_uninit+0xb0/0x300
[  265.861992] sp : ffff80004c983b50
[  265.881411] pmr_save: 000000e0
[  265.884453] x29: ffff80004c983b50 x28: ffff20280bbce500
[  265.889744] x27: 0000000000000000 x26: 0000000000000000
[  265.895034] x25: ffff800011a1f000 x24: ffff800011a1fe90
[  265.900325] x23: ffff0020ce9b00d8 x22: ffff0020ce9b0150
[  265.905616] x21: ffff800010d70e90 x20: ffff800010d70e90
[  265.910906] x19: ffff0020ce9b0080 x18: 0000000000000004
[  265.916198] x17: 0000000000000000 x16: ffff800011ae32e8
[  265.916201] x15: 0000000000000028 x14: 0000000000000002
[  265.916204] x13: ffff800011ae32e8 x12: 0000000000012ad8
[  265.946619] x11: ffff80004c983b50 x10: 0000000000000000
[  265.951911] x9 : ffff8000115d0888 x8 : 0000000000000000
[  265.951914] x7 : ffff800011890b20 x6 : c0000000ffff7fff
[  265.951917] x5 : ffff80004c983930 x4 : 0000000000000001
[  265.951919] x3 : ffffa027eec1b000 x2 : 2b78ccbbff369100
[  265.964487] x1 : 0000000000000000 x0 : ffff800054627000
[  265.964491] Call trace:
[  265.964494]  hclgevf_cmd_uninit+0xbc/0x300
[  265.964496]  hclgevf_uninit_ae_dev+0x9c/0xe8
[  265.964501]  hnae3_unregister_ae_dev+0xb0/0x130
[  265.964516]  hns3_remove+0x34/0x88 [hns3]
[  266.009683]  pci_device_remove+0x48/0xf0
[  266.009692]  device_release_driver_internal+0x114/0x1e8
[  266.030058]  device_driver_detach+0x28/0x38
[  266.034224]  unbind_store+0xd4/0x108
[  266.037784]  drv_attr_store+0x40/0x58
[  266.041435]  sysfs_kf_write+0x54/0x80
[  266.045081]  kernfs_fop_write+0x12c/0x250
[  266.049076]  vfs_write+0xc4/0x248
[  266.052378]  ksys_write+0x74/0xf8
[  266.055677]  __arm64_sys_write+0x24/0x30
[  266.059584]  el0_svc_common.constprop.3+0x84/0x270
[  266.064354]  do_el0_svc+0x34/0xa0
[  266.067658]  el0_svc+0x38/0x40
[  266.070700]  el0_sync_handler+0x8c/0xb0
[  266.074519]  el0_sync+0x140/0x180

It looks like the BAR memory region had already been unmapped before we
start clearing CMDQ registers in it, which is pretty bad and the kernel
happily kills itself because of a Current EL Data Abort (on arm64).

Moving the CMDQ uninitialization a bit early fixes the issue for me.

Fixes: 862d969a3a4d ("net: hns3: do VF's pci re-initialization while PF doing FLR")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20201023051550.793-1-yuzenghui@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -3146,8 +3146,8 @@ static void hclgevf_uninit_hdev(struct h
 		hclgevf_uninit_msi(hdev);
 	}
 
-	hclgevf_pci_uninit(hdev);
 	hclgevf_cmd_uninit(hdev);
+	hclgevf_pci_uninit(hdev);
 	hclgevf_uninit_mac_list(hdev);
 }
 



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 42/70] net: ipa: command payloads already mapped
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (40 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 41/70] net: hns3: Clear the CMDQ registers before unmapping BAR region Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 43/70] net/sched: act_mpls: Add softdep on mpls_gso.ko Greg Kroah-Hartman
                   ` (30 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Stephen Boyd, Alex Elder, Jakub Kicinski

From: Alex Elder <elder@linaro.org>

[ Upstream commit df833050cced27e1b343cc8bc41f90191b289334 ]

IPA transactions describe actions to be performed by the IPA
hardware.  Three cases use IPA transactions:  transmitting a socket
buffer; providing a page to receive packet data; and issuing an IPA
immediate command.  An IPA transaction contains a scatter/gather
list (SGL) to hold the set of actions to be performed.

We map buffers in the SGL for DMA at the time they are added to the
transaction.  For skb TX transactions, we fill the SGL with a call
to skb_to_sgvec().  Page RX transactions involve a single page
pointer, and that is recorded in the SGL with sg_set_page().  In
both of these cases we then map the SGL for DMA with a call to
dma_map_sg().

Immediate commands are different.  The payload for an immediate
command comes from a region of coherent DMA memory, which must
*not* be mapped for DMA.  For that reason, gsi_trans_cmd_add()
sort of hand-crafts each SGL entry added to a command transaction.

This patch fixes a problem with the code that crafts the SGL entry
for an immediate command.  Previously a portion of the SGL entry was
updated using sg_set_buf().  However this is not valid because it
includes a call to virt_to_page() on the buffer, but the command
buffer pointer is not a linear address.

Since we never actually map the SGL for command transactions, there
are very few fields in the SGL we need to fill.  Specifically, we
only need to record the DMA address and the length, so they can be
used by __gsi_trans_commit() to fill a TRE.  We additionally need to
preserve the SGL flags so for_each_sg() still works.  For that we
can simply assign a null page pointer for command SGL entries.

Fixes: 9dd441e4ed575 ("soc: qcom: ipa: GSI transactions")
Reported-by: Stephen Boyd <swboyd@chromium.org>
Tested-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20201022010029.11877-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ipa/gsi_trans.c |   21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

--- a/drivers/net/ipa/gsi_trans.c
+++ b/drivers/net/ipa/gsi_trans.c
@@ -398,15 +398,24 @@ void gsi_trans_cmd_add(struct gsi_trans
 
 	/* assert(which < trans->tre_count); */
 
-	/* Set the page information for the buffer.  We also need to fill in
-	 * the DMA address and length for the buffer (something dma_map_sg()
-	 * normally does).
+	/* Commands are quite different from data transfer requests.
+	 * Their payloads come from a pool whose memory is allocated
+	 * using dma_alloc_coherent().  We therefore do *not* map them
+	 * for DMA (unlike what we do for pages and skbs).
+	 *
+	 * When a transaction completes, the SGL is normally unmapped.
+	 * A command transaction has direction DMA_NONE, which tells
+	 * gsi_trans_complete() to skip the unmapping step.
+	 *
+	 * The only things we use directly in a command scatter/gather
+	 * entry are the DMA address and length.  We still need the SG
+	 * table flags to be maintained though, so assign a NULL page
+	 * pointer for that purpose.
 	 */
 	sg = &trans->sgl[which];
-
-	sg_set_buf(sg, buf, size);
+	sg_assign_page(sg, NULL);
 	sg_dma_address(sg) = addr;
-	sg_dma_len(sg) = sg->length;
+	sg_dma_len(sg) = size;
 
 	info = &trans->info[which];
 	info->opcode = opcode;



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 43/70] net/sched: act_mpls: Add softdep on mpls_gso.ko
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (41 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 42/70] net: ipa: command payloads already mapped Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 44/70] r8169: fix issue with forced threading in combination with shared interrupts Greg Kroah-Hartman
                   ` (29 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Guillaume Nault, Jakub Kicinski

From: Guillaume Nault <gnault@redhat.com>

TCA_MPLS_ACT_PUSH and TCA_MPLS_ACT_MAC_PUSH might be used on gso
packets. Such packets will thus require mpls_gso.ko for segmentation.

v2: Drop dependency on CONFIG_NET_MPLS_GSO in Kconfig (from Jakub and
    David).

Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/1f6cab15bbd15666795061c55563aaf6a386e90e.1603708007.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sched/act_mpls.c |    1 +
 1 file changed, 1 insertion(+)

--- a/net/sched/act_mpls.c
+++ b/net/sched/act_mpls.c
@@ -408,6 +408,7 @@ static void __exit mpls_cleanup_module(v
 module_init(mpls_init_module);
 module_exit(mpls_cleanup_module);
 
+MODULE_SOFTDEP("post: mpls_gso");
 MODULE_AUTHOR("Netronome Systems <oss-drivers@netronome.com>");
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("MPLS manipulation actions");



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 44/70] r8169: fix issue with forced threading in combination with shared interrupts
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (42 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 43/70] net/sched: act_mpls: Add softdep on mpls_gso.ko Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 45/70] ravb: Fix bit fields checking in ravb_hwtstamp_get() Greg Kroah-Hartman
                   ` (28 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Serge Belyshev, Heiner Kallweit,
	Jakub Kicinski

From: Heiner Kallweit <hkallweit1@gmail.com>

[ Upstream commit 2734a24e6e5d18522fbf599135c59b82ec9b2c9e ]

As reported by Serge flag IRQF_NO_THREAD causes an error if the
interrupt is actually shared and the other driver(s) don't have this
flag set. This situation can occur if a PCI(e) legacy interrupt is
used in combination with forced threading.
There's no good way to deal with this properly, therefore we have to
remove flag IRQF_NO_THREAD. For fixing the original forced threading
issue switch to napi_schedule().

Fixes: 424a646e072a ("r8169: fix operation under forced interrupt threading")
Link: https://www.spinics.net/lists/netdev/msg694960.html
Reported-by: Serge Belyshev <belyshev@depni.sinp.msu.ru>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Serge Belyshev <belyshev@depni.sinp.msu.ru>
Link: https://lore.kernel.org/r/b5b53bfe-35ac-3768-85bf-74d1290cf394@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/realtek/r8169_main.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -4559,7 +4559,7 @@ static irqreturn_t rtl8169_interrupt(int
 	}
 
 	rtl_irq_disable(tp);
-	napi_schedule_irqoff(&tp->napi);
+	napi_schedule(&tp->napi);
 out:
 	rtl_ack_events(tp, status);
 
@@ -4727,7 +4727,7 @@ static int rtl_open(struct net_device *d
 	rtl_request_firmware(tp);
 
 	retval = request_irq(pci_irq_vector(pdev, 0), rtl8169_interrupt,
-			     IRQF_NO_THREAD | IRQF_SHARED, dev->name, tp);
+			     IRQF_SHARED, dev->name, tp);
 	if (retval < 0)
 		goto err_release_fw_2;
 



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 45/70] ravb: Fix bit fields checking in ravb_hwtstamp_get()
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (43 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 44/70] r8169: fix issue with forced threading in combination with shared interrupts Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 46/70] tcp: Prevent low rmem stalls with SO_RCVLOWAT Greg Kroah-Hartman
                   ` (27 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Julia Lawall, Andrew Gabbasov,
	Sergei Shtylyov, Jakub Kicinski

From: Andrew Gabbasov <andrew_gabbasov@mentor.com>

[ Upstream commit 68b9f0865b1ef545da180c57d54b82c94cb464a4 ]

In the function ravb_hwtstamp_get() in ravb_main.c with the existing
values for RAVB_RXTSTAMP_TYPE_V2_L2_EVENT (0x2) and RAVB_RXTSTAMP_TYPE_ALL
(0x6)

if (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE_V2_L2_EVENT)
	config.rx_filter = HWTSTAMP_FILTER_PTP_V2_L2_EVENT;
else if (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE_ALL)
	config.rx_filter = HWTSTAMP_FILTER_ALL;

if the test on RAVB_RXTSTAMP_TYPE_ALL should be true,
it will never be reached.

This issue can be verified with 'hwtstamp_config' testing program
(tools/testing/selftests/net/hwtstamp_config.c). Setting filter type
to ALL and subsequent retrieving it gives incorrect value:

$ hwtstamp_config eth0 OFF ALL
flags = 0
tx_type = OFF
rx_filter = ALL
$ hwtstamp_config eth0
flags = 0
tx_type = OFF
rx_filter = PTP_V2_L2_EVENT

Correct this by converting if-else's to switch.

Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Link: https://lore.kernel.org/r/20201026102130.29368-1-andrew_gabbasov@mentor.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/renesas/ravb_main.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1747,12 +1747,16 @@ static int ravb_hwtstamp_get(struct net_
 	config.flags = 0;
 	config.tx_type = priv->tstamp_tx_ctrl ? HWTSTAMP_TX_ON :
 						HWTSTAMP_TX_OFF;
-	if (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE_V2_L2_EVENT)
+	switch (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE) {
+	case RAVB_RXTSTAMP_TYPE_V2_L2_EVENT:
 		config.rx_filter = HWTSTAMP_FILTER_PTP_V2_L2_EVENT;
-	else if (priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE_ALL)
+		break;
+	case RAVB_RXTSTAMP_TYPE_ALL:
 		config.rx_filter = HWTSTAMP_FILTER_ALL;
-	else
+		break;
+	default:
 		config.rx_filter = HWTSTAMP_FILTER_NONE;
+	}
 
 	return copy_to_user(req->ifr_data, &config, sizeof(config)) ?
 		-EFAULT : 0;



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 46/70] tcp: Prevent low rmem stalls with SO_RCVLOWAT.
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (44 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 45/70] ravb: Fix bit fields checking in ravb_hwtstamp_get() Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 47/70] tipc: fix memory leak caused by tipc_buf_append() Greg Kroah-Hartman
                   ` (26 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Arjun Roy, Soheil Hassas Yeganeh,
	Neal Cardwell, Eric Dumazet, Jakub Kicinski

From: Arjun Roy <arjunroy@google.com>

[ Upstream commit 435ccfa894e35e3d4a1799e6ac030e48a7b69ef5 ]

With SO_RCVLOWAT, under memory pressure,
it is possible to enter a state where:

1. We have not received enough bytes to satisfy SO_RCVLOWAT.
2. We have not entered buffer pressure (see tcp_rmem_pressure()).
3. But, we do not have enough buffer space to accept more packets.

In this case, we advertise 0 rwnd (due to #3) but the application does
not drain the receive queue (no wakeup because of #1 and #2) so the
flow stalls.

Modify the heuristic for SO_RCVLOWAT so that, if we are advertising
rwnd<=rcv_mss, force a wakeup to prevent a stall.

Without this patch, setting tcp_rmem to 6143 and disabling TCP
autotune causes a stalled flow. With this patch, no stall occurs. This
is with RPC-style traffic with large messages.

Fixes: 03f45c883c6f ("tcp: avoid extra wakeups for SO_RCVLOWAT users")
Signed-off-by: Arjun Roy <arjunroy@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201023184709.217614-1-arjunroy.kdev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp.c       |    2 ++
 net/ipv4/tcp_input.c |    3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -483,6 +483,8 @@ static inline bool tcp_stream_is_readabl
 			return true;
 		if (tcp_rmem_pressure(sk))
 			return true;
+		if (tcp_receive_window(tp) <= inet_csk(sk)->icsk_ack.rcv_mss)
+			return true;
 	}
 	if (sk->sk_prot->stream_memory_read)
 		return sk->sk_prot->stream_memory_read(sk);
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4790,7 +4790,8 @@ void tcp_data_ready(struct sock *sk)
 	int avail = tp->rcv_nxt - tp->copied_seq;
 
 	if (avail < sk->sk_rcvlowat && !tcp_rmem_pressure(sk) &&
-	    !sock_flag(sk, SOCK_DONE))
+	    !sock_flag(sk, SOCK_DONE) &&
+	    tcp_receive_window(tp) > inet_csk(sk)->icsk_ack.rcv_mss)
 		return;
 
 	sk->sk_data_ready(sk);



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 47/70] tipc: fix memory leak caused by tipc_buf_append()
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (45 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 46/70] tcp: Prevent low rmem stalls with SO_RCVLOWAT Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 48/70] net: protect tcf_block_unbind with block lock Greg Kroah-Hartman
                   ` (25 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jon Maloy, Thang Hoang Ngo,
	Tung Nguyen, Xin Long, Cong Wang, Jakub Kicinski

From: Tung Nguyen <tung.q.nguyen@dektech.com.au>

[ Upstream commit ceb1eb2fb609c88363e06618b8d4bbf7815a4e03 ]

Commit ed42989eab57 ("tipc: fix the skb_unshare() in tipc_buf_append()")
replaced skb_unshare() with skb_copy() to not reduce the data reference
counter of the original skb intentionally. This is not the correct
way to handle the cloned skb because it causes memory leak in 2
following cases:
 1/ Sending multicast messages via broadcast link
  The original skb list is cloned to the local skb list for local
  destination. After that, the data reference counter of each skb
  in the original list has the value of 2. This causes each skb not
  to be freed after receiving ACK:
  tipc_link_advance_transmq()
  {
   ...
   /* release skb */
   __skb_unlink(skb, &l->transmq);
   kfree_skb(skb); <-- memory exists after being freed
  }

 2/ Sending multicast messages via replicast link
  Similar to the above case, each skb cannot be freed after purging
  the skb list:
  tipc_mcast_xmit()
  {
   ...
   __skb_queue_purge(pkts); <-- memory exists after being freed
  }

This commit fixes this issue by using skb_unshare() instead. Besides,
to avoid use-after-free error reported by KASAN, the pointer to the
fragment is set to NULL before calling skb_unshare() to make sure that
the original skb is not freed after freeing the fragment 2 times in
case skb_unshare() returns NULL.

Fixes: ed42989eab57 ("tipc: fix the skb_unshare() in tipc_buf_append()")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Reported-by: Thang Hoang Ngo <thang.h.ngo@dektech.com.au>
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://lore.kernel.org/r/20201027032403.1823-1-tung.q.nguyen@dektech.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/tipc/msg.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

--- a/net/tipc/msg.c
+++ b/net/tipc/msg.c
@@ -150,12 +150,11 @@ int tipc_buf_append(struct sk_buff **hea
 	if (fragid == FIRST_FRAGMENT) {
 		if (unlikely(head))
 			goto err;
-		if (skb_cloned(frag))
-			frag = skb_copy(frag, GFP_ATOMIC);
+		*buf = NULL;
+		frag = skb_unshare(frag, GFP_ATOMIC);
 		if (unlikely(!frag))
 			goto err;
 		head = *headbuf = frag;
-		*buf = NULL;
 		TIPC_SKB_CB(head)->tail = NULL;
 		if (skb_is_nonlinear(head)) {
 			skb_walk_frags(head, tail) {



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 48/70] net: protect tcf_block_unbind with block lock
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (46 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 47/70] tipc: fix memory leak caused by tipc_buf_append() Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 49/70] erofs: avoid duplicated permission check for "trusted." xattrs Greg Kroah-Hartman
                   ` (24 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Leon Romanovsky, Jakub Kicinski

From: Leon Romanovsky <leonro@nvidia.com>

[ Upstream commit d6535dca28859d8d9ef80894eb287b2ac35a32e8 ]

The tcf_block_unbind() expects that the caller will take block->cb_lock
before calling it, however the code took RTNL lock and dropped cb_lock
instead. This causes to the following kernel panic.

 WARNING: CPU: 1 PID: 13524 at net/sched/cls_api.c:1488 tcf_block_unbind+0x2db/0x420
 Modules linked in: mlx5_ib mlx5_core mlxfw ptp pps_core act_mirred act_tunnel_key cls_flower vxlan ip6_udp_tunnel udp_tunnel dummy sch_ingress openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad ib_ipoib rdma_cm iw_cm ib_cm ib_uverbs ib_core overlay [last unloaded: mlxfw]
 CPU: 1 PID: 13524 Comm: test-ecmp-add-v Tainted: G        W         5.9.0+ #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 RIP: 0010:tcf_block_unbind+0x2db/0x420
 Code: ff 48 83 c4 40 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8d bc 24 30 01 00 00 be ff ff ff ff e8 7d 7f 70 00 85 c0 0f 85 7b fd ff ff <0f> 0b e9 74 fd ff ff 48 c7 c7 dc 6a 24 84 e8 02 ec fe fe e9 55 fd
 RSP: 0018:ffff888117d17968 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff88812f713c00 RCX: 1ffffffff0848d5b
 RDX: 0000000000000001 RSI: ffff88814fbc8130 RDI: ffff888107f2b878
 RBP: 1ffff11022fa2f3f R08: 0000000000000000 R09: ffffffff84115a87
 R10: fffffbfff0822b50 R11: ffff888107f2b898 R12: ffff88814fbc8000
 R13: ffff88812f713c10 R14: ffff888117d17a38 R15: ffff88814fbc80c0
 FS:  00007f6593d36740(0000) GS:ffff8882a4f00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00005607a00758f8 CR3: 0000000131aea006 CR4: 0000000000170ea0
 Call Trace:
  tc_block_indr_cleanup+0x3e0/0x5a0
  ? tcf_block_unbind+0x420/0x420
  ? __mutex_unlock_slowpath+0xe7/0x610
  flow_indr_dev_unregister+0x5e2/0x930
  ? mlx5e_restore_tunnel+0xdf0/0xdf0 [mlx5_core]
  ? mlx5e_restore_tunnel+0xdf0/0xdf0 [mlx5_core]
  ? flow_indr_block_cb_alloc+0x3c0/0x3c0
  ? mlx5_db_free+0x37c/0x4b0 [mlx5_core]
  mlx5e_cleanup_rep_tx+0x8b/0xc0 [mlx5_core]
  mlx5e_detach_netdev+0xe5/0x120 [mlx5_core]
  mlx5e_vport_rep_unload+0x155/0x260 [mlx5_core]
  esw_offloads_disable+0x227/0x2b0 [mlx5_core]
  mlx5_eswitch_disable_locked.cold+0x38e/0x699 [mlx5_core]
  mlx5_eswitch_disable+0x94/0xf0 [mlx5_core]
  mlx5_device_disable_sriov+0x183/0x1f0 [mlx5_core]
  mlx5_core_sriov_configure+0xfd/0x230 [mlx5_core]
  sriov_numvfs_store+0x261/0x2f0
  ? sriov_drivers_autoprobe_store+0x110/0x110
  ? sysfs_file_ops+0x170/0x170
  ? sysfs_file_ops+0x117/0x170
  ? sysfs_file_ops+0x170/0x170
  kernfs_fop_write+0x1ff/0x3f0
  ? rcu_read_lock_any_held+0x6e/0x90
  vfs_write+0x1f3/0x620
  ksys_write+0xf9/0x1d0
  ? __x64_sys_read+0xb0/0xb0
  ? lockdep_hardirqs_on_prepare+0x273/0x3f0
  ? syscall_enter_from_user_mode+0x1d/0x50
  do_syscall_64+0x2d/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

<...>

 ---[ end trace bfdd028ada702879 ]---

Fixes: 0fdcf78d5973 ("net: use flow_indr_dev_setup_offload()")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20201026123327.1141066-1-leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sched/cls_api.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -650,12 +650,12 @@ static void tc_block_indr_cleanup(struct
 			       block_cb->indr.binder_type,
 			       &block->flow_block, tcf_block_shared(block),
 			       &extack);
+	rtnl_lock();
 	down_write(&block->cb_lock);
 	list_del(&block_cb->driver_list);
 	list_move(&block_cb->list, &bo.cb_list);
-	up_write(&block->cb_lock);
-	rtnl_lock();
 	tcf_block_unbind(block, &bo);
+	up_write(&block->cb_lock);
 	rtnl_unlock();
 }
 



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 49/70] erofs: avoid duplicated permission check for "trusted." xattrs
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (47 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 48/70] net: protect tcf_block_unbind with block lock Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 50/70] arch/x86/amd/ibs: Fix re-arming IBS Fetch Greg Kroah-Hartman
                   ` (23 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Hongyu Jin, Chao Yu, Gao Xiang

From: Gao Xiang <hsiangkao@redhat.com>

commit d578b46db69d125a654f509bdc9091d84e924dc8 upstream.

Don't recheck it since xattr_permission() already
checks CAP_SYS_ADMIN capability.

Just follow 5d3ce4f70172 ("f2fs: avoid duplicated permission check for "trusted." xattrs")

Reported-by: Hongyu Jin <hongyu.jin@unisoc.com>
[ Gao Xiang: since it could cause some complex Android overlay
  permission issue as well on android-5.4+, it'd be better to
  backport to 5.4+ rather than pure cleanup on mainline. ]
Cc: <stable@vger.kernel.org> # 5.4+
Link: https://lore.kernel.org/r/20200811070020.6339-1-hsiangkao@redhat.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/erofs/xattr.c |    2 --
 1 file changed, 2 deletions(-)

--- a/fs/erofs/xattr.c
+++ b/fs/erofs/xattr.c
@@ -473,8 +473,6 @@ static int erofs_xattr_generic_get(const
 			return -EOPNOTSUPP;
 		break;
 	case EROFS_XATTR_INDEX_TRUSTED:
-		if (!capable(CAP_SYS_ADMIN))
-			return -EPERM;
 		break;
 	case EROFS_XATTR_INDEX_SECURITY:
 		break;



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 50/70] arch/x86/amd/ibs: Fix re-arming IBS Fetch
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (48 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 49/70] erofs: avoid duplicated permission check for "trusted." xattrs Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 51/70] x86/traps: Fix #DE Oops message regression Greg Kroah-Hartman
                   ` (22 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Stephane Eranian, Kim Phillips,
	Peter Zijlstra (Intel)

From: Kim Phillips <kim.phillips@amd.com>

commit 221bfce5ebbdf72ff08b3bf2510ae81058ee568b upstream.

Stephane Eranian found a bug in that IBS' current Fetch counter was not
being reset when the driver would write the new value to clear it along
with the enable bit set, and found that adding an MSR write that would
first disable IBS Fetch would make IBS Fetch reset its current count.

Indeed, the PPR for AMD Family 17h Model 31h B0 55803 Rev 0.54 - Sep 12,
2019 states "The periodic fetch counter is set to IbsFetchCnt [...] when
IbsFetchEn is changed from 0 to 1."

Explicitly set IbsFetchEn to 0 and then to 1 when re-enabling IBS Fetch,
so the driver properly resets the internal counter to 0 and IBS
Fetch starts counting again.

A family 15h machine tested does not have this problem, and the extra
wrmsr is also not needed on Family 19h, so only do the extra wrmsr on
families 16h through 18h.

Reported-by: Stephane Eranian <stephane.eranian@google.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
[peterz: optimized]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/events/amd/ibs.c |   15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -89,6 +89,7 @@ struct perf_ibs {
 	u64				max_period;
 	unsigned long			offset_mask[1];
 	int				offset_max;
+	unsigned int			fetch_count_reset_broken : 1;
 	struct cpu_perf_ibs __percpu	*pcpu;
 
 	struct attribute		**format_attrs;
@@ -363,7 +364,12 @@ perf_ibs_event_update(struct perf_ibs *p
 static inline void perf_ibs_enable_event(struct perf_ibs *perf_ibs,
 					 struct hw_perf_event *hwc, u64 config)
 {
-	wrmsrl(hwc->config_base, hwc->config | config | perf_ibs->enable_mask);
+	u64 tmp = hwc->config | config;
+
+	if (perf_ibs->fetch_count_reset_broken)
+		wrmsrl(hwc->config_base, tmp & ~perf_ibs->enable_mask);
+
+	wrmsrl(hwc->config_base, tmp | perf_ibs->enable_mask);
 }
 
 /*
@@ -733,6 +739,13 @@ static __init void perf_event_ibs_init(v
 {
 	struct attribute **attr = ibs_op_format_attrs;
 
+	/*
+	 * Some chips fail to reset the fetch count when it is written; instead
+	 * they need a 0-1 transition of IbsFetchEn.
+	 */
+	if (boot_cpu_data.x86 >= 0x16 && boot_cpu_data.x86 <= 0x18)
+		perf_ibs_fetch.fetch_count_reset_broken = 1;
+
 	perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");
 
 	if (ibs_caps & IBS_CAPS_OPCNT) {



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 51/70] x86/traps: Fix #DE Oops message regression
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (49 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 50/70] arch/x86/amd/ibs: Fix re-arming IBS Fetch Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 52/70] x86/xen: disable Firmware First mode for correctable memory errors Greg Kroah-Hartman
                   ` (21 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Dmitry Vyukov, Thomas Gleixner

From: Thomas Gleixner <tglx@linutronix.de>

commit 5f1ec1fd32252af5130dac23b5542e8e66fe0bcb upstream.

The conversion of #DE to the idtentry mechanism introduced a change in the
Ooops message which confuses tools which parse crash information in dmesg.

Remove the underscore from 'divide_error' to restore previous behaviour.

Fixes: 9d06c4027f21 ("x86/entry: Convert Divide Error to IDTENTRY")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/CACT4Y+bTZFkuZd7+bPArowOv-7Die+WZpfOWnEO_Wgs3U59+oA@mail.gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/kernel/traps.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -196,7 +196,7 @@ static __always_inline void __user *erro
 
 DEFINE_IDTENTRY(exc_divide_error)
 {
-	do_error_trap(regs, 0, "divide_error", X86_TRAP_DE, SIGFPE,
+	do_error_trap(regs, 0, "divide error", X86_TRAP_DE, SIGFPE,
 		      FPE_INTDIV, error_get_trap_addr(regs));
 }
 



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 52/70] x86/xen: disable Firmware First mode for correctable memory errors
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (50 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 51/70] x86/traps: Fix #DE Oops message regression Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 53/70] PCI: aardvark: Fix initialization with old Marvells Arm Trusted Firmware Greg Kroah-Hartman
                   ` (20 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Juergen Gross, Boris Ostrovsky

From: Juergen Gross <jgross@suse.com>

commit d759af38572f97321112a0852353613d18126038 upstream.

When running as Xen dom0 the kernel isn't responsible for selecting the
error handling mode, this should be handled by the hypervisor.

So disable setting FF mode when running as Xen pv guest. Not doing so
might result in boot splats like:

[    7.509696] HEST: Enabling Firmware First mode for corrected errors.
[    7.510382] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 2.
[    7.510383] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 3.
[    7.510384] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 4.
[    7.510384] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 5.
[    7.510385] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 6.
[    7.510386] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 7.
[    7.510386] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 8.

Reason is that the HEST ACPI table contains the real number of MCA
banks, while the hypervisor is emulating only 2 banks for guests.

Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20200925140751.31381-1-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/xen/enlighten_pv.c |    9 +++++++++
 1 file changed, 9 insertions(+)

--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1438,6 +1438,15 @@ asmlinkage __visible void __init xen_sta
 		x86_init.mpparse.get_smp_config = x86_init_uint_noop;
 
 		xen_boot_params_init_edd();
+
+#ifdef CONFIG_ACPI
+		/*
+		 * Disable selecting "Firmware First mode" for correctable
+		 * memory errors, as this is the duty of the hypervisor to
+		 * decide.
+		 */
+		acpi_disable_cmcff = 1;
+#endif
 	}
 
 	if (!boot_params.screen_info.orig_video_isVGA)



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 53/70] PCI: aardvark: Fix initialization with old Marvells Arm Trusted Firmware
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (51 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 52/70] x86/xen: disable Firmware First mode for correctable memory errors Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 54/70] ata: ahci: mvebu: Make SATA PHY optional for Armada 3720 Greg Kroah-Hartman
                   ` (19 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Tomasz Maciej Nowak, Pali Rohár,
	Lorenzo Pieralisi, Rob Herring

From: Pali Rohár <pali@kernel.org>

commit b0c6ae0f8948a2be6bf4e8b4bbab9ca1343289b6 upstream.

Old ATF automatically power on pcie phy and does not provide SMC call for
phy power on functionality which leads to aardvark initialization failure:

[    0.330134] mvebu-a3700-comphy d0018300.phy: unsupported SMC call, try updating your firmware
[    0.338846] phy phy-d0018300.phy.1: phy poweron failed --> -95
[    0.344753] advk-pcie d0070000.pcie: Failed to initialize PHY (-95)
[    0.351160] advk-pcie: probe of d0070000.pcie failed with error -95

This patch fixes above failure by ignoring 'not supported' error in
aardvark driver. In this case it is expected that phy is already power on.

Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Link: https://lore.kernel.org/r/20200902144344.16684-3-pali@kernel.org
Fixes: 366697018c9a ("PCI: aardvark: Add PHY support")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Cc: <stable@vger.kernel.org> # 5.8+: ea17a0f153af: phy: marvell: comphy: Convert internal SMCC firmware return codes to errno
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/pci/controller/pci-aardvark.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -1068,7 +1068,9 @@ static int advk_pcie_enable_phy(struct a
 	}
 
 	ret = phy_power_on(pcie->phy);
-	if (ret) {
+	if (ret == -EOPNOTSUPP) {
+		dev_warn(&pcie->pdev->dev, "PHY unsupported by firmware\n");
+	} else if (ret) {
 		phy_exit(pcie->phy);
 		return ret;
 	}



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 54/70] ata: ahci: mvebu: Make SATA PHY optional for Armada 3720
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (52 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 53/70] PCI: aardvark: Fix initialization with old Marvells Arm Trusted Firmware Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 55/70] fuse: fix page dereference after free Greg Kroah-Hartman
                   ` (18 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Pali Rohár, Tomasz Maciej Nowak,
	Jens Axboe

From: Pali Rohár <pali@kernel.org>

commit 45aefe3d2251e4e229d7662052739f96ad1d08d9 upstream.

Older ATF does not provide SMC call for SATA phy power on functionality and
therefore initialization of ahci_mvebu is failing when older version of ATF
is using. In this case phy_power_on() function returns -EOPNOTSUPP.

This patch adds a new hflag AHCI_HFLAG_IGN_NOTSUPP_POWER_ON which cause
that ahci_platform_enable_phys() would ignore -EOPNOTSUPP errors from
phy_power_on() call.

It fixes initialization of ahci_mvebu on Espressobin boards where is older
Marvell's Arm Trusted Firmware without SMC call for SATA phy power.

This is regression introduced in commit 8e18c8e58da64 ("arm64: dts: marvell:
armada-3720-espressobin: declare SATA PHY property") where SATA phy was
defined and therefore ahci_platform_enable_phys() on Espressobin started
failing.

Fixes: 8e18c8e58da64 ("arm64: dts: marvell: armada-3720-espressobin: declare SATA PHY property")
Signed-off-by: Pali Rohár <pali@kernel.org>
Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Cc: <stable@vger.kernel.org> # 5.1+: ea17a0f153af: phy: marvell: comphy: Convert internal SMCC firmware return codes to errno
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/ata/ahci.h             |    2 ++
 drivers/ata/ahci_mvebu.c       |    2 +-
 drivers/ata/libahci_platform.c |    2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/ata/ahci.h
+++ b/drivers/ata/ahci.h
@@ -240,6 +240,8 @@ enum {
 							as default lpm_policy */
 	AHCI_HFLAG_SUSPEND_PHYS		= (1 << 26), /* handle PHYs during
 							suspend/resume */
+	AHCI_HFLAG_IGN_NOTSUPP_POWER_ON	= (1 << 27), /* ignore -EOPNOTSUPP
+							from phy_power_on() */
 
 	/* ap->flags bits */
 
--- a/drivers/ata/ahci_mvebu.c
+++ b/drivers/ata/ahci_mvebu.c
@@ -227,7 +227,7 @@ static const struct ahci_mvebu_plat_data
 
 static const struct ahci_mvebu_plat_data ahci_mvebu_armada_3700_plat_data = {
 	.plat_config = ahci_mvebu_armada_3700_config,
-	.flags = AHCI_HFLAG_SUSPEND_PHYS,
+	.flags = AHCI_HFLAG_SUSPEND_PHYS | AHCI_HFLAG_IGN_NOTSUPP_POWER_ON,
 };
 
 static const struct of_device_id ahci_mvebu_of_match[] = {
--- a/drivers/ata/libahci_platform.c
+++ b/drivers/ata/libahci_platform.c
@@ -59,7 +59,7 @@ int ahci_platform_enable_phys(struct ahc
 		}
 
 		rc = phy_power_on(hpriv->phys[i]);
-		if (rc) {
+		if (rc && !(rc == -EOPNOTSUPP && (hpriv->flags & AHCI_HFLAG_IGN_NOTSUPP_POWER_ON))) {
 			phy_exit(hpriv->phys[i]);
 			goto disable_phys;
 		}



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 55/70] fuse: fix page dereference after free
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (53 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 54/70] ata: ahci: mvebu: Make SATA PHY optional for Armada 3720 Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 56/70] bpf: Fix comment for helper bpf_current_task_under_cgroup() Greg Kroah-Hartman
                   ` (17 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Pradeep P V K, Miklos Szeredi

From: Miklos Szeredi <mszeredi@redhat.com>

commit d78092e4937de9ce55edcb4ee4c5e3c707be0190 upstream.

After unlock_request() pages from the ap->pages[] array may be put (e.g. by
aborting the connection) and the pages can be freed.

Prevent use after free by grabbing a reference to the page before calling
unlock_request().

The original patch was created by Pradeep P V K.

Reported-by: Pradeep P V K <ppvk@codeaurora.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/fuse/dev.c |   28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -785,15 +785,16 @@ static int fuse_try_move_page(struct fus
 	struct page *newpage;
 	struct pipe_buffer *buf = cs->pipebufs;
 
+	get_page(oldpage);
 	err = unlock_request(cs->req);
 	if (err)
-		return err;
+		goto out_put_old;
 
 	fuse_copy_finish(cs);
 
 	err = pipe_buf_confirm(cs->pipe, buf);
 	if (err)
-		return err;
+		goto out_put_old;
 
 	BUG_ON(!cs->nr_segs);
 	cs->currbuf = buf;
@@ -833,7 +834,7 @@ static int fuse_try_move_page(struct fus
 	err = replace_page_cache_page(oldpage, newpage, GFP_KERNEL);
 	if (err) {
 		unlock_page(newpage);
-		return err;
+		goto out_put_old;
 	}
 
 	get_page(newpage);
@@ -852,14 +853,19 @@ static int fuse_try_move_page(struct fus
 	if (err) {
 		unlock_page(newpage);
 		put_page(newpage);
-		return err;
+		goto out_put_old;
 	}
 
 	unlock_page(oldpage);
+	/* Drop ref for ap->pages[] array */
 	put_page(oldpage);
 	cs->len = 0;
 
-	return 0;
+	err = 0;
+out_put_old:
+	/* Drop ref obtained in this function */
+	put_page(oldpage);
+	return err;
 
 out_fallback_unlock:
 	unlock_page(newpage);
@@ -868,10 +874,10 @@ out_fallback:
 	cs->offset = buf->offset;
 
 	err = lock_request(cs->req);
-	if (err)
-		return err;
+	if (!err)
+		err = 1;
 
-	return 1;
+	goto out_put_old;
 }
 
 static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page,
@@ -883,14 +889,16 @@ static int fuse_ref_page(struct fuse_cop
 	if (cs->nr_segs >= cs->pipe->max_usage)
 		return -EIO;
 
+	get_page(page);
 	err = unlock_request(cs->req);
-	if (err)
+	if (err) {
+		put_page(page);
 		return err;
+	}
 
 	fuse_copy_finish(cs);
 
 	buf = cs->pipebufs;
-	get_page(page);
 	buf->page = page;
 	buf->offset = offset;
 	buf->len = count;



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 56/70] bpf: Fix comment for helper bpf_current_task_under_cgroup()
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (54 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 55/70] fuse: fix page dereference after free Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 57/70] evm: Check size of security.evm before using it Greg Kroah-Hartman
                   ` (16 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Song Liu, Alexei Starovoitov

From: Song Liu <songliubraving@fb.com>

commit 1aef5b4391f0c75c0a1523706a7b0311846ee12f upstream.

This should be "current" not "skb".

Fixes: c6b5fb8690fa ("bpf: add documentation for eBPF helpers (42-50)")
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/bpf/20200910203314.70018-1-songliubraving@fb.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/uapi/linux/bpf.h       |    4 ++--
 tools/include/uapi/linux/bpf.h |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1416,8 +1416,8 @@ union bpf_attr {
  * 	Return
  * 		The return value depends on the result of the test, and can be:
  *
- * 		* 0, if the *skb* task belongs to the cgroup2.
- * 		* 1, if the *skb* task does not belong to the cgroup2.
+ *		* 0, if current task belongs to the cgroup2.
+ *		* 1, if current task does not belong to the cgroup2.
  * 		* A negative error code, if an error occurred.
  *
  * int bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1416,8 +1416,8 @@ union bpf_attr {
  * 	Return
  * 		The return value depends on the result of the test, and can be:
  *
- * 		* 0, if the *skb* task belongs to the cgroup2.
- * 		* 1, if the *skb* task does not belong to the cgroup2.
+ *		* 0, if current task belongs to the cgroup2.
+ *		* 1, if current task does not belong to the cgroup2.
  * 		* A negative error code, if an error occurred.
  *
  * int bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 57/70] evm: Check size of security.evm before using it
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (55 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 56/70] bpf: Fix comment for helper bpf_current_task_under_cgroup() Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 58/70] p54: avoid accessing the data mapped to streaming DMA Greg Kroah-Hartman
                   ` (15 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Roberto Sassu, Mimi Zohar

From: Roberto Sassu <roberto.sassu@huawei.com>

commit 455b6c9112eff8d249e32ba165742085678a80a4 upstream.

This patch checks the size for the EVM_IMA_XATTR_DIGSIG and
EVM_XATTR_PORTABLE_DIGSIG types to ensure that the algorithm is read from
the buffer returned by vfs_getxattr_alloc().

Cc: stable@vger.kernel.org # 4.19.x
Fixes: 5feeb61183dde ("evm: Allow non-SHA1 digital signatures")
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 security/integrity/evm/evm_main.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/security/integrity/evm/evm_main.c
+++ b/security/integrity/evm/evm_main.c
@@ -181,6 +181,12 @@ static enum integrity_status evm_verify_
 		break;
 	case EVM_IMA_XATTR_DIGSIG:
 	case EVM_XATTR_PORTABLE_DIGSIG:
+		/* accept xattr with non-empty signature field */
+		if (xattr_len <= sizeof(struct signature_v2_hdr)) {
+			evm_status = INTEGRITY_FAIL;
+			goto out;
+		}
+
 		hdr = (struct signature_v2_hdr *)xattr_data;
 		digest.hdr.algo = hdr->hash_algo;
 		rc = evm_calc_hash(dentry, xattr_name, xattr_value,



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 58/70] p54: avoid accessing the data mapped to streaming DMA
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (56 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 57/70] evm: Check size of security.evm before using it Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 59/70] cxl: Rework error message for incompatible slots Greg Kroah-Hartman
                   ` (14 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jia-Ju Bai, Christian Lamparter, Kalle Valo

From: Jia-Ju Bai <baijiaju@tsinghua.edu.cn>

commit 478762855b5ae9f68fa6ead1edf7abada70fcd5f upstream.

In p54p_tx(), skb->data is mapped to streaming DMA on line 337:
  mapping = pci_map_single(..., skb->data, ...);

Then skb->data is accessed on line 349:
  desc->device_addr = ((struct p54_hdr *)skb->data)->req_id;

This access may cause data inconsistency between CPU cache and hardware.

To fix this problem, ((struct p54_hdr *)skb->data)->req_id is stored in
a local variable before DMA mapping, and then the driver accesses this
local variable instead of skb->data.

Cc: <stable@vger.kernel.org>
Signed-off-by: Jia-Ju Bai <baijiaju@tsinghua.edu.cn>
Acked-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20200802132949.26788-1-baijiaju@tsinghua.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/net/wireless/intersil/p54/p54pci.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/net/wireless/intersil/p54/p54pci.c
+++ b/drivers/net/wireless/intersil/p54/p54pci.c
@@ -329,10 +329,12 @@ static void p54p_tx(struct ieee80211_hw
 	struct p54p_desc *desc;
 	dma_addr_t mapping;
 	u32 idx, i;
+	__le32 device_addr;
 
 	spin_lock_irqsave(&priv->lock, flags);
 	idx = le32_to_cpu(ring_control->host_idx[1]);
 	i = idx % ARRAY_SIZE(ring_control->tx_data);
+	device_addr = ((struct p54_hdr *)skb->data)->req_id;
 
 	mapping = pci_map_single(priv->pdev, skb->data, skb->len,
 				 PCI_DMA_TODEVICE);
@@ -346,7 +348,7 @@ static void p54p_tx(struct ieee80211_hw
 
 	desc = &ring_control->tx_data[i];
 	desc->host_addr = cpu_to_le32(mapping);
-	desc->device_addr = ((struct p54_hdr *)skb->data)->req_id;
+	desc->device_addr = device_addr;
 	desc->len = cpu_to_le16(skb->len);
 	desc->flags = 0;
 



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 59/70] cxl: Rework error message for incompatible slots
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (57 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 58/70] p54: avoid accessing the data mapped to streaming DMA Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 60/70] RDMA/addr: Fix race with netevent_callback()/rdma_addr_cancel() Greg Kroah-Hartman
                   ` (13 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Frederic Barrat, Andrew Donnellan,
	Michael Ellerman

From: Frederic Barrat <fbarrat@linux.ibm.com>

commit 40ac790d99c6dd16b367d5c2339e446a5f1b0593 upstream.

Improve the error message shown if a capi adapter is plugged on a
capi-incompatible slot directly under the PHB (no intermediate switch).

Fixes: 5632874311db ("cxl: Add support for POWER9 DD2")
Cc: stable@vger.kernel.org # 4.14+
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200407115601.25453-1-fbarrat@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/misc/cxl/pci.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -393,8 +393,8 @@ int cxl_calc_capp_routing(struct pci_dev
 	*capp_unit_id = get_capp_unit_id(np, *phb_index);
 	of_node_put(np);
 	if (!*capp_unit_id) {
-		pr_err("cxl: invalid capp unit id (phb_index: %d)\n",
-		       *phb_index);
+		pr_err("cxl: No capp unit found for PHB[%lld,%d]. Make sure the adapter is on a capi-compatible slot\n",
+		       *chipid, *phb_index);
 		return -ENODEV;
 	}
 



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 60/70] RDMA/addr: Fix race with netevent_callback()/rdma_addr_cancel()
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (58 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 59/70] cxl: Rework error message for incompatible slots Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 61/70] mtd: lpddr: Fix bad logic in print_drs_error Greg Kroah-Hartman
                   ` (12 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dan Aloni, Leon Romanovsky, Jason Gunthorpe

From: Jason Gunthorpe <jgg@nvidia.com>

commit 2ee9bf346fbfd1dad0933b9eb3a4c2c0979b633e upstream.

This three thread race can result in the work being run once the callback
becomes NULL:

       CPU1                 CPU2                   CPU3
 netevent_callback()
                     process_one_req()       rdma_addr_cancel()
                      [..]
     spin_lock_bh()
  	set_timeout()
     spin_unlock_bh()

						spin_lock_bh()
						list_del_init(&req->list);
						spin_unlock_bh()

		     req->callback = NULL
		     spin_lock_bh()
		       if (!list_empty(&req->list))
                         // Skipped!
		         // cancel_delayed_work(&req->work);
		     spin_unlock_bh()

		    process_one_req() // again
		     req->callback() // BOOM
						cancel_delayed_work_sync()

The solution is to always cancel the work once it is completed so any
in between set_timeout() does not result in it running again.

Cc: stable@vger.kernel.org
Fixes: 44e75052bc2a ("RDMA/rdma_cm: Make rdma_addr_cancel into a fence")
Link: https://lore.kernel.org/r/20200930072007.1009692-1-leon@kernel.org
Reported-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/infiniband/core/addr.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -647,13 +647,12 @@ static void process_one_req(struct work_
 	req->callback = NULL;
 
 	spin_lock_bh(&lock);
+	/*
+	 * Although the work will normally have been canceled by the workqueue,
+	 * it can still be requeued as long as it is on the req_list.
+	 */
+	cancel_delayed_work(&req->work);
 	if (!list_empty(&req->list)) {
-		/*
-		 * Although the work will normally have been canceled by the
-		 * workqueue, it can still be requeued as long as it is on the
-		 * req_list.
-		 */
-		cancel_delayed_work(&req->work);
 		list_del_init(&req->list);
 		kfree(req);
 	}



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 61/70] mtd: lpddr: Fix bad logic in print_drs_error
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (59 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 60/70] RDMA/addr: Fix race with netevent_callback()/rdma_addr_cancel() Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 62/70] drm/i915/gem: Serialise debugfs i915_gem_objects with ctx->mutex Greg Kroah-Hartman
                   ` (11 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Joe Perches, Gustavo A. R. Silva,
	Miquel Raynal

From: Gustavo A. R. Silva <gustavo@embeddedor.com>

commit 1c9c02bb22684f6949d2e7ddc0a3ff364fd5a6fc upstream.

Update logic for broken test. Use a more common logging style.

It appears the logic in this function is broken for the
consecutive tests of

        if (prog_status & 0x3)
                ...
        else if (prog_status & 0x2)
                ...
        else (prog_status & 0x1)
                ...

Likely the first test should be

        if ((prog_status & 0x3) == 0x3)

Found by inspection of include files using printk.

Fixes: eb3db27507f7 ("[MTD] LPDDR PFOW definition")
Cc: stable@vger.kernel.org
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/3fb0e29f5b601db8be2938a01d974b00c8788501.1588016644.git.gustavo@embeddedor.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/mtd/pfow.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/mtd/pfow.h
+++ b/include/linux/mtd/pfow.h
@@ -128,7 +128,7 @@ static inline void print_drs_error(unsig
 
 	if (!(dsr & DSR_AVAILABLE))
 		printk(KERN_NOTICE"DSR.15: (0) Device not Available\n");
-	if (prog_status & 0x03)
+	if ((prog_status & 0x03) == 0x03)
 		printk(KERN_NOTICE"DSR.9,8: (11) Attempt to program invalid "
 						"half with 41h command\n");
 	else if (prog_status & 0x02)



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 62/70] drm/i915/gem: Serialise debugfs i915_gem_objects with ctx->mutex
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (60 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 61/70] mtd: lpddr: Fix bad logic in print_drs_error Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 63/70] serial: qcom_geni_serial: To correct QUP Version detection logic Greg Kroah-Hartman
                   ` (10 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Chris Wilson, CQ Tang, Daniel Vetter,
	Tvrtko Ursulin, Rodrigo Vivi

From: Chris Wilson <chris@chris-wilson.co.uk>

commit 4fe9af8e881d946bf60790eeb37a7c4f96e28382 upstream.

Since the debugfs may peek into the GEM contexts as the corresponding
client/fd is being closed, we may try and follow a dangling pointer.
However, the context closure itself is serialised with the ctx->mutex,
so if we hold that mutex as we inspect the state coupled in the context,
we know the pointers within the context are stable and will remain valid
as we inspect their tables.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: CQ Tang <cq.tang@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200723172119.17649-3-chris@chris-wilson.co.uk
(cherry picked from commit 102f5aa491f262c818e607fc4fee08a724a76c69)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/gpu/drm/i915/i915_debugfs.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -323,6 +323,7 @@ static void print_context_stats(struct s
 		}
 		i915_gem_context_unlock_engines(ctx);
 
+		mutex_lock(&ctx->mutex);
 		if (!IS_ERR_OR_NULL(ctx->file_priv)) {
 			struct file_stats stats = {
 				.vm = rcu_access_pointer(ctx->vm),
@@ -343,6 +344,7 @@ static void print_context_stats(struct s
 
 			print_file_stats(m, name, stats);
 		}
+		mutex_unlock(&ctx->mutex);
 
 		spin_lock(&i915->gem.contexts.lock);
 		list_safe_reset_next(ctx, cn, link);



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 63/70] serial: qcom_geni_serial: To correct QUP Version detection logic
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (61 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 62/70] drm/i915/gem: Serialise debugfs i915_gem_objects with ctx->mutex Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 64/70] serial: pl011: Fix lockdep splat when handling magic-sysrq interrupt Greg Kroah-Hartman
                   ` (9 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Paras Sharma, Akash Asthana

From: Paras Sharma <parashar@codeaurora.org>

commit c9ca43d42ed8d5fd635d327a664ed1d8579eb2af upstream.

For QUP IP versions 2.5 and above the oversampling rate is
halved from 32 to 16.

Commit ce734600545f ("tty: serial: qcom_geni_serial: Update
the oversampling rate") is pushed to handle this scenario.
But the existing logic is failing to classify QUP Version 3.0
into the correct group ( 2.5 and above).

As result Serial Engine clocks are not configured properly for
baud rate and garbage data is sampled to FIFOs from the line.

So, fix the logic to detect QUP with versions 2.5 and above.

Fixes: ce734600545f ("tty: serial: qcom_geni_serial: Update the oversampling rate")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Paras Sharma <parashar@codeaurora.org>
Reviewed-by: Akash Asthana <akashast@codeaurora.org>
Link: https://lore.kernel.org/r/1601445926-23673-1-git-send-email-parashar@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/tty/serial/qcom_geni_serial.c |    2 +-
 include/linux/qcom-geni-se.h          |    3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -954,7 +954,7 @@ static void qcom_geni_serial_set_termios
 	sampling_rate = UART_OVERSAMPLING;
 	/* Sampling rate is halved for IP versions >= 2.5 */
 	ver = geni_se_get_qup_hw_version(&port->se);
-	if (GENI_SE_VERSION_MAJOR(ver) >= 2 && GENI_SE_VERSION_MINOR(ver) >= 5)
+	if (ver >= QUP_SE_VERSION_2_5)
 		sampling_rate /= 2;
 
 	clk_rate = get_clk_div_rate(baud, sampling_rate, &clk_div);
--- a/include/linux/qcom-geni-se.h
+++ b/include/linux/qcom-geni-se.h
@@ -229,6 +229,9 @@ struct geni_se {
 #define GENI_SE_VERSION_MINOR(ver) ((ver & HW_VER_MINOR_MASK) >> HW_VER_MINOR_SHFT)
 #define GENI_SE_VERSION_STEP(ver) (ver & HW_VER_STEP_MASK)
 
+/* QUP SE VERSION value for major number 2 and minor number 5 */
+#define QUP_SE_VERSION_2_5                  0x20050000
+
 #if IS_ENABLED(CONFIG_QCOM_GENI_SE)
 
 u32 geni_se_get_qup_hw_version(struct geni_se *se);



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 64/70] serial: pl011: Fix lockdep splat when handling magic-sysrq interrupt
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (62 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 63/70] serial: qcom_geni_serial: To correct QUP Version detection logic Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 65/70] PM: runtime: Fix timer_expires data type on 32-bit arches Greg Kroah-Hartman
                   ` (8 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Russell King, Jiri Slaby,
	Peter Zijlstra, Will Deacon

From: Peter Zijlstra <peterz@infradead.org>

commit 534cf755d9df99e214ddbe26b91cd4d81d2603e2 upstream.

Issuing a magic-sysrq via the PL011 causes the following lockdep splat,
which is easily reproducible under QEMU:

  | sysrq: Changing Loglevel
  | sysrq: Loglevel set to 9
  |
  | ======================================================
  | WARNING: possible circular locking dependency detected
  | 5.9.0-rc7 #1 Not tainted
  | ------------------------------------------------------
  | systemd-journal/138 is trying to acquire lock:
  | ffffab133ad950c0 (console_owner){-.-.}-{0:0}, at: console_lock_spinning_enable+0x34/0x70
  |
  | but task is already holding lock:
  | ffff0001fd47b098 (&port_lock_key){-.-.}-{2:2}, at: pl011_int+0x40/0x488
  |
  | which lock already depends on the new lock.

  [...]

  |  Possible unsafe locking scenario:
  |
  |        CPU0                    CPU1
  |        ----                    ----
  |   lock(&port_lock_key);
  |                                lock(console_owner);
  |                                lock(&port_lock_key);
  |   lock(console_owner);
  |
  |  *** DEADLOCK ***

The issue being that CPU0 takes 'port_lock' on the irq path in pl011_int()
before taking 'console_owner' on the printk() path, whereas CPU1 takes
the two locks in the opposite order on the printk() path due to setting
the "console_owner" prior to calling into into the actual console driver.

Fix this in the same way as the msm-serial driver by dropping 'port_lock'
before handling the sysrq.

Cc: <stable@vger.kernel.org> # 4.19+
Cc: Russell King <linux@armlinux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20200811101313.GA6970@willie-the-truck
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200930120432.16551-1-will@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/tty/serial/amba-pl011.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -308,8 +308,9 @@ static void pl011_write(unsigned int val
  */
 static int pl011_fifo_to_tty(struct uart_amba_port *uap)
 {
-	u16 status;
 	unsigned int ch, flag, fifotaken;
+	int sysrq;
+	u16 status;
 
 	for (fifotaken = 0; fifotaken != 256; fifotaken++) {
 		status = pl011_read(uap, REG_FR);
@@ -344,10 +345,12 @@ static int pl011_fifo_to_tty(struct uart
 				flag = TTY_FRAME;
 		}
 
-		if (uart_handle_sysrq_char(&uap->port, ch & 255))
-			continue;
+		spin_unlock(&uap->port.lock);
+		sysrq = uart_handle_sysrq_char(&uap->port, ch & 255);
+		spin_lock(&uap->port.lock);
 
-		uart_insert_char(&uap->port, ch, UART011_DR_OE, ch, flag);
+		if (!sysrq)
+			uart_insert_char(&uap->port, ch, UART011_DR_OE, ch, flag);
 	}
 
 	return fifotaken;



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 65/70] PM: runtime: Fix timer_expires data type on 32-bit arches
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (63 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 64/70] serial: pl011: Fix lockdep splat when handling magic-sysrq interrupt Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 66/70] ata: sata_rcar: Fix DMA boundary mask Greg Kroah-Hartman
                   ` (7 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Grygorii Strashko, Pavel Machek,
	Vincent Guittot, Rafael J. Wysocki

From: Grygorii Strashko <grygorii.strashko@ti.com>

commit 6b61d49a55796dbbc479eeb4465e59fd656c719c upstream.

Commit 8234f6734c5d ("PM-runtime: Switch autosuspend over to using
hrtimers") switched PM runtime autosuspend to use hrtimers and all
related time accounting in ns, but missed to update the timer_expires
data type in struct dev_pm_info to u64.

This causes the timer_expires value to be truncated on 32-bit
architectures when assignment is done from u64 values:

rpm_suspend()
|- dev->power.timer_expires = expires;

Fix it by changing the timer_expires type to u64.

Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: 5.0+ <stable@vger.kernel.org> # 5.0+
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/pm.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -584,7 +584,7 @@ struct dev_pm_info {
 #endif
 #ifdef CONFIG_PM
 	struct hrtimer		suspend_timer;
-	unsigned long		timer_expires;
+	u64			timer_expires;
 	struct work_struct	work;
 	wait_queue_head_t	wait_queue;
 	struct wake_irq		*wakeirq;



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 66/70] ata: sata_rcar: Fix DMA boundary mask
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (64 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 65/70] PM: runtime: Fix timer_expires data type on 32-bit arches Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 67/70] xen/gntdev.c: Mark pages as dirty Greg Kroah-Hartman
                   ` (6 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Geert Uytterhoeven, Lad Prabhakar,
	Yoshihiro Shimoda, Christoph Hellwig, Sergei Shtylyov,
	Ulf Hansson, Jens Axboe

From: Geert Uytterhoeven <geert+renesas@glider.be>

commit df9c590986fdb6db9d5636d6cd93bc919c01b451 upstream.

Before commit 9495b7e92f716ab2 ("driver core: platform: Initialize
dma_parms for platform devices"), the R-Car SATA device didn't have DMA
parameters.  Hence the DMA boundary mask supplied by its driver was
silently ignored, as __scsi_init_queue() doesn't check the return value
of dma_set_seg_boundary(), and the default value of 0xffffffff was used.

Now the device has gained DMA parameters, the driver-supplied value is
used, and the following warning is printed on Salvator-XS:

    DMA-API: sata_rcar ee300000.sata: mapping sg segment across boundary [start=0x00000000ffffe000] [end=0x00000000ffffefff] [boundary=0x000000001ffffffe]
    WARNING: CPU: 5 PID: 38 at kernel/dma/debug.c:1233 debug_dma_map_sg+0x298/0x300

(the range of start/end values depend on whether IOMMU support is
 enabled or not)

The issue here is that SATA_RCAR_DMA_BOUNDARY doesn't have bit 0 set, so
any typical end value, which is odd, will trigger the check.

Fix this by increasing the DMA boundary value by 1.

This also fixes the following WRITE DMA EXT timeout issue:

    # dd if=/dev/urandom of=/mnt/de1/file1-1024M bs=1M count=1024
    ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
    ata1.00: failed command: WRITE DMA EXT
    ata1.00: cmd 35/00:00:00:e6:0c/00:0a:00:00:00/e0 tag 0 dma 1310720 out
    res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    ata1.00: status: { DRDY }

as seen by Shimoda-san since commit 429120f3df2dba2b ("block: fix
splitting segments on boundary masks").

Fixes: 8bfbeed58665dbbf ("sata_rcar: correct 'sata_rcar_sht'")
Fixes: 9495b7e92f716ab2 ("driver core: platform: Initialize dma_parms for platform devices")
Fixes: 429120f3df2dba2b ("block: fix splitting segments on boundary masks")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/ata/sata_rcar.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/ata/sata_rcar.c
+++ b/drivers/ata/sata_rcar.c
@@ -120,7 +120,7 @@
 /* Descriptor table word 0 bit (when DTA32M = 1) */
 #define SATA_RCAR_DTEND			BIT(0)
 
-#define SATA_RCAR_DMA_BOUNDARY		0x1FFFFFFEUL
+#define SATA_RCAR_DMA_BOUNDARY		0x1FFFFFFFUL
 
 /* Gen2 Physical Layer Control Registers */
 #define RCAR_GEN2_PHY_CTL1_REG		0x1704



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 67/70] xen/gntdev.c: Mark pages as dirty
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (65 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 66/70] ata: sata_rcar: Fix DMA boundary mask Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 68/70] openrisc: Fix issue with get_user for 64-bit values Greg Kroah-Hartman
                   ` (5 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Boris Ostrovsky, Souptick Joarder,
	John Hubbard, Juergen Gross, David Vrabel

From: Souptick Joarder <jrdr.linux@gmail.com>

commit 779055842da5b2e508f3ccf9a8153cb1f704f566 upstream.

There seems to be a bug in the original code when gntdev_get_page()
is called with writeable=true then the page needs to be marked dirty
before being put.

To address this, a bool writeable is added in gnt_dev_copy_batch, set
it in gntdev_grant_copy_seg() (and drop `writeable` argument to
gntdev_get_page()) and then, based on batch->writeable, use
set_page_dirty_lock().

Fixes: a4cdb556cae0 (xen/gntdev: add ioctl for grant copy)
Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1599375114-32360-1-git-send-email-jrdr.linux@gmail.com
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/xen/gntdev.c |   17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -720,17 +720,18 @@ struct gntdev_copy_batch {
 	s16 __user *status[GNTDEV_COPY_BATCH];
 	unsigned int nr_ops;
 	unsigned int nr_pages;
+	bool writeable;
 };
 
 static int gntdev_get_page(struct gntdev_copy_batch *batch, void __user *virt,
-			   bool writeable, unsigned long *gfn)
+				unsigned long *gfn)
 {
 	unsigned long addr = (unsigned long)virt;
 	struct page *page;
 	unsigned long xen_pfn;
 	int ret;
 
-	ret = get_user_pages_fast(addr, 1, writeable ? FOLL_WRITE : 0, &page);
+	ret = get_user_pages_fast(addr, 1, batch->writeable ? FOLL_WRITE : 0, &page);
 	if (ret < 0)
 		return ret;
 
@@ -746,9 +747,13 @@ static void gntdev_put_pages(struct gntd
 {
 	unsigned int i;
 
-	for (i = 0; i < batch->nr_pages; i++)
+	for (i = 0; i < batch->nr_pages; i++) {
+		if (batch->writeable && !PageDirty(batch->pages[i]))
+			set_page_dirty_lock(batch->pages[i]);
 		put_page(batch->pages[i]);
+	}
 	batch->nr_pages = 0;
+	batch->writeable = false;
 }
 
 static int gntdev_copy(struct gntdev_copy_batch *batch)
@@ -837,8 +842,9 @@ static int gntdev_grant_copy_seg(struct
 			virt = seg->source.virt + copied;
 			off = (unsigned long)virt & ~XEN_PAGE_MASK;
 			len = min(len, (size_t)XEN_PAGE_SIZE - off);
+			batch->writeable = false;
 
-			ret = gntdev_get_page(batch, virt, false, &gfn);
+			ret = gntdev_get_page(batch, virt, &gfn);
 			if (ret < 0)
 				return ret;
 
@@ -856,8 +862,9 @@ static int gntdev_grant_copy_seg(struct
 			virt = seg->dest.virt + copied;
 			off = (unsigned long)virt & ~XEN_PAGE_MASK;
 			len = min(len, (size_t)XEN_PAGE_SIZE - off);
+			batch->writeable = true;
 
-			ret = gntdev_get_page(batch, virt, true, &gfn);
+			ret = gntdev_get_page(batch, virt, &gfn);
 			if (ret < 0)
 				return ret;
 



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 68/70] openrisc: Fix issue with get_user for 64-bit values
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (66 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 67/70] xen/gntdev.c: Mark pages as dirty Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 69/70] misc: rtsx: do not setting OC_POWER_DOWN reg in rtsx_pci_init_ocp() Greg Kroah-Hartman
                   ` (4 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Stafford Horne, Luc Van Oostenryck

From: Stafford Horne <shorne@gmail.com>

commit d877322bc1adcab9850732275670409e8bcca4c4 upstream.

A build failure was raised by kbuild with the following error.

    drivers/android/binder.c: Assembler messages:
    drivers/android/binder.c:3861: Error: unrecognized keyword/register name `l.lwz ?ap,4(r24)'
    drivers/android/binder.c:3866: Error: unrecognized keyword/register name `l.addi ?ap,r0,0'

The issue is with 64-bit get_user() calls on openrisc.  I traced this to
a problem where in the internally in the get_user macros there is a cast
to long __gu_val this causes GCC to think the get_user call is 32-bit.
This binder code is really long and GCC allocates register r30, which
triggers the issue. The 64-bit get_user asm tries to get the 64-bit pair
register, which for r30 overflows the general register names and returns
the dummy register ?ap.

The fix here is to move the temporary variables into the asm macros.  We
use a 32-bit __gu_tmp for 32-bit and smaller macro and a 64-bit tmp in
the 64-bit macro.  The cast in the 64-bit macro has a trick of casting
through __typeof__((x)-(x)) which avoids the below warning.  This was
barrowed from riscv.

    arch/openrisc/include/asm/uaccess.h:240:8: warning: cast to pointer from integer of different size

I tested this in a small unit test to check reading between 64-bit and
32-bit pointers to 64-bit and 32-bit values in all combinations.  Also I
ran make C=1 to confirm no new sparse warnings came up.  It all looks
clean to me.

Link: https://lore.kernel.org/lkml/202008200453.ohnhqkjQ%25lkp@intel.com/
Signed-off-by: Stafford Horne <shorne@gmail.com>
Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 arch/openrisc/include/asm/uaccess.h |   35 ++++++++++++++++++++++-------------
 1 file changed, 22 insertions(+), 13 deletions(-)

--- a/arch/openrisc/include/asm/uaccess.h
+++ b/arch/openrisc/include/asm/uaccess.h
@@ -164,19 +164,19 @@ struct __large_struct {
 
 #define __get_user_nocheck(x, ptr, size)			\
 ({								\
-	long __gu_err, __gu_val;				\
-	__get_user_size(__gu_val, (ptr), (size), __gu_err);	\
-	(x) = (__force __typeof__(*(ptr)))__gu_val;		\
+	long __gu_err;						\
+	__get_user_size((x), (ptr), (size), __gu_err);		\
 	__gu_err;						\
 })
 
 #define __get_user_check(x, ptr, size)					\
 ({									\
-	long __gu_err = -EFAULT, __gu_val = 0;				\
-	const __typeof__(*(ptr)) * __gu_addr = (ptr);			\
-	if (access_ok(__gu_addr, size))			\
-		__get_user_size(__gu_val, __gu_addr, (size), __gu_err);	\
-	(x) = (__force __typeof__(*(ptr)))__gu_val;			\
+	long __gu_err = -EFAULT;					\
+	const __typeof__(*(ptr)) *__gu_addr = (ptr);			\
+	if (access_ok(__gu_addr, size))					\
+		__get_user_size((x), __gu_addr, (size), __gu_err);	\
+	else								\
+		(x) = (__typeof__(*(ptr))) 0;				\
 	__gu_err;							\
 })
 
@@ -190,11 +190,13 @@ do {									\
 	case 2: __get_user_asm(x, ptr, retval, "l.lhz"); break;		\
 	case 4: __get_user_asm(x, ptr, retval, "l.lwz"); break;		\
 	case 8: __get_user_asm2(x, ptr, retval); break;			\
-	default: (x) = __get_user_bad();				\
+	default: (x) = (__typeof__(*(ptr)))__get_user_bad();		\
 	}								\
 } while (0)
 
 #define __get_user_asm(x, addr, err, op)		\
+{							\
+	unsigned long __gu_tmp;				\
 	__asm__ __volatile__(				\
 		"1:	"op" %1,0(%2)\n"		\
 		"2:\n"					\
@@ -208,10 +210,14 @@ do {									\
 		"	.align 2\n"			\
 		"	.long 1b,3b\n"			\
 		".previous"				\
-		: "=r"(err), "=r"(x)			\
-		: "r"(addr), "i"(-EFAULT), "0"(err))
+		: "=r"(err), "=r"(__gu_tmp)		\
+		: "r"(addr), "i"(-EFAULT), "0"(err));	\
+	(x) = (__typeof__(*(addr)))__gu_tmp;		\
+}
 
 #define __get_user_asm2(x, addr, err)			\
+{							\
+	unsigned long long __gu_tmp;			\
 	__asm__ __volatile__(				\
 		"1:	l.lwz %1,0(%2)\n"		\
 		"2:	l.lwz %H1,4(%2)\n"		\
@@ -228,8 +234,11 @@ do {									\
 		"	.long 1b,4b\n"			\
 		"	.long 2b,4b\n"			\
 		".previous"				\
-		: "=r"(err), "=&r"(x)			\
-		: "r"(addr), "i"(-EFAULT), "0"(err))
+		: "=r"(err), "=&r"(__gu_tmp)		\
+		: "r"(addr), "i"(-EFAULT), "0"(err));	\
+	(x) = (__typeof__(*(addr)))(			\
+		(__typeof__((x)-(x)))__gu_tmp);		\
+}
 
 /* more complex routines */
 



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 69/70] misc: rtsx: do not setting OC_POWER_DOWN reg in rtsx_pci_init_ocp()
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (67 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 68/70] openrisc: Fix issue with get_user for 64-bit values Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 11:36 ` [PATCH 5.8 70/70] phy: marvell: comphy: Convert internal SMCC firmware return codes to errno Greg Kroah-Hartman
                   ` (3 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Ricky Wu, Chris Clayton

From: Ricky Wu <ricky_wu@realtek.com>

commit 551b6729578a8981c46af964c10bf7d5d9ddca83 upstream.

this power saving action in rtsx_pci_init_ocp() cause INTEL-NUC6 platform
missing card reader

Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
Link: https://lore.kernel.org/r/20200824030006.30033-1-ricky_wu@realtek.com
Cc: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/misc/cardreader/rtsx_pcr.c |    4 ----
 1 file changed, 4 deletions(-)

--- a/drivers/misc/cardreader/rtsx_pcr.c
+++ b/drivers/misc/cardreader/rtsx_pcr.c
@@ -1172,10 +1172,6 @@ void rtsx_pci_init_ocp(struct rtsx_pcr *
 			rtsx_pci_write_register(pcr, REG_OCPGLITCH,
 				SD_OCP_GLITCH_MASK, pcr->hw_param.ocp_glitch);
 			rtsx_pci_enable_ocp(pcr);
-		} else {
-			/* OC power down */
-			rtsx_pci_write_register(pcr, FPDCTL, OC_POWER_DOWN,
-				OC_POWER_DOWN);
 		}
 	}
 }



^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 5.8 70/70] phy: marvell: comphy: Convert internal SMCC firmware return codes to errno
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (68 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 69/70] misc: rtsx: do not setting OC_POWER_DOWN reg in rtsx_pci_init_ocp() Greg Kroah-Hartman
@ 2020-10-31 11:36 ` Greg Kroah-Hartman
  2020-10-31 20:08 ` [PATCH 5.8 00/70] 5.8.18-rc1 review Guenter Roeck
                   ` (2 subsequent siblings)
  72 siblings, 0 replies; 74+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-31 11:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Tomasz Maciej Nowak, Pali Rohár,
	Lorenzo Pieralisi, Rob Herring

From: Pali Rohár <pali@kernel.org>

commit ea17a0f153af2cd890e4ce517130dcccaa428c13 upstream.

Driver ->power_on and ->power_off callbacks leaks internal SMCC firmware
return codes to phy caller. This patch converts SMCC error codes to
standard linux errno codes. Include file linux/arm-smccc.h already provides
defines for SMCC error codes, so use them instead of custom driver defines.
Note that return value is signed 32bit, but stored in unsigned long type
with zero padding.

Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Link: https://lore.kernel.org/r/20200902144344.16684-2-pali@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/phy/marvell/phy-mvebu-a3700-comphy.c |   14 +++++++++++---
 drivers/phy/marvell/phy-mvebu-cp110-comphy.c |   14 +++++++++++---
 2 files changed, 22 insertions(+), 6 deletions(-)

--- a/drivers/phy/marvell/phy-mvebu-a3700-comphy.c
+++ b/drivers/phy/marvell/phy-mvebu-a3700-comphy.c
@@ -26,7 +26,6 @@
 #define COMPHY_SIP_POWER_ON			0x82000001
 #define COMPHY_SIP_POWER_OFF			0x82000002
 #define COMPHY_SIP_PLL_LOCK			0x82000003
-#define COMPHY_FW_NOT_SUPPORTED			(-1)
 
 #define COMPHY_FW_MODE_SATA			0x1
 #define COMPHY_FW_MODE_SGMII			0x2
@@ -112,10 +111,19 @@ static int mvebu_a3700_comphy_smc(unsign
 				  unsigned long mode)
 {
 	struct arm_smccc_res res;
+	s32 ret;
 
 	arm_smccc_smc(function, lane, mode, 0, 0, 0, 0, 0, &res);
+	ret = res.a0;
 
-	return res.a0;
+	switch (ret) {
+	case SMCCC_RET_SUCCESS:
+		return 0;
+	case SMCCC_RET_NOT_SUPPORTED:
+		return -EOPNOTSUPP;
+	default:
+		return -EINVAL;
+	}
 }
 
 static int mvebu_a3700_comphy_get_fw_mode(int lane, int port,
@@ -220,7 +228,7 @@ static int mvebu_a3700_comphy_power_on(s
 	}
 
 	ret = mvebu_a3700_comphy_smc(COMPHY_SIP_POWER_ON, lane->id, fw_param);
-	if (ret == COMPHY_FW_NOT_SUPPORTED)
+	if (ret == -EOPNOTSUPP)
 		dev_err(lane->dev,
 			"unsupported SMC call, try updating your firmware\n");
 
--- a/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
+++ b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
@@ -123,7 +123,6 @@
 
 #define COMPHY_SIP_POWER_ON	0x82000001
 #define COMPHY_SIP_POWER_OFF	0x82000002
-#define COMPHY_FW_NOT_SUPPORTED	(-1)
 
 /*
  * A lane is described by the following bitfields:
@@ -273,10 +272,19 @@ static int mvebu_comphy_smc(unsigned lon
 			    unsigned long lane, unsigned long mode)
 {
 	struct arm_smccc_res res;
+	s32 ret;
 
 	arm_smccc_smc(function, phys, lane, mode, 0, 0, 0, 0, &res);
+	ret = res.a0;
 
-	return res.a0;
+	switch (ret) {
+	case SMCCC_RET_SUCCESS:
+		return 0;
+	case SMCCC_RET_NOT_SUPPORTED:
+		return -EOPNOTSUPP;
+	default:
+		return -EINVAL;
+	}
 }
 
 static int mvebu_comphy_get_mode(bool fw_mode, int lane, int port,
@@ -819,7 +827,7 @@ static int mvebu_comphy_power_on(struct
 	if (!ret)
 		return ret;
 
-	if (ret == COMPHY_FW_NOT_SUPPORTED)
+	if (ret == -EOPNOTSUPP)
 		dev_err(priv->dev,
 			"unsupported SMC call, try updating your firmware\n");
 



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5.8 00/70] 5.8.18-rc1 review
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (69 preceding siblings ...)
  2020-10-31 11:36 ` [PATCH 5.8 70/70] phy: marvell: comphy: Convert internal SMCC firmware return codes to errno Greg Kroah-Hartman
@ 2020-10-31 20:08 ` Guenter Roeck
  2020-11-01  7:19 ` Naresh Kamboju
  2020-11-02  9:53 ` Jon Hunter
  72 siblings, 0 replies; 74+ messages in thread
From: Guenter Roeck @ 2020-10-31 20:08 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, torvalds, akpm, shuah, patches, lkft-triage, pavel, stable

On Sat, Oct 31, 2020 at 12:35:32PM +0100, Greg Kroah-Hartman wrote:
> -------------------------
> Note, this is going to be the LAST 5.8.y kernel release.  After this
> one, this branch is now end-of-life.  Please move to the 5.9.y branch at
> this point in time.
> -------------------------
> 
> This is the start of the stable review cycle for the 5.8.18 release.
> There are 70 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Mon, 02 Nov 2020 11:34:42 +0000.
> Anything received after that time might be too late.
> 

Build results:
	total: 154 pass: 154 fail: 0
Qemu test results:
	total: 426 pass: 426 fail: 0

Tested-by: Guenter Roeck <linux@roeck-us.net>

Guenter

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5.8 00/70] 5.8.18-rc1 review
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (70 preceding siblings ...)
  2020-10-31 20:08 ` [PATCH 5.8 00/70] 5.8.18-rc1 review Guenter Roeck
@ 2020-11-01  7:19 ` Naresh Kamboju
  2020-11-02  9:53 ` Jon Hunter
  72 siblings, 0 replies; 74+ messages in thread
From: Naresh Kamboju @ 2020-11-01  7:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: open list, Linus Torvalds, Andrew Morton, Guenter Roeck,
	Shuah Khan, patches, lkft-triage, pavel, linux- stable

On Sat, 31 Oct 2020 at 17:11, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> -------------------------
> Note, this is going to be the LAST 5.8.y kernel release.  After this
> one, this branch is now end-of-life.  Please move to the 5.9.y branch at
> this point in time.
> -------------------------
>
> This is the start of the stable review cycle for the 5.8.18 release.
> There are 70 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Mon, 02 Nov 2020 11:34:42 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>         https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.8.18-rc1.gz
> or in the git tree and branch at:
>         git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.8.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h


Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>

NOTE:
LTP version upgrade to 20200930. Due to this change we have noticed few test
failures and fixes which are not related to kernel changes.

Summary
------------------------------------------------------------------------

kernel: 5.8.18-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-5.8.y
git commit: 46e8244bb94fd0c961f1df918b14b2d3a3970398
git describe: v5.8.17-71-g46e8244bb94f
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.8.y/build/v5.8.17-71-g46e8244bb94f

No regressions (compared to build v5.8.17)

No fixes (compared to build v5.8.17)

Fixes (compared to LTP 20200515)
These fixes are coming from LTP upgrade 20200930.

  ltp-commands-tests:
    * logrotate_sh

  ltp-containers-tests:
    * netns_netlink

  ltp-controllers-tests:
    * cpuset_hotplug

  ltp-crypto-tests:
    * af_alg02

  ltp-cve-tests:
    * cve-2017-17805
    * cve-2018-1000199

  ltp-syscalls-tests:
    * clock_gettime03
    * clone302
    * copy_file_range02
    * mknod07
    * ptrace08
    * syslog01
    * syslog02
    * syslog03
    * syslog04
    * syslog05
    * syslog07
    * syslog08
    * syslog09
    * syslog10

  ltp-open-posix-tests:
    * clocks_invaliddates

-- 
Linaro LKFT
https://lkft.linaro.org

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5.8 00/70] 5.8.18-rc1 review
  2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
                   ` (71 preceding siblings ...)
  2020-11-01  7:19 ` Naresh Kamboju
@ 2020-11-02  9:53 ` Jon Hunter
  72 siblings, 0 replies; 74+ messages in thread
From: Jon Hunter @ 2020-11-02  9:53 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Greg Kroah-Hartman, torvalds, akpm, linux, shuah, patches,
	lkft-triage, pavel, stable, linux-tegra

On Sat, 31 Oct 2020 12:35:32 +0100, Greg Kroah-Hartman wrote:
> -------------------------
> Note, this is going to be the LAST 5.8.y kernel release.  After this
> one, this branch is now end-of-life.  Please move to the 5.9.y branch at
> this point in time.
> -------------------------
> 
> This is the start of the stable review cycle for the 5.8.18 release.
> There are 70 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Mon, 02 Nov 2020 11:34:42 +0000.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.8.18-rc1.gz
> or in the git tree and branch at:
> 	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.8.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h

All tests passing for Tegra ...

Test results for stable-v5.8:
    15 builds:	15 pass, 0 fail
    26 boots:	26 pass, 0 fail
    60 tests:	60 pass, 0 fail

Linux version:	5.8.18-rc1-g46e8244bb94f
Boards tested:	tegra124-jetson-tk1, tegra186-p2771-0000,
                tegra194-p2972-0000, tegra20-ventana,
                tegra210-p2371-2180, tegra210-p3450-0000,
                tegra30-cardhu-a04

Tested-by: Jon Hunter <jonathanh@nvidia.com>

Jon

^ permalink raw reply	[flat|nested] 74+ messages in thread

end of thread, other threads:[~2020-11-02  9:53 UTC | newest]

Thread overview: 74+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-31 11:35 [PATCH 5.8 00/70] 5.8.18-rc1 review Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 01/70] netfilter: nftables_offload: KASAN slab-out-of-bounds Read in nft_flow_rule_create Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 02/70] io_uring: dont run task work on an exiting task Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 03/70] io_uring: allow timeout/poll/files killing to take task into account Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 04/70] io_uring: move dropping of files into separate helper Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 05/70] io_uring: stash ctx task reference for SQPOLL Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 06/70] io_uring: unconditionally grab req->task Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 07/70] io_uring: return cancelation status from poll/timeout/files handlers Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 08/70] io_uring: enable task/files specific overflow flushing Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 09/70] io_uring: dont rely on weak ->files references Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 10/70] io_uring: reference ->nsproxy for file table commands Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 11/70] io_wq: Make io_wqe::lock a raw_spinlock_t Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 12/70] io-wq: fix use-after-free in io_wq_worker_running Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 13/70] io_uring: no need to call xa_destroy() on empty xarray Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 14/70] io_uring: Fix use of XArray in __io_uring_files_cancel Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 15/70] io_uring: Fix XArray usage in io_uring_add_task_file Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 16/70] io_uring: Convert advanced XArray uses to the normal API Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 17/70] scripts/setlocalversion: make git describe output more reliable Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 18/70] efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 19/70] fs/kernel_read_file: Remove FIRMWARE_EFI_EMBEDDED enum Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 20/70] arm64: Run ARCH_WORKAROUND_1 enabling code on all CPUs Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 21/70] arm64: Run ARCH_WORKAROUND_2 " Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 22/70] arm64: link with -z norelro regardless of CONFIG_RELOCATABLE Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 23/70] x86/PCI: Fix intel_mid_pci.c build error when ACPI is not enabled Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 24/70] x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}() Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 25/70] x86/copy_mc: Introduce copy_mc_enhanced_fast_string() Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 26/70] efivarfs: Replace invalid slashes with exclamation marks in dentries Greg Kroah-Hartman
2020-10-31 11:35 ` [PATCH 5.8 27/70] bnxt_en: Check abort error state in bnxt_open_nic() Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 28/70] bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one() Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 29/70] bnxt_en: Invoke cancel_delayed_work_sync() for PFs also Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 30/70] bnxt_en: Re-write PCI BARs after PCI fatal error Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 31/70] bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 32/70] chelsio/chtls: fix deadlock issue Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 33/70] chelsio/chtls: fix memory leaks in CPL handlers Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 34/70] chelsio/chtls: fix tls record info to user Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 35/70] cxgb4: set up filter action after rewrites Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 36/70] gtp: fix an use-before-init in gtp_newlink() Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 37/70] ibmveth: Fix use of ibmveth in a bridge Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 38/70] ibmvnic: fix ibmvnic_set_mac Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 39/70] mlxsw: core: Fix memory leak on module removal Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 40/70] netem: fix zero division in tabledist Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 41/70] net: hns3: Clear the CMDQ registers before unmapping BAR region Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 42/70] net: ipa: command payloads already mapped Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 43/70] net/sched: act_mpls: Add softdep on mpls_gso.ko Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 44/70] r8169: fix issue with forced threading in combination with shared interrupts Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 45/70] ravb: Fix bit fields checking in ravb_hwtstamp_get() Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 46/70] tcp: Prevent low rmem stalls with SO_RCVLOWAT Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 47/70] tipc: fix memory leak caused by tipc_buf_append() Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 48/70] net: protect tcf_block_unbind with block lock Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 49/70] erofs: avoid duplicated permission check for "trusted." xattrs Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 50/70] arch/x86/amd/ibs: Fix re-arming IBS Fetch Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 51/70] x86/traps: Fix #DE Oops message regression Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 52/70] x86/xen: disable Firmware First mode for correctable memory errors Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 53/70] PCI: aardvark: Fix initialization with old Marvells Arm Trusted Firmware Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 54/70] ata: ahci: mvebu: Make SATA PHY optional for Armada 3720 Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 55/70] fuse: fix page dereference after free Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 56/70] bpf: Fix comment for helper bpf_current_task_under_cgroup() Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 57/70] evm: Check size of security.evm before using it Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 58/70] p54: avoid accessing the data mapped to streaming DMA Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 59/70] cxl: Rework error message for incompatible slots Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 60/70] RDMA/addr: Fix race with netevent_callback()/rdma_addr_cancel() Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 61/70] mtd: lpddr: Fix bad logic in print_drs_error Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 62/70] drm/i915/gem: Serialise debugfs i915_gem_objects with ctx->mutex Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 63/70] serial: qcom_geni_serial: To correct QUP Version detection logic Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 64/70] serial: pl011: Fix lockdep splat when handling magic-sysrq interrupt Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 65/70] PM: runtime: Fix timer_expires data type on 32-bit arches Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 66/70] ata: sata_rcar: Fix DMA boundary mask Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 67/70] xen/gntdev.c: Mark pages as dirty Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 68/70] openrisc: Fix issue with get_user for 64-bit values Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 69/70] misc: rtsx: do not setting OC_POWER_DOWN reg in rtsx_pci_init_ocp() Greg Kroah-Hartman
2020-10-31 11:36 ` [PATCH 5.8 70/70] phy: marvell: comphy: Convert internal SMCC firmware return codes to errno Greg Kroah-Hartman
2020-10-31 20:08 ` [PATCH 5.8 00/70] 5.8.18-rc1 review Guenter Roeck
2020-11-01  7:19 ` Naresh Kamboju
2020-11-02  9:53 ` Jon Hunter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.