* [PATCH 5.4 00/34] 5.4.197-rc1 review
@ 2022-06-03 17:42 Greg Kroah-Hartman
  2022-06-03 17:42 ` [PATCH 5.4 01/34] lockdown: also lock down previous kgdb use Greg Kroah-Hartman
                   ` (37 more replies)
  0 siblings, 38 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, torvalds, akpm, linux, shuah,
	patches, lkft-triage, pavel, jonathanh, f.fainelli,
	sudipm.mukherjee, slade

This is the start of the stable review cycle for the 5.4.197 release.
There are 34 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.197-rc1.gz
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 5.4.197-rc1

Liu Jian <liujian56@huawei.com>
    bpf: Enlarge offset check value to INT_MAX in bpf_skb_{load,store}_bytes

Chuck Lever <chuck.lever@oracle.com>
    NFSD: Fix possible sleep during nfsd4_release_lockowner()

Trond Myklebust <trond.myklebust@hammerspace.com>
    NFS: Memory allocation failures are not server fatal errors

Akira Yokosawa <akiyks@gmail.com>
    docs: submitting-patches: Fix crossref to 'The canonical patch format'

Xiu Jianfeng <xiujianfeng@huawei.com>
    tpm: ibmvtpm: Correct the return value in tpm_ibmvtpm_probe()

Stefan Mahnke-Hartmann <stefan.mahnke-hartmann@infineon.com>
    tpm: Fix buffer access in tpm2_get_tpm_pt()

Marek Maślanka <mm@semihalf.com>
    HID: multitouch: Add support for Google Whiskers Touchpad

Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
    raid5: introduce MD_BROKEN

Sarthak Kukreti <sarthakkukreti@google.com>
    dm verity: set DM_TARGET_IMMUTABLE feature flag

Mikulas Patocka <mpatocka@redhat.com>
    dm stats: add cond_resched when looping over entries

Mikulas Patocka <mpatocka@redhat.com>
    dm crypt: make printing of the key constant-time

Dan Carpenter <dan.carpenter@oracle.com>
    dm integrity: fix error code in dm_integrity_ctr()

Sultan Alsawaf <sultan@kerneltoast.com>
    zsmalloc: fix races between asynchronous zspage free and page migration

Vitaly Chikunov <vt@altlinux.org>
    crypto: ecrdsa - Fix incorrect use of vli_cmp

Florian Westphal <fw@strlen.de>
    netfilter: conntrack: re-fetch conntrack after insertion

Kees Cook <keescook@chromium.org>
    exec: Force single empty string when argv is empty

Gustavo A. R. Silva <gustavoars@kernel.org>
    drm/i915: Fix -Wstringop-overflow warning in call to intel_read_wm_latency()

Miri Korenblit <miriam.rachel.korenblit@intel.com>
    cfg80211: set custom regdomain after wiphy registration

Stephen Brennan <stephen.s.brennan@oracle.com>
    assoc_array: Fix BUG_ON during garbage collect

Piyush Malgujar <pmalgujar@marvell.com>
    drivers: i2c: thunderx: Allow driver to work with ACPI defined TWSI controllers

Mika Westerberg <mika.westerberg@linux.intel.com>
    i2c: ismt: Provide a DMA buffer for Interrupt Cause Logging

Joel Stanley <joel@jms.id.au>
    net: ftgmac100: Disable hardware checksum on AST2600

Thomas Bartschies <thomas.bartschies@cvk.de>
    net: af_key: check encryption module availability consistency

IotaHydrae <writeforever@foxmail.com>
    pinctrl: sunxi: fix f1c100s uart2 function

Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
    ACPI: sysfs: Fix BERT error region memory mapping

Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    ACPI: sysfs: Make sparse happy about address space in use

Hans Verkuil <hverkuil-cisco@xs4all.nl>
    media: vim2m: initialize the media device earlier

Sakari Ailus <sakari.ailus@linux.intel.com>
    media: vim2m: Register video device after setting up internals

Willy Tarreau <w@1wt.eu>
    secure_seq: use the 64 bits of the siphash for port offset calculation

Eric Dumazet <edumazet@google.com>
    tcp: change source port randomizarion at connect() time

Dmitry Mastykin <dmastykin@astralinux.ru>
    Input: goodix - fix spurious key release events

Denis Efremov (Oracle) <efremov@linux.com>
    staging: rtl8723bs: prevent ->Ssid overflow in rtw_wx_set_scan()

Thomas Gleixner <tglx@linutronix.de>
    x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests

Daniel Thompson <daniel.thompson@linaro.org>
    lockdown: also lock down previous kgdb use


-------------

Diffstat:

 Documentation/process/submitting-patches.rst   |  2 +-
 Makefile                                       |  4 +-
 arch/x86/pci/xen.c                             |  5 +++
 crypto/ecrdsa.c                                |  8 ++--
 drivers/acpi/sysfs.c                           | 23 +++++++---
 drivers/char/tpm/tpm2-cmd.c                    | 11 ++++-
 drivers/char/tpm/tpm_ibmvtpm.c                 |  1 +
 drivers/gpu/drm/i915/intel_pm.c                |  2 +-
 drivers/hid/hid-multitouch.c                   |  3 ++
 drivers/i2c/busses/i2c-ismt.c                  | 14 ++++++
 drivers/i2c/busses/i2c-thunderx-pcidrv.c       |  1 +
 drivers/input/touchscreen/goodix.c             |  2 +-
 drivers/md/dm-crypt.c                          | 14 ++++--
 drivers/md/dm-integrity.c                      |  2 -
 drivers/md/dm-stats.c                          |  8 ++++
 drivers/md/dm-verity-target.c                  |  1 +
 drivers/md/raid5.c                             | 47 +++++++++----------
 drivers/media/platform/vim2m.c                 | 22 +++++----
 drivers/net/ethernet/faraday/ftgmac100.c       |  5 +++
 drivers/pinctrl/sunxi/pinctrl-suniv-f1c100s.c  |  2 +-
 drivers/staging/rtl8723bs/os_dep/ioctl_linux.c |  6 ++-
 fs/exec.c                                      | 25 ++++++++++-
 fs/nfs/internal.h                              |  1 +
 fs/nfsd/nfs4state.c                            | 12 ++---
 include/linux/security.h                       |  2 +
 include/net/inet_hashtables.h                  |  2 +-
 include/net/netfilter/nf_conntrack_core.h      |  7 ++-
 include/net/secure_seq.h                       |  4 +-
 kernel/debug/debug_core.c                      | 24 ++++++++++
 kernel/debug/kdb/kdb_main.c                    | 62 ++++++++++++++++++++++++--
 lib/assoc_array.c                              |  8 ++++
 mm/zsmalloc.c                                  | 37 +++++++++++++--
 net/core/filter.c                              |  4 +-
 net/core/secure_seq.c                          |  4 +-
 net/ipv4/inet_hashtables.c                     | 28 +++++++++---
 net/ipv6/inet6_hashtables.c                    |  4 +-
 net/key/af_key.c                               |  6 +--
 net/wireless/core.c                            |  8 ++--
 net/wireless/reg.c                             |  1 +
 security/lockdown/lockdown.c                   |  2 +
 40 files changed, 327 insertions(+), 97 deletions(-)




* [PATCH 5.4 01/34] lockdown: also lock down previous kgdb use
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
@ 2022-06-03 17:42 ` Greg Kroah-Hartman
  2022-06-03 17:42 ` [PATCH 5.4 02/34] x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests Greg Kroah-Hartman
                   ` (36 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Stephen Brennan, Douglas Anderson,
	Daniel Thompson, Linus Torvalds

From: Daniel Thompson <daniel.thompson@linaro.org>

commit eadb2f47a3ced5c64b23b90fd2a3463f63726066 upstream.

KGDB and KDB allow read and write access to kernel memory, and thus
should be restricted during lockdown.  An attacker with access to a
serial port (for example, via a hypervisor console, which some cloud
vendors provide over the network) could trigger the debugger so it is
important that the debugger respect the lockdown mode when/if it is
triggered.

Fix this by integrating lockdown into kdb's existing permissions
mechanism.  Unfortunately kgdb does not have any permissions mechanism
(although it certainly could be added later) so, for now, kgdb is simply
and brutally disabled by immediately exiting the gdb stub without taking
any action.

For lockdowns established early in boot (i.e. the normal case) this
should be fine, but on systems where kgdb has set breakpoints before
the lockdown is enacted, "bad things" will happen.

CVE: CVE-2022-21499
Co-developed-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/security.h     |    2 +
 kernel/debug/debug_core.c    |   24 ++++++++++++++++
 kernel/debug/kdb/kdb_main.c  |   62 ++++++++++++++++++++++++++++++++++++++++---
 security/lockdown/lockdown.c |    2 +
 4 files changed, 87 insertions(+), 3 deletions(-)

--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -118,10 +118,12 @@ enum lockdown_reason {
 	LOCKDOWN_MMIOTRACE,
 	LOCKDOWN_DEBUGFS,
 	LOCKDOWN_XMON_WR,
+	LOCKDOWN_DBG_WRITE_KERNEL,
 	LOCKDOWN_INTEGRITY_MAX,
 	LOCKDOWN_KCORE,
 	LOCKDOWN_KPROBES,
 	LOCKDOWN_BPF_READ,
+	LOCKDOWN_DBG_READ_KERNEL,
 	LOCKDOWN_PERF,
 	LOCKDOWN_TRACEFS,
 	LOCKDOWN_XMON_RW,
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -56,6 +56,7 @@
 #include <linux/vmacache.h>
 #include <linux/rcupdate.h>
 #include <linux/irq.h>
+#include <linux/security.h>
 
 #include <asm/cacheflush.h>
 #include <asm/byteorder.h>
@@ -685,6 +686,29 @@ cpu_master_loop:
 				continue;
 			kgdb_connected = 0;
 		} else {
+			/*
+			 * This is a brutal way to interfere with the debugger
+			 * and prevent gdb being used to poke at kernel memory.
+			 * This could cause trouble if lockdown is applied when
+			 * there is already an active gdb session. For now the
+			 * answer is simply "don't do that". Typically lockdown
+			 * *will* be applied before the debug core gets started
+			 * so only developers using kgdb for fairly advanced
+			 * early kernel debug can be biten by this. Hopefully
+			 * they are sophisticated enough to take care of
+			 * themselves, especially with help from the lockdown
+			 * message printed on the console!
+			 */
+			if (security_locked_down(LOCKDOWN_DBG_WRITE_KERNEL)) {
+				if (IS_ENABLED(CONFIG_KGDB_KDB)) {
+					/* Switch back to kdb if possible... */
+					dbg_kdb_mode = 1;
+					continue;
+				} else {
+					/* ... otherwise just bail */
+					break;
+				}
+			}
 			error = gdb_serial_stub(ks);
 		}
 
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -45,6 +45,7 @@
 #include <linux/proc_fs.h>
 #include <linux/uaccess.h>
 #include <linux/slab.h>
+#include <linux/security.h>
 #include "kdb_private.h"
 
 #undef	MODULE_PARAM_PREFIX
@@ -198,10 +199,62 @@ struct task_struct *kdb_curr_task(int cp
 }
 
 /*
- * Check whether the flags of the current command and the permissions
- * of the kdb console has allow a command to be run.
+ * Update the permissions flags (kdb_cmd_enabled) to match the
+ * current lockdown state.
+ *
+ * Within this function the calls to security_locked_down() are "lazy". We
+ * avoid calling them if the current value of kdb_cmd_enabled already excludes
+ * flags that might be subject to lockdown. Additionally we deliberately check
+ * the lockdown flags independently (even though read lockdown implies write
+ * lockdown) since that results in both simpler code and clearer messages to
+ * the user on first-time debugger entry.
+ *
+ * The permission masks during a read+write lockdown permits the following
+ * flags: INSPECT, SIGNAL, REBOOT (and ALWAYS_SAFE).
+ *
+ * The INSPECT commands are not blocked during lockdown because they are
+ * not arbitrary memory reads. INSPECT covers the backtrace family (sometimes
+ * forcing them to have no arguments) and lsmod. These commands do expose
+ * some kernel state but do not allow the developer seated at the console to
+ * choose what state is reported. SIGNAL and REBOOT should not be controversial,
+ * given these are allowed for root during lockdown already.
+ */
+static void kdb_check_for_lockdown(void)
+{
+	const int write_flags = KDB_ENABLE_MEM_WRITE |
+				KDB_ENABLE_REG_WRITE |
+				KDB_ENABLE_FLOW_CTRL;
+	const int read_flags = KDB_ENABLE_MEM_READ |
+			       KDB_ENABLE_REG_READ;
+
+	bool need_to_lockdown_write = false;
+	bool need_to_lockdown_read = false;
+
+	if (kdb_cmd_enabled & (KDB_ENABLE_ALL | write_flags))
+		need_to_lockdown_write =
+			security_locked_down(LOCKDOWN_DBG_WRITE_KERNEL);
+
+	if (kdb_cmd_enabled & (KDB_ENABLE_ALL | read_flags))
+		need_to_lockdown_read =
+			security_locked_down(LOCKDOWN_DBG_READ_KERNEL);
+
+	/* De-compose KDB_ENABLE_ALL if required */
+	if (need_to_lockdown_write || need_to_lockdown_read)
+		if (kdb_cmd_enabled & KDB_ENABLE_ALL)
+			kdb_cmd_enabled = KDB_ENABLE_MASK & ~KDB_ENABLE_ALL;
+
+	if (need_to_lockdown_write)
+		kdb_cmd_enabled &= ~write_flags;
+
+	if (need_to_lockdown_read)
+		kdb_cmd_enabled &= ~read_flags;
+}
+
+/*
+ * Check whether the flags of the current command, the permissions of the kdb
+ * console and the lockdown state allow a command to be run.
  */
-static inline bool kdb_check_flags(kdb_cmdflags_t flags, int permissions,
+static bool kdb_check_flags(kdb_cmdflags_t flags, int permissions,
 				   bool no_args)
 {
 	/* permissions comes from userspace so needs massaging slightly */
@@ -1188,6 +1241,9 @@ static int kdb_local(kdb_reason_t reason
 		kdb_curr_task(raw_smp_processor_id());
 
 	KDB_DEBUG_STATE("kdb_local 1", reason);
+
+	kdb_check_for_lockdown();
+
 	kdb_go_count = 0;
 	if (reason == KDB_REASON_DEBUG) {
 		/* special case below */
--- a/security/lockdown/lockdown.c
+++ b/security/lockdown/lockdown.c
@@ -33,10 +33,12 @@ static const char *const lockdown_reason
 	[LOCKDOWN_MMIOTRACE] = "unsafe mmio",
 	[LOCKDOWN_DEBUGFS] = "debugfs access",
 	[LOCKDOWN_XMON_WR] = "xmon write access",
+	[LOCKDOWN_DBG_WRITE_KERNEL] = "use of kgdb/kdb to write kernel RAM",
 	[LOCKDOWN_INTEGRITY_MAX] = "integrity",
 	[LOCKDOWN_KCORE] = "/proc/kcore access",
 	[LOCKDOWN_KPROBES] = "use of kprobes",
 	[LOCKDOWN_BPF_READ] = "use of bpf to read kernel RAM",
+	[LOCKDOWN_DBG_READ_KERNEL] = "use of kgdb/kdb to read kernel RAM",
 	[LOCKDOWN_PERF] = "unsafe use of perf",
 	[LOCKDOWN_TRACEFS] = "use of tracefs",
 	[LOCKDOWN_XMON_RW] = "xmon read and write access",
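The permission masking done by kdb_check_for_lockdown() can be tried out
in userspace. What follows is only a minimal sketch under stated
assumptions: the SK_* names and their bit values are illustrative
stand-ins for the KDB_ENABLE_* flags, the two booleans stand in for the
results of security_locked_down(), and the "lazy" skipping of those
calls is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the KDB_ENABLE_* bits (values are assumptions). */
enum {
	SK_ALL         = 1 << 0,
	SK_MEM_READ    = 1 << 1,
	SK_MEM_WRITE   = 1 << 2,
	SK_REG_READ    = 1 << 3,
	SK_REG_WRITE   = 1 << 4,
	SK_INSPECT     = 1 << 5,
	SK_FLOW_CTRL   = 1 << 6,
	SK_SIGNAL      = 1 << 7,
	SK_REBOOT      = 1 << 8,
	SK_ALWAYS_SAFE = 1 << 9,
	SK_MASK        = (1 << 10) - 1,
};

/*
 * Return the permission mask that remains once the given lockdown
 * state has been applied to 'enabled'.
 */
static int sk_apply_lockdown(int enabled, bool write_locked, bool read_locked)
{
	const int write_flags = SK_MEM_WRITE | SK_REG_WRITE | SK_FLOW_CTRL;
	const int read_flags = SK_MEM_READ | SK_REG_READ;

	assert((enabled & ~SK_MASK) == 0);

	/* De-compose the catch-all bit so individual bits can be cleared. */
	if ((write_locked || read_locked) && (enabled & SK_ALL))
		enabled = SK_MASK & ~SK_ALL;

	if (write_locked)
		enabled &= ~write_flags;
	if (read_locked)
		enabled &= ~read_flags;

	return enabled;
}
```

With both lockdowns active, sk_apply_lockdown(SK_ALL, true, true) leaves
only the INSPECT, SIGNAL, REBOOT and ALWAYS_SAFE stand-ins set, which
matches the set of flags the patch comment says remains permitted.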




* [PATCH 5.4 02/34] x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
  2022-06-03 17:42 ` [PATCH 5.4 01/34] lockdown: also lock down previous kgdb use Greg Kroah-Hartman
@ 2022-06-03 17:42 ` Greg Kroah-Hartman
  2022-06-03 17:42 ` [PATCH 5.4 03/34] staging: rtl8723bs: prevent ->Ssid overflow in rtw_wx_set_scan() Greg Kroah-Hartman
                   ` (35 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jeremi Piotrowski, Dusty Mabe,
	Salvatore Bonaccorso, Thomas Gleixner, Noah Meyerhans

From: Thomas Gleixner <tglx@linutronix.de>

commit 7e0815b3e09986d2fe651199363e135b9358132a upstream.

When a XEN_HVM guest uses the XEN PIRQ/Eventchannel mechanism, then
PCI/MSI[-X] masking is solely controlled by the hypervisor, but contrary to
XEN_PV guests this does not disable PCI/MSI[-X] masking in the PCI/MSI
layer.

This can lead to a situation where the PCI/MSI layer masks an MSI[-X]
interrupt and the hypervisor grants the write despite the fact that it
already requested the interrupt. As a consequence, interrupt delivery on
the affected device never happens.

Set pci_msi_ignore_mask to prevent that like it's done for XEN_PV guests
already.

Fixes: 809f9267bbab ("xen: map MSIs into pirqs")
Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Reported-by: Dusty Mabe <dustymabe@redhat.com>
Reported-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Noah Meyerhans <noahm@debian.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87tuaduxj5.ffs@tglx
[nmeyerha@amazon.com: backported to 5.4]
Signed-off-by: Noah Meyerhans <nmeyerha@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/pci/xen.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -442,6 +442,11 @@ void __init xen_msi_init(void)
 
 	x86_msi.setup_msi_irqs = xen_hvm_setup_msi_irqs;
 	x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
+	/*
+	 * With XEN PIRQ/Eventchannels in use PCI/MSI[-X] masking is solely
+	 * controlled by the hypervisor.
+	 */
+	pci_msi_ignore_mask = 1;
 }
 #endif
 




* [PATCH 5.4 03/34] staging: rtl8723bs: prevent ->Ssid overflow in rtw_wx_set_scan()
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
  2022-06-03 17:42 ` [PATCH 5.4 01/34] lockdown: also lock down previous kgdb use Greg Kroah-Hartman
  2022-06-03 17:42 ` [PATCH 5.4 02/34] x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests Greg Kroah-Hartman
@ 2022-06-03 17:42 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 04/34] Input: goodix - fix spurious key release events Greg Kroah-Hartman
                   ` (34 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Denis Efremov (Oracle)

From: "Denis Efremov (Oracle)" <efremov@linux.com>

This code has a check to prevent read overflow but it needs another
check to prevent writing beyond the end of the ->Ssid[] array.

Fixes: 554c0a3abf21 ("staging: Add rtl8723bs sdio wifi driver")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Denis Efremov (Oracle) <efremov@linux.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/staging/rtl8723bs/os_dep/ioctl_linux.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/staging/rtl8723bs/os_dep/ioctl_linux.c
+++ b/drivers/staging/rtl8723bs/os_dep/ioctl_linux.c
@@ -1351,9 +1351,11 @@ static int rtw_wx_set_scan(struct net_de
 
 					sec_len = *(pos++); len-= 1;
 
-					if (sec_len>0 && sec_len<=len) {
+					if (sec_len > 0 &&
+					    sec_len <= len &&
+					    sec_len <= 32) {
 						ssid[ssid_index].SsidLength = sec_len;
-						memcpy(ssid[ssid_index].Ssid, pos, ssid[ssid_index].SsidLength);
+						memcpy(ssid[ssid_index].Ssid, pos, sec_len);
 						/* DBG_871X("%s COMBO_SCAN with specific ssid:%s, %d\n", __func__ */
 						/* 	, ssid[ssid_index].Ssid, ssid[ssid_index].SsidLength); */
 						ssid_index++;
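The two-sided check in this hunk (source length and destination
capacity) can be illustrated outside the driver. This is a hedged
sketch, not the driver code: struct sk_ssid, sk_copy_ssid() and
SK_SSID_MAX are hypothetical names, and the buffer layout (one length
byte followed by the SSID bytes) is an assumption taken from the
surrounding context lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_SSID_MAX 32	/* mirrors the literal 32 used in the patch */

/* Hypothetical destination record with a fixed-size SSID array. */
struct sk_ssid {
	uint8_t len;
	uint8_t bytes[SK_SSID_MAX];
};

/*
 * Copy one length-prefixed SSID out of buf.
 * Returns the number of input bytes consumed, or 0 if the section is
 * empty, runs past the input, or would not fit in the destination.
 */
static size_t sk_copy_ssid(struct sk_ssid *dst, const uint8_t *buf, size_t len)
{
	size_t sec_len;

	assert(dst != NULL && buf != NULL);

	if (len < 1)
		return 0;

	sec_len = buf[0];
	len -= 1;

	if (sec_len == 0 ||			/* nothing to copy */
	    sec_len > len ||			/* would read past the input */
	    sec_len > sizeof(dst->bytes))	/* would write past the array */
		return 0;

	dst->len = (uint8_t)sec_len;
	memcpy(dst->bytes, buf + 1, sec_len);
	return 1 + sec_len;
}
```

A section that claims 40 bytes is rejected even when the input really
holds 40 bytes, because the third condition bounds the write rather than
the read.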




* [PATCH 5.4 04/34] Input: goodix - fix spurious key release events
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (2 preceding siblings ...)
  2022-06-03 17:42 ` [PATCH 5.4 03/34] staging: rtl8723bs: prevent ->Ssid overflow in rtw_wx_set_scan() Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 05/34] tcp: change source port randomizarion at connect() time Greg Kroah-Hartman
                   ` (33 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dmitry Mastykin, Bastien Nocera,
	Dmitry Torokhov, Fabio Estevam

From: Dmitry Mastykin <dmastykin@astralinux.ru>

commit 24ef83f6e31d20fc121a7cd732b04b498475fca3 upstream.

The goodix panel sends spurious interrupts after a 'finger up' event,
which always cause a timeout.
We were exiting the interrupt handler by reporting touch_num == 0, but
this was still processed as valid and caused the code to use the
uninitialised point_data, creating spurious key release events.

Report an error from the interrupt handler so as to avoid processing
invalid point_data further.

Signed-off-by: Dmitry Mastykin <dmastykin@astralinux.ru>
Reviewed-by: Bastien Nocera <hadess@hadess.net>
Link: https://lore.kernel.org/r/20200316075302.3759-2-dmastykin@astralinux.ru
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/input/touchscreen/goodix.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/input/touchscreen/goodix.c
+++ b/drivers/input/touchscreen/goodix.c
@@ -335,7 +335,7 @@ static int goodix_ts_read_input_report(s
 	 * The Goodix panel will send spurious interrupts after a
 	 * 'finger up' event, which will always cause a timeout.
 	 */
-	return 0;
+	return -ENOMSG;
 }
 
 static void goodix_ts_report_touch_8b(struct goodix_ts_data *ts, u8 *coor_data)
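Why returning an error instead of 0 matters can be shown with a small
userspace sketch. Everything here is hypothetical (sk_read_report(),
sk_process(), struct sk_events, and the idea that byte 0 carries a key
state); it only models the pattern described in the commit message,
where the caller must not look at point_data unless the read succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counters standing in for input events. */
struct sk_events {
	int touches_reported;
	int key_releases;
};

/*
 * Returns the number of touches, or -ENOMSG when the panel had no
 * report ready. In the error case point_data is left untouched.
 */
static int sk_read_report(bool ready, int ntouch,
			  unsigned char *point_data, size_t size)
{
	if (!ready)
		return -ENOMSG;

	memset(point_data, 0, size);	/* assumed: byte 0 == 0 means "key released" */
	return ntouch;
}

static void sk_process(bool ready, int ntouch, struct sk_events *ev)
{
	unsigned char point_data[16];	/* deliberately uninitialised */
	int n;

	assert(ev != NULL);

	n = sk_read_report(ready, ntouch, point_data, sizeof(point_data));
	if (n < 0)
		return;			/* spurious interrupt: ignore point_data */

	ev->touches_reported += n;
	if (point_data[0] == 0)
		ev->key_releases++;
}
```

Had sk_read_report() returned 0 for the "not ready" case, sk_process()
could not tell it apart from a valid report with zero touches and would
go on to read the uninitialised buffer.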




* [PATCH 5.4 05/34] tcp: change source port randomizarion at connect() time
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (3 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 04/34] Input: goodix - fix spurious key release events Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 06/34] secure_seq: use the 64 bits of the siphash for port offset calculation Greg Kroah-Hartman
                   ` (32 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David Dworken,
	Willem de Bruijn, David S. Miller, Stefan Ghinea

From: Eric Dumazet <edumazet@google.com>

commit 190cc82489f46f9d88e73c81a47e14f80a791e1a upstream.

RFC 6056 (Recommendations for Transport-Protocol Port Randomization)
provides a good summary of why source port selection needs extra care.

David Dworken reminded us that Linux implements Algorithm 3
as described in RFC 6056 3.3.3.

Quoting David :
   In the context of the web, this creates an interesting info leak where
   websites can count how many TCP connections a user's computer is
   establishing over time. For example, this allows a website to count
   exactly how many subresources a third party website loaded.
   This also allows:
   - Distinguishing between different users behind a VPN based on
       distinct source port ranges.
   - Tracking users over time across multiple networks.
   - Covert communication channels between different browsers/browser
       profiles running on the same computer
   - Tracking what applications are running on a computer based on
       the pattern of how fast source ports are getting incremented.

Section 3.3.4 describes an enhancement that reduces
attackers' ability to use the basic information currently
stored in the shared 'u32 hint'.

This change also decreases collision rate when
multiple applications need to connect() to
different destinations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: David Dworken <ddworken@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stefan Ghinea <stefan.ghinea@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/inet_hashtables.c |   20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -671,6 +671,17 @@ unlock:
 }
 EXPORT_SYMBOL_GPL(inet_unhash);
 
+/* RFC 6056 3.3.4.  Algorithm 4: Double-Hash Port Selection Algorithm
+ * Note that we use 32bit integers (vs RFC 'short integers')
+ * because 2^16 is not a multiple of num_ephemeral and this
+ * property might be used by clever attacker.
+ * RFC claims using TABLE_LENGTH=10 buckets gives an improvement,
+ * we use 256 instead to really give more isolation and
+ * privacy, this only consumes 1 KB of kernel memory.
+ */
+#define INET_TABLE_PERTURB_SHIFT 8
+static u32 table_perturb[1 << INET_TABLE_PERTURB_SHIFT];
+
 int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 		struct sock *sk, u32 port_offset,
 		int (*check_established)(struct inet_timewait_death_row *,
@@ -684,8 +695,8 @@ int __inet_hash_connect(struct inet_time
 	struct inet_bind_bucket *tb;
 	u32 remaining, offset;
 	int ret, i, low, high;
-	static u32 hint;
 	int l3mdev;
+	u32 index;
 
 	if (port) {
 		head = &hinfo->bhash[inet_bhashfn(net, port,
@@ -712,7 +723,10 @@ int __inet_hash_connect(struct inet_time
 	if (likely(remaining > 1))
 		remaining &= ~1U;
 
-	offset = (hint + port_offset) % remaining;
+	net_get_random_once(table_perturb, sizeof(table_perturb));
+	index = hash_32(port_offset, INET_TABLE_PERTURB_SHIFT);
+
+	offset = (READ_ONCE(table_perturb[index]) + port_offset) % remaining;
 	/* In first pass we try ports of @low parity.
 	 * inet_csk_get_port() does the opposite choice.
 	 */
@@ -766,7 +780,7 @@ next_port:
 	return -EADDRNOTAVAIL;
 
 ok:
-	hint += i + 2;
+	WRITE_ONCE(table_perturb[index], READ_ONCE(table_perturb[index]) + i + 2);
 
 	/* Head lock still held and bh's disabled */
 	inet_bind_hash(sk, tb, port);
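The effect of replacing the single shared hint with a perturbation table
can be sketched in userspace. This is an illustration under stated
assumptions, not the kernel code: the sk_* names are hypothetical,
sk_hash32() is a stand-in for hash_32() using an assumed multiplicative
constant, and the table starts zeroed here whereas the patch seeds it
once with random bytes via net_get_random_once().

```c
#include <assert.h>
#include <stdint.h>

#define SK_PERTURB_SHIFT 8
#define SK_PERTURB_SIZE  (1u << SK_PERTURB_SHIFT)

/* One counter per bucket instead of a single global hint. */
static uint32_t sk_perturb[SK_PERTURB_SIZE];

/* Stand-in for hash_32(): multiplicative hash folded down to 'bits' bits. */
static uint32_t sk_hash32(uint32_t val, unsigned int bits)
{
	assert(bits > 0 && bits <= 32);
	return (val * 0x61C88647u) >> (32 - bits);
}

/* Starting offset into the ephemeral range for a given destination hash. */
static uint32_t sk_start_offset(uint32_t port_offset, uint32_t remaining)
{
	uint32_t index = sk_hash32(port_offset, SK_PERTURB_SHIFT);

	assert(remaining > 0);
	return (sk_perturb[index] + port_offset) % remaining;
}

/* After a port was found on probe 'i', advance only the bucket used. */
static void sk_commit(uint32_t port_offset, uint32_t i)
{
	uint32_t index = sk_hash32(port_offset, SK_PERTURB_SHIFT);

	sk_perturb[index] += i + 2;
}
```

Calling sk_commit() for one port_offset moves the starting offset only
for destinations that hash to the same bucket. A destination in another
bucket sees no change, which is what limits the connection-counting
side channel quoted in the commit message.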




* [PATCH 5.4 06/34] secure_seq: use the 64 bits of the siphash for port offset calculation
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (4 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 05/34] tcp: change source port randomizarion at connect() time Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 07/34] media: vim2m: Register video device after setting up internals Greg Kroah-Hartman
                   ` (31 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jason A. Donenfeld, Moshe Kol,
	Yossi Gilad, Amit Klein, Eric Dumazet, Willy Tarreau,
	Jakub Kicinski, Stefan Ghinea

From: Willy Tarreau <w@1wt.eu>

commit b2d057560b8107c633b39aabe517ff9d93f285e3 upstream.

SipHash replaced MD5 in secure_ipv{4,6}_port_ephemeral() via commit
7cd23e5300c1 ("secure_seq: use SipHash in place of MD5"), but the output
remained truncated to 32 bits. In order to exploit more bits from the
hash, let's make the functions return the full 64 bits of siphash_3u32().
We also make sure the port offset calculation in __inet_hash_connect()
is still done in 32 bits, to avoid the need for div_u64_rem() and its
extra cost on 32-bit systems.

Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Moshe Kol <moshe.kol@mail.huji.ac.il>
Cc: Yossi Gilad <yossi.gilad@mail.huji.ac.il>
Cc: Amit Klein <aksecurity@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[SG: Adjusted context]
Signed-off-by: Stefan Ghinea <stefan.ghinea@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/inet_hashtables.h |    2 +-
 include/net/secure_seq.h      |    4 ++--
 net/core/secure_seq.c         |    4 ++--
 net/ipv4/inet_hashtables.c    |   10 ++++++----
 net/ipv6/inet6_hashtables.c   |    4 ++--
 5 files changed, 13 insertions(+), 11 deletions(-)

--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -420,7 +420,7 @@ static inline void sk_rcv_saddr_set(stru
 }
 
 int __inet_hash_connect(struct inet_timewait_death_row *death_row,
-			struct sock *sk, u32 port_offset,
+			struct sock *sk, u64 port_offset,
 			int (*check_established)(struct inet_timewait_death_row *,
 						 struct sock *, __u16,
 						 struct inet_timewait_sock **));
--- a/include/net/secure_seq.h
+++ b/include/net/secure_seq.h
@@ -4,8 +4,8 @@
 
 #include <linux/types.h>
 
-u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport);
-u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
+u64 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport);
+u64 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 			       __be16 dport);
 u32 secure_tcp_seq(__be32 saddr, __be32 daddr,
 		   __be16 sport, __be16 dport);
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -97,7 +97,7 @@ u32 secure_tcpv6_seq(const __be32 *saddr
 }
 EXPORT_SYMBOL(secure_tcpv6_seq);
 
-u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
+u64 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 			       __be16 dport)
 {
 	const struct {
@@ -147,7 +147,7 @@ u32 secure_tcp_seq(__be32 saddr, __be32
 }
 EXPORT_SYMBOL_GPL(secure_tcp_seq);
 
-u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport)
+u64 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport)
 {
 	net_secret_init();
 	return siphash_4u32((__force u32)saddr, (__force u32)daddr,
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -464,7 +464,7 @@ not_unique:
 	return -EADDRNOTAVAIL;
 }
 
-static u32 inet_sk_port_offset(const struct sock *sk)
+static u64 inet_sk_port_offset(const struct sock *sk)
 {
 	const struct inet_sock *inet = inet_sk(sk);
 
@@ -683,7 +683,7 @@ EXPORT_SYMBOL_GPL(inet_unhash);
 static u32 table_perturb[1 << INET_TABLE_PERTURB_SHIFT];
 
 int __inet_hash_connect(struct inet_timewait_death_row *death_row,
-		struct sock *sk, u32 port_offset,
+		struct sock *sk, u64 port_offset,
 		int (*check_established)(struct inet_timewait_death_row *,
 			struct sock *, __u16, struct inet_timewait_sock **))
 {
@@ -726,7 +726,9 @@ int __inet_hash_connect(struct inet_time
 	net_get_random_once(table_perturb, sizeof(table_perturb));
 	index = hash_32(port_offset, INET_TABLE_PERTURB_SHIFT);
 
-	offset = (READ_ONCE(table_perturb[index]) + port_offset) % remaining;
+	offset = READ_ONCE(table_perturb[index]) + port_offset;
+	offset %= remaining;
+
 	/* In first pass we try ports of @low parity.
 	 * inet_csk_get_port() does the opposite choice.
 	 */
@@ -803,7 +805,7 @@ ok:
 int inet_hash_connect(struct inet_timewait_death_row *death_row,
 		      struct sock *sk)
 {
-	u32 port_offset = 0;
+	u64 port_offset = 0;
 
 	if (!inet_sk(sk)->inet_num)
 		port_offset = inet_sk_port_offset(sk);
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -262,7 +262,7 @@ not_unique:
 	return -EADDRNOTAVAIL;
 }
 
-static u32 inet6_sk_port_offset(const struct sock *sk)
+static u64 inet6_sk_port_offset(const struct sock *sk)
 {
 	const struct inet_sock *inet = inet_sk(sk);
 
@@ -274,7 +274,7 @@ static u32 inet6_sk_port_offset(const st
 int inet6_hash_connect(struct inet_timewait_death_row *death_row,
 		       struct sock *sk)
 {
-	u32 port_offset = 0;
+	u64 port_offset = 0;
 
 	if (!inet_sk(sk)->inet_num)
 		port_offset = inet6_sk_port_offset(sk);
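The reason the offset computation was split into two statements can be
seen in a small userspace sketch. The function name sk_offset() is
hypothetical and uint32_t/uint64_t stand in for the kernel's u32/u64;
the point is only that the sum is truncated to 32 bits before the
remainder is taken, so no 64-bit division is needed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Combine a 32-bit perturbation value with a 64-bit port offset and
 * reduce the result into [0, remaining) using 32-bit arithmetic only.
 */
static uint32_t sk_offset(uint32_t perturb, uint64_t port_offset,
			  uint32_t remaining)
{
	uint32_t offset;

	assert(remaining > 0);

	/* Storing into a 32-bit variable keeps only the low 32 bits of the sum. */
	offset = (uint32_t)(perturb + port_offset);

	/* The remainder is now a 32-bit operation. */
	offset %= remaining;

	return offset;
}
```

Had the sum and the remainder stayed in one expression with a 64-bit
operand, the whole expression would be evaluated in 64 bits, which is
the case the commit message wants to avoid on 32-bit systems.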




* [PATCH 5.4 07/34] media: vim2m: Register video device after setting up internals
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (5 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 06/34] secure_seq: use the 64 bits of the siphash for port offset calculation Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 08/34] media: vim2m: initialize the media device earlier Greg Kroah-Hartman
                   ` (30 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Sakari Ailus, Hans Verkuil,
	Mauro Carvalho Chehab, Mark-PK Tsai

From: Sakari Ailus <sakari.ailus@linux.intel.com>

commit cf7f34777a5b4100a3a44ff95f3d949c62892bdd upstream.

Prevent NULL (or close to NULL) pointer dereference in various places by
registering the video device only when the V4L2 m2m framework has been set
up.

Fixes: commit 96d8eab5d0a1 ("V4L/DVB: [v5,2/2] v4l: Add a mem-to-mem videobuf framework test device")
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/media/platform/vim2m.c |   20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

--- a/drivers/media/platform/vim2m.c
+++ b/drivers/media/platform/vim2m.c
@@ -1333,12 +1333,6 @@ static int vim2m_probe(struct platform_d
 	vfd->lock = &dev->dev_mutex;
 	vfd->v4l2_dev = &dev->v4l2_dev;
 
-	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
-	if (ret) {
-		v4l2_err(&dev->v4l2_dev, "Failed to register video device\n");
-		goto error_v4l2;
-	}
-
 	video_set_drvdata(vfd, dev);
 	v4l2_info(&dev->v4l2_dev,
 		  "Device registered as /dev/video%d\n", vfd->num);
@@ -1353,6 +1347,12 @@ static int vim2m_probe(struct platform_d
 		goto error_dev;
 	}
 
+	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
+	if (ret) {
+		v4l2_err(&dev->v4l2_dev, "Failed to register video device\n");
+		goto error_m2m;
+	}
+
 #ifdef CONFIG_MEDIA_CONTROLLER
 	dev->mdev.dev = &pdev->dev;
 	strscpy(dev->mdev.model, "vim2m", sizeof(dev->mdev.model));
@@ -1366,7 +1366,7 @@ static int vim2m_probe(struct platform_d
 						 MEDIA_ENT_F_PROC_VIDEO_SCALER);
 	if (ret) {
 		v4l2_err(&dev->v4l2_dev, "Failed to init mem2mem media controller\n");
-		goto error_dev;
+		goto error_v4l2;
 	}
 
 	ret = media_device_register(&dev->mdev);
@@ -1381,11 +1381,13 @@ static int vim2m_probe(struct platform_d
 error_m2m_mc:
 	v4l2_m2m_unregister_media_controller(dev->m2m_dev);
 #endif
-error_dev:
+error_v4l2:
 	video_unregister_device(&dev->vfd);
 	/* vim2m_device_release called by video_unregister_device to release various objects */
 	return ret;
-error_v4l2:
+error_m2m:
+	v4l2_m2m_release(dev->m2m_dev);
+error_dev:
 	v4l2_device_unregister(&dev->v4l2_dev);
 error_free:
 	kfree(dev);




* [PATCH 5.4 08/34] media: vim2m: initialize the media device earlier
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (6 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 07/34] media: vim2m: Register video device after setting up internals Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 09/34] ACPI: sysfs: Make sparse happy about address space in use Greg Kroah-Hartman
                   ` (29 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Hans Verkuil, Mauro Carvalho Chehab,
	Mark-PK Tsai

From: Hans Verkuil <hverkuil-cisco@xs4all.nl>

commit 1a28dce222a6ece725689ad58c0cf4a1b48894f4 upstream.

Before the video device node is registered, the v4l2_dev.mdev
pointer must be set in order to correctly associate the video
device with the media device. Move the initialization of the
media device up.

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/media/platform/vim2m.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

--- a/drivers/media/platform/vim2m.c
+++ b/drivers/media/platform/vim2m.c
@@ -1347,12 +1347,6 @@ static int vim2m_probe(struct platform_d
 		goto error_dev;
 	}
 
-	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
-	if (ret) {
-		v4l2_err(&dev->v4l2_dev, "Failed to register video device\n");
-		goto error_m2m;
-	}
-
 #ifdef CONFIG_MEDIA_CONTROLLER
 	dev->mdev.dev = &pdev->dev;
 	strscpy(dev->mdev.model, "vim2m", sizeof(dev->mdev.model));
@@ -1361,7 +1355,15 @@ static int vim2m_probe(struct platform_d
 	media_device_init(&dev->mdev);
 	dev->mdev.ops = &m2m_media_ops;
 	dev->v4l2_dev.mdev = &dev->mdev;
+#endif
 
+	ret = video_register_device(vfd, VFL_TYPE_GRABBER, 0);
+	if (ret) {
+		v4l2_err(&dev->v4l2_dev, "Failed to register video device\n");
+		goto error_m2m;
+	}
+
+#ifdef CONFIG_MEDIA_CONTROLLER
 	ret = v4l2_m2m_register_media_controller(dev->m2m_dev, vfd,
 						 MEDIA_ENT_F_PROC_VIDEO_SCALER);
 	if (ret) {




* [PATCH 5.4 09/34] ACPI: sysfs: Make sparse happy about address space in use
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (7 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 08/34] media: vim2m: initialize the media device earlier Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 10/34] ACPI: sysfs: Fix BERT error region memory mapping Greg Kroah-Hartman
                   ` (28 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Andy Shevchenko, Rafael J. Wysocki,
	dann frazier

From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

commit bdd56d7d8931e842775d2e5b93d426a8d1940e33 upstream.

Sparse is not happy about address space in use in acpi_data_show():

drivers/acpi/sysfs.c:428:14: warning: incorrect type in assignment (different address spaces)
drivers/acpi/sysfs.c:428:14:    expected void [noderef] __iomem *base
drivers/acpi/sysfs.c:428:14:    got void *
drivers/acpi/sysfs.c:431:59: warning: incorrect type in argument 4 (different address spaces)
drivers/acpi/sysfs.c:431:59:    expected void const *from
drivers/acpi/sysfs.c:431:59:    got void [noderef] __iomem *base
drivers/acpi/sysfs.c:433:30: warning: incorrect type in argument 1 (different address spaces)
drivers/acpi/sysfs.c:433:30:    expected void *logical_address
drivers/acpi/sysfs.c:433:30:    got void [noderef] __iomem *base

Indeed, acpi_os_map_memory() returns a void pointer with dropped specific
address space. Hence, we don't need to carry out __iomem in acpi_data_show().

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/acpi/sysfs.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/acpi/sysfs.c
+++ b/drivers/acpi/sysfs.c
@@ -438,7 +438,7 @@ static ssize_t acpi_data_show(struct fil
 			      loff_t offset, size_t count)
 {
 	struct acpi_data_attr *data_attr;
-	void __iomem *base;
+	void *base;
 	ssize_t rc;
 
 	data_attr = container_of(bin_attr, struct acpi_data_attr, attr);




* [PATCH 5.4 10/34] ACPI: sysfs: Fix BERT error region memory mapping
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (8 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 09/34] ACPI: sysfs: Make sparse happy about address space in use Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 11/34] pinctrl: sunxi: fix f1c100s uart2 function Greg Kroah-Hartman
                   ` (27 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Lorenzo Pieralisi, Veronika Kabatova,
	Aristeu Rozanski, Ard Biesheuvel, Rafael J. Wysocki,
	dann frazier

From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

commit 1bbc21785b7336619fb6a67f1fff5afdaf229acc upstream.

Currently the sysfs interface maps the BERT error region as "memory"
(through acpi_os_map_memory()) in order to copy the error records into
memory buffers through memory operations (eg memory_read_from_buffer()).

The OS cannot detect whether the BERT error region is part of
system RAM or is "device memory" (e.g. BMC memory), and therefore it
cannot determine which memory attributes the bus to that memory supports
(and hence the corresponding kernel mapping), unless firmware provides
the required information.

The acpi_os_map_memory() arch backend implementation determines the
mapping attributes. On arm64, if the BERT error region is not present in
the EFI memory map, the error region is mapped as device-nGnRnE; this
triggers alignment faults since memcpy unaligned accesses are not
allowed in device-nGnRnE regions.

The ACPI sysfs code cannot therefore map by default the BERT error
region with memory semantics but should use a safer default.

Change the sysfs code to map the BERT error region as MMIO (through
acpi_os_map_iomem()) and use the memcpy_fromio() interface to read the
error region into the kernel buffer.
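
As a rough illustration of the resulting read path, here is a minimal
userspace sketch of the bounds handling that the diff below adds before
the copy. It is not the kernel code: bounded_region_read() is a
hypothetical name, a plain buffer stands in for the mapped region, and
memcpy() stands in for memcpy_fromio(), which is only available in the
kernel.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical userspace stand-in for the sysfs read path: "region"
 * plays the role of the mapped BERT error region, and memcpy() replaces
 * memcpy_fromio().
 */
static ssize_t bounded_region_read(char *buf, const char *region,
				   ssize_t size, off_t offset, size_t count)
{
	if (offset < 0)
		return -1;	/* the kernel code returns -EINVAL here */

	if (offset >= size)
		return 0;	/* reading at or past the end yields EOF */

	/* Clamp the request so it never runs past the end of the region. */
	if (count > (size_t)(size - offset))
		count = (size_t)(size - offset);

	memcpy(buf, region + offset, count);

	return (ssize_t)count;
}
```

The clamping has to be done by hand because memcpy_fromio(), unlike
memory_read_from_buffer(), does not take the total size of the source
and does no bounds checking of its own.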

Link: https://lore.kernel.org/linux-arm-kernel/31ffe8fc-f5ee-2858-26c5-0fd8bdd68702@arm.com
Link: https://lore.kernel.org/linux-acpi/CAJZ5v0g+OVbhuUUDrLUCfX_mVqY_e8ubgLTU98=jfjTeb4t+Pw@mail.gmail.com
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Veronika Kabatova <vkabatov@redhat.com>
Tested-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/acpi/sysfs.c |   25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

--- a/drivers/acpi/sysfs.c
+++ b/drivers/acpi/sysfs.c
@@ -438,19 +438,30 @@ static ssize_t acpi_data_show(struct fil
 			      loff_t offset, size_t count)
 {
 	struct acpi_data_attr *data_attr;
-	void *base;
-	ssize_t rc;
+	void __iomem *base;
+	ssize_t size;
 
 	data_attr = container_of(bin_attr, struct acpi_data_attr, attr);
+	size = data_attr->attr.size;
 
-	base = acpi_os_map_memory(data_attr->addr, data_attr->attr.size);
+	if (offset < 0)
+		return -EINVAL;
+
+	if (offset >= size)
+		return 0;
+
+	if (count > size - offset)
+		count = size - offset;
+
+	base = acpi_os_map_iomem(data_attr->addr, size);
 	if (!base)
 		return -ENOMEM;
-	rc = memory_read_from_buffer(buf, count, &offset, base,
-				     data_attr->attr.size);
-	acpi_os_unmap_memory(base, data_attr->attr.size);
 
-	return rc;
+	memcpy_fromio(buf, base + offset, count);
+
+	acpi_os_unmap_iomem(base, size);
+
+	return count;
 }
 
 static int acpi_bert_data_init(void *th, struct acpi_data_attr *data_attr)




* [PATCH 5.4 11/34] pinctrl: sunxi: fix f1c100s uart2 function
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (9 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 10/34] ACPI: sysfs: Fix BERT error region memory mapping Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 12/34] net: af_key: check encryption module availability consistency Greg Kroah-Hartman
                   ` (26 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, IotaHydrae, Andre Przywara,
	Linus Walleij, Sasha Levin

From: IotaHydrae <writeforever@foxmail.com>

[ Upstream commit fa8785e5931367e2b43f2c507f26bcf3e281c0ca ]

Change the suniv f1c100s pinctrl PD14 multiplexing function from lvds1
to uart2.

When pins PD13 and PD14 are set up for the uart2 function in the dts,
the following error occurs:
1c20800.pinctrl: unsupported function uart2 on pin PD14

This is because 'uart2' is not one of the multiplexing options of PD14,
so pinctrl doesn't know how to configure it.

So change the PD14 lvds1 function to uart2.

Signed-off-by: IotaHydrae <writeforever@foxmail.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/tencent_70C1308DDA794C81CAEF389049055BACEC09@qq.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/pinctrl/sunxi/pinctrl-suniv-f1c100s.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pinctrl/sunxi/pinctrl-suniv-f1c100s.c b/drivers/pinctrl/sunxi/pinctrl-suniv-f1c100s.c
index 2801ca706273..68a5b627fb9b 100644
--- a/drivers/pinctrl/sunxi/pinctrl-suniv-f1c100s.c
+++ b/drivers/pinctrl/sunxi/pinctrl-suniv-f1c100s.c
@@ -204,7 +204,7 @@ static const struct sunxi_desc_pin suniv_f1c100s_pins[] = {
 		  SUNXI_FUNCTION(0x0, "gpio_in"),
 		  SUNXI_FUNCTION(0x1, "gpio_out"),
 		  SUNXI_FUNCTION(0x2, "lcd"),		/* D20 */
-		  SUNXI_FUNCTION(0x3, "lvds1"),		/* RX */
+		  SUNXI_FUNCTION(0x3, "uart2"),		/* RX */
 		  SUNXI_FUNCTION_IRQ_BANK(0x6, 0, 14)),
 	SUNXI_PIN(SUNXI_PINCTRL_PIN(D, 15),
 		  SUNXI_FUNCTION(0x0, "gpio_in"),
-- 
2.35.1





* [PATCH 5.4 12/34] net: af_key: check encryption module availability consistency
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (10 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 11/34] pinctrl: sunxi: fix f1c100s uart2 function Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 13/34] net: ftgmac100: Disable hardware checksum on AST2600 Greg Kroah-Hartman
                   ` (25 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Thomas Bartschies, Steffen Klassert,
	Sasha Levin

From: Thomas Bartschies <thomas.bartschies@cvk.de>

[ Upstream commit 015c44d7bff3f44d569716117becd570c179ca32 ]

Since support for the SM3 and SM4 hash algos was recently introduced for IPsec, the kernel
produces invalid pfkey acquire messages when these encryption modules are disabled. This
happens because the availability of the algos wasn't checked in all necessary functions.
This patch adds these checks.
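
The added check can be sketched in isolation as follows. This is a
simplified illustration only: struct algo_desc and count_usable_algos()
are hypothetical stand-ins, not the xfrm data structures, and the three
flags merely mirror pfkey_supported, the *_tmpl_set() result and
available from the diff below.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an xfrm algorithm descriptor. */
struct algo_desc {
	bool pfkey_supported;	/* algorithm may be advertised via PF_KEY */
	bool tmpl_set;		/* algorithm is selected by the template */
	bool available;		/* backing crypto module is usable */
};

/* Count the algorithms that would each contribute one combination. */
static size_t count_usable_algos(const struct algo_desc *algs, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (!algs[i].pfkey_supported)
			continue;
		/* Selected by the template AND actually available. */
		if (algs[i].tmpl_set && algs[i].available)
			count++;
	}
	return count;
}
```

The point of the fix is consistency: the sizing pass has to apply the
same availability filter as the pass that later fills in the message,
otherwise the two disagree on how many combinations there are.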

Signed-off-by: Thomas Bartschies <thomas.bartschies@cvk.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/key/af_key.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/key/af_key.c b/net/key/af_key.c
index f67d3ba72c49..dd064d5eff6e 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -2904,7 +2904,7 @@ static int count_ah_combs(const struct xfrm_tmpl *t)
 			break;
 		if (!aalg->pfkey_supported)
 			continue;
-		if (aalg_tmpl_set(t, aalg))
+		if (aalg_tmpl_set(t, aalg) && aalg->available)
 			sz += sizeof(struct sadb_comb);
 	}
 	return sz + sizeof(struct sadb_prop);
@@ -2922,7 +2922,7 @@ static int count_esp_combs(const struct xfrm_tmpl *t)
 		if (!ealg->pfkey_supported)
 			continue;
 
-		if (!(ealg_tmpl_set(t, ealg)))
+		if (!(ealg_tmpl_set(t, ealg) && ealg->available))
 			continue;
 
 		for (k = 1; ; k++) {
@@ -2933,7 +2933,7 @@ static int count_esp_combs(const struct xfrm_tmpl *t)
 			if (!aalg->pfkey_supported)
 				continue;
 
-			if (aalg_tmpl_set(t, aalg))
+			if (aalg_tmpl_set(t, aalg) && aalg->available)
 				sz += sizeof(struct sadb_comb);
 		}
 	}
-- 
2.35.1





* [PATCH 5.4 13/34] net: ftgmac100: Disable hardware checksum on AST2600
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (11 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 12/34] net: af_key: check encryption module availability consistency Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 14/34] i2c: ismt: Provide a DMA buffer for Interrupt Cause Logging Greg Kroah-Hartman
                   ` (24 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, David Wilder, Dylan Hung,
	Joel Stanley, David S. Miller, Sasha Levin

From: Joel Stanley <joel@jms.id.au>

[ Upstream commit 6fd45e79e8b93b8d22fb8fe22c32fbad7e9190bd ]

The AST2600 when using the i210 NIC over NC-SI has been observed to
produce incorrect checksum results with specific MTU values. This was
first observed when sending data across a long distance set of networks.

On a local network, the following test was performed using a 1MB file of
random data.

On the receiver run this script:

 #!/bin/bash
 while [ 1 ]; do
        # Zero the stats
        nstat -r  > /dev/null
        nc -l 9899 > test-file
        # Check for checksum errors
        TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
        if [ -z "$TcpInCsumErrors" ]; then
                echo No TcpInCsumErrors
        else
                echo TcpInCsumErrors = $TcpInCsumErrors
        fi
 done

On an AST2600 system:

 # nc <IP of  receiver host> 9899 < test-file

The test was repeated with various MTU values:

 # ip link set mtu 1410 dev eth0

The observed results:

 1500 - good
 1434 - bad
 1400 - good
 1410 - bad
 1420 - good

The test was repeated after disabling tx checksumming:

 # ethtool -K eth0 tx-checksumming off

And all MTU values tested resulted in transfers without error.

An issue with the driver cannot be ruled out, however there has been no
bug discovered so far.

David has done the work to take the original bug report of slow data
transfer between long distance connections and triaged it down to this
test case.

The vendor suspects that this is a hardware issue when using NC-SI. The
fixes line refers to the patch that introduced AST2600 support.
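
For illustration, the workaround amounts to clearing one bit in the
feature mask when both conditions hold. The sketch below is a
self-contained approximation: FEAT_HW_CSUM, FEAT_RXCSUM and
adjust_hw_features() are hypothetical names, and the bit values are not
the real NETIF_F_* values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; these are not the real NETIF_F_* values. */
#define FEAT_HW_CSUM	(1u << 0)	/* tx checksum offload */
#define FEAT_RXCSUM	(1u << 1)	/* rx checksum offload */

/* Drop tx checksum offload only for an AST2600 MAC driven over NC-SI. */
static uint32_t adjust_hw_features(uint32_t hw_features, bool is_ast2600,
				   bool use_ncsi)
{
	if (use_ncsi && is_ast2600)
		hw_features &= ~FEAT_HW_CSUM;

	return hw_features;
}
```

Only the tx offload bit is cleared, which matches the test above where
disabling tx checksumming alone was enough to make the errors go away.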

Reported-by: David Wilder <wilder@us.ibm.com>
Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 2c06cdcd3e75..d7478d332820 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1880,6 +1880,11 @@ static int ftgmac100_probe(struct platform_device *pdev)
 	/* AST2400  doesn't have working HW checksum generation */
 	if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
 		netdev->hw_features &= ~NETIF_F_HW_CSUM;
+
+	/* AST2600 tx checksum with NCSI is broken */
+	if (priv->use_ncsi && of_device_is_compatible(np, "aspeed,ast2600-mac"))
+		netdev->hw_features &= ~NETIF_F_HW_CSUM;
+
 	if (np && of_get_property(np, "no-hw-checksum", NULL))
 		netdev->hw_features &= ~(NETIF_F_HW_CSUM | NETIF_F_RXCSUM);
 	netdev->features |= netdev->hw_features;
-- 
2.35.1





* [PATCH 5.4 14/34] i2c: ismt: Provide a DMA buffer for Interrupt Cause Logging
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (12 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 13/34] net: ftgmac100: Disable hardware checksum on AST2600 Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 15/34] drivers: i2c: thunderx: Allow driver to work with ACPI defined TWSI controllers Greg Kroah-Hartman
                   ` (23 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Mika Westerberg,
	Andy Shevchenko, Wolfram Sang, Sasha Levin

From: Mika Westerberg <mika.westerberg@linux.intel.com>

[ Upstream commit 17a0f3acdc6ec8b89ad40f6e22165a4beee25663 ]

Before sending an MSI the hardware writes information pertinent to the
interrupt cause to a memory location pointed to by the SMTICL register.
This memory holds three double words where the least significant bit
tells whether the interrupt cause of master/target/error is valid. The
driver does not use this, but we need to set it up because otherwise the
hardware will perform a DMA write to the default address (0), which
causes an IOMMU fault such as the one below:

  DMAR: DRHD: handling fault status reg 2
  DMAR: [DMA Write] Request device [00:12.0] PASID ffffffff fault addr 0
        [fault reason 05] PTE Write access is not set

To prevent this from happening, provide a proper DMA buffer for this
that then gets mapped by the IOMMU accordingly.
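
Although the driver never reads the log, its layout as described above
could be decoded roughly like this. This is a sketch under assumptions:
the enum names, LOG_VALID and log_entry_valid() are hypothetical, and
the master/target/error ordering of the three words is inferred from the
wording above rather than from the datasheet.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOG_ENTRIES	3	/* mirrors ISMT_LOG_ENTRIES in the patch */
#define LOG_VALID	0x1u	/* hypothetical name for the valid bit (LSB) */

/* Assumed ordering of the three double words; not taken from the datasheet. */
enum log_index {
	LOG_MASTER,
	LOG_TARGET,
	LOG_ERROR,
};

/* Report whether the hardware marked the given interrupt cause as valid. */
static bool log_entry_valid(const uint32_t log[LOG_ENTRIES],
			    enum log_index idx)
{
	return (log[idx] & LOG_VALID) != 0;
}
```

Clearing all three words before each transaction, as the patch does in
ismt_access(), would keep a stale valid bit from a previous interrupt
from being mistaken for a new one if the log were ever consulted.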

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/i2c/busses/i2c-ismt.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/i2c/busses/i2c-ismt.c b/drivers/i2c/busses/i2c-ismt.c
index 2f95e25a10f7..53325419ec13 100644
--- a/drivers/i2c/busses/i2c-ismt.c
+++ b/drivers/i2c/busses/i2c-ismt.c
@@ -81,6 +81,7 @@
 
 #define ISMT_DESC_ENTRIES	2	/* number of descriptor entries */
 #define ISMT_MAX_RETRIES	3	/* number of SMBus retries to attempt */
+#define ISMT_LOG_ENTRIES	3	/* number of interrupt cause log entries */
 
 /* Hardware Descriptor Constants - Control Field */
 #define ISMT_DESC_CWRL	0x01	/* Command/Write Length */
@@ -174,6 +175,8 @@ struct ismt_priv {
 	u8 head;				/* ring buffer head pointer */
 	struct completion cmp;			/* interrupt completion */
 	u8 buffer[I2C_SMBUS_BLOCK_MAX + 16];	/* temp R/W data buffer */
+	dma_addr_t log_dma;
+	u32 *log;
 };
 
 /**
@@ -408,6 +411,9 @@ static int ismt_access(struct i2c_adapter *adap, u16 addr,
 	memset(desc, 0, sizeof(struct ismt_desc));
 	desc->tgtaddr_rw = ISMT_DESC_ADDR_RW(addr, read_write);
 
+	/* Always clear the log entries */
+	memset(priv->log, 0, ISMT_LOG_ENTRIES * sizeof(u32));
+
 	/* Initialize common control bits */
 	if (likely(pci_dev_msi_enabled(priv->pci_dev)))
 		desc->control = ISMT_DESC_INT | ISMT_DESC_FAIR;
@@ -697,6 +703,8 @@ static void ismt_hw_init(struct ismt_priv *priv)
 	/* initialize the Master Descriptor Base Address (MDBA) */
 	writeq(priv->io_rng_dma, priv->smba + ISMT_MSTR_MDBA);
 
+	writeq(priv->log_dma, priv->smba + ISMT_GR_SMTICL);
+
 	/* initialize the Master Control Register (MCTRL) */
 	writel(ISMT_MCTRL_MEIE, priv->smba + ISMT_MSTR_MCTRL);
 
@@ -784,6 +792,12 @@ static int ismt_dev_init(struct ismt_priv *priv)
 	priv->head = 0;
 	init_completion(&priv->cmp);
 
+	priv->log = dmam_alloc_coherent(&priv->pci_dev->dev,
+					ISMT_LOG_ENTRIES * sizeof(u32),
+					&priv->log_dma, GFP_KERNEL);
+	if (!priv->log)
+		return -ENOMEM;
+
 	return 0;
 }
 
-- 
2.35.1





* [PATCH 5.4 15/34] drivers: i2c: thunderx: Allow driver to work with ACPI defined TWSI controllers
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (13 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 14/34] i2c: ismt: Provide a DMA buffer for Interrupt Cause Logging Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 16/34] assoc_array: Fix BUG_ON during garbage collect Greg Kroah-Hartman
                   ` (22 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Szymon Balcerak, Piyush Malgujar,
	Wolfram Sang, Sasha Levin

From: Piyush Malgujar <pmalgujar@marvell.com>

[ Upstream commit 03a35bc856ddc09f2cc1f4701adecfbf3b464cb3 ]

Due to i2c->adap.dev.fwnode not being set, ACPI_COMPANION() wasn't properly
found for TWSI controllers.

Signed-off-by: Szymon Balcerak <sbalcerak@marvell.com>
Signed-off-by: Piyush Malgujar <pmalgujar@marvell.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/i2c/busses/i2c-thunderx-pcidrv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/i2c/busses/i2c-thunderx-pcidrv.c b/drivers/i2c/busses/i2c-thunderx-pcidrv.c
index 19f8eec38717..107aeb8b54da 100644
--- a/drivers/i2c/busses/i2c-thunderx-pcidrv.c
+++ b/drivers/i2c/busses/i2c-thunderx-pcidrv.c
@@ -208,6 +208,7 @@ static int thunder_i2c_probe_pci(struct pci_dev *pdev,
 	i2c->adap.bus_recovery_info = &octeon_i2c_recovery_info;
 	i2c->adap.dev.parent = dev;
 	i2c->adap.dev.of_node = pdev->dev.of_node;
+	i2c->adap.dev.fwnode = dev->fwnode;
 	snprintf(i2c->adap.name, sizeof(i2c->adap.name),
 		 "Cavium ThunderX i2c adapter at %s", dev_name(dev));
 	i2c_set_adapdata(&i2c->adap, i2c);
-- 
2.35.1





* [PATCH 5.4 16/34] assoc_array: Fix BUG_ON during garbage collect
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (14 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 15/34] drivers: i2c: thunderx: Allow driver to work with ACPI defined TWSI controllers Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 17/34] cfg80211: set custom regdomain after wiphy registration Greg Kroah-Hartman
                   ` (21 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Stephen Brennan, David Howells,
	Andrew Morton, keyrings, Jarkko Sakkinen, Linus Torvalds

From: Stephen Brennan <stephen.s.brennan@oracle.com>

commit d1dc87763f406d4e67caf16dbe438a5647692395 upstream.

A rare BUG_ON triggered in assoc_array_gc:

    [3430308.818153] kernel BUG at lib/assoc_array.c:1609!

Which corresponded to the statement currently at line 1593 upstream:

    BUG_ON(assoc_array_ptr_is_meta(p));

Using the data from the core dump, I was able to generate a userspace
reproducer[1] and determine the cause of the bug.

[1]: https://github.com/brenns10/kernel_stuff/tree/master/assoc_array_gc

After running the iterator on the entire branch, an internal tree node
looked like the following:

    NODE (nr_leaves_on_branch: 3)
      SLOT [0] NODE (2 leaves)
      SLOT [1] NODE (1 leaf)
      SLOT [2..f] NODE (empty)

In the userspace reproducer, the pr_devel output when compressing this
node was:

    -- compress node 0x5607cc089380 --
    free=0, leaves=0
    [0] retain node 2/1 [nx 0]
    [1] fold node 1/1 [nx 0]
    [2] fold node 0/1 [nx 2]
    [3] fold node 0/2 [nx 2]
    [4] fold node 0/3 [nx 2]
    [5] fold node 0/4 [nx 2]
    [6] fold node 0/5 [nx 2]
    [7] fold node 0/6 [nx 2]
    [8] fold node 0/7 [nx 2]
    [9] fold node 0/8 [nx 2]
    [10] fold node 0/9 [nx 2]
    [11] fold node 0/10 [nx 2]
    [12] fold node 0/11 [nx 2]
    [13] fold node 0/12 [nx 2]
    [14] fold node 0/13 [nx 2]
    [15] fold node 0/14 [nx 2]
    after: 3

At slot 0, an internal node with 2 leaves could not be folded into the
node, because there was only one available slot (slot 0). Thus, the
internal node was retained. At slot 1, the node had one leaf, and was
able to be folded in successfully. The remaining nodes had no leaves,
and so were removed. By the end of the compression stage, there were 14
free slots, and only 3 leaf nodes. The tree was ascended and then its
parent node was compressed. When this node was seen, it could not be
folded, due to the internal node it contained.

The invariant for compression in this function is: whenever
nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT, the node should contain all
leaf nodes. The compression step currently cannot guarantee this, given
the corner case shown above.

To fix this issue, retry compression whenever we have retained a node,
and yet nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT. This second
compression will then allow the node in slot 0 to be folded in,
satisfying the invariant. Below is the output of the reproducer once the
fix is applied:

    -- compress node 0x560e9c562380 --
    free=0, leaves=0
    [0] retain node 2/1 [nx 0]
    [1] fold node 1/1 [nx 0]
    [2] fold node 0/1 [nx 2]
    [3] fold node 0/2 [nx 2]
    [4] fold node 0/3 [nx 2]
    [5] fold node 0/4 [nx 2]
    [6] fold node 0/5 [nx 2]
    [7] fold node 0/6 [nx 2]
    [8] fold node 0/7 [nx 2]
    [9] fold node 0/8 [nx 2]
    [10] fold node 0/9 [nx 2]
    [11] fold node 0/10 [nx 2]
    [12] fold node 0/11 [nx 2]
    [13] fold node 0/12 [nx 2]
    [14] fold node 0/13 [nx 2]
    [15] fold node 0/14 [nx 2]
    internal nodes remain despite enough space, retrying
    -- compress node 0x560e9c562380 --
    free=14, leaves=1
    [0] fold node 2/15 [nx 0]
    after: 3
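
The corner case and the retry can be reproduced with a small toy model.
The sketch below is not the kernel code: struct toy_node,
compress_once() and compress_with_retry() are hypothetical, a child node
is reduced to its leaf count, and shortcuts are ignored. The fold test
and the next_slot handling are meant to follow the logic described
above.

```c
#include <stdbool.h>

#define FAN_OUT 16	/* plays the role of ASSOC_ARRAY_FAN_OUT */

enum slot_kind { SLOT_EMPTY, SLOT_LEAF, SLOT_NODE };

/* Hypothetical toy node: a child node is reduced to its leaf count. */
struct toy_node {
	enum slot_kind kind[FAN_OUT];
	int child_leaves[FAN_OUT];	/* only meaningful for SLOT_NODE */
};

/* One compression pass; returns true if a child node had to be retained. */
static bool compress_once(struct toy_node *n, int *leaves_on_branch)
{
	int nr_free = 0, next_slot = 0, slot, i;
	bool retained = false;

	/* Count empty slots and the leaves held directly by this node. */
	*leaves_on_branch = 0;
	for (slot = 0; slot < FAN_OUT; slot++) {
		if (n->kind[slot] == SLOT_EMPTY)
			nr_free++;
		else if (n->kind[slot] == SLOT_LEAF)
			(*leaves_on_branch)++;
	}

	/* See what we can fold in. */
	for (slot = 0; slot < FAN_OUT; slot++) {
		int child;

		if (n->kind[slot] != SLOT_NODE)
			continue;

		child = n->child_leaves[slot];
		*leaves_on_branch += child;

		if (child > nr_free + 1) {
			retained = true;	/* not enough room yet */
			continue;
		}

		/* Fold: free the child's slot, then move its leaves up. */
		n->kind[slot] = SLOT_EMPTY;
		nr_free++;
		if (slot < next_slot)
			next_slot = slot;
		for (i = 0; i < child; i++) {
			while (n->kind[next_slot] != SLOT_EMPTY)
				next_slot++;
			n->kind[next_slot++] = SLOT_LEAF;
			nr_free--;
		}
	}
	return retained;
}

/* Repeat the pass while a node was retained although the leaves fit. */
static int compress_with_retry(struct toy_node *n)
{
	int leaves;

	while (compress_once(n, &leaves) && leaves <= FAN_OUT)
		;	/* internal nodes remain despite enough space */

	return leaves;
}
```

Filling slot 0 with a 2-leaf child, slot 1 with a 1-leaf child and the
remaining slots with empty children gives the situation from the log: a
single compress_once() call leaves the child in slot 0 in place, while
compress_with_retry() ends with three leaves and no child nodes.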

Changes
=======
DH:
 - Use false instead of 0.
 - Reorder the inserted lines in a couple of places to put retained before
   next_slot.

ver #2)
 - Fix typo in pr_devel, correct comparison to "<="

Fixes: 3cb989501c26 ("Add a generic associative array implementation.")
Cc: <stable@vger.kernel.org>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Andrew Morton <akpm@linux-foundation.org>
cc: keyrings@vger.kernel.org
Link: https://lore.kernel.org/r/20220511225517.407935-1-stephen.s.brennan@oracle.com/ # v1
Link: https://lore.kernel.org/r/20220512215045.489140-1-stephen.s.brennan@oracle.com/ # v2
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 lib/assoc_array.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/lib/assoc_array.c
+++ b/lib/assoc_array.c
@@ -1462,6 +1462,7 @@ int assoc_array_gc(struct assoc_array *a
 	struct assoc_array_ptr *cursor, *ptr;
 	struct assoc_array_ptr *new_root, *new_parent, **new_ptr_pp;
 	unsigned long nr_leaves_on_tree;
+	bool retained;
 	int keylen, slot, nr_free, next_slot, i;
 
 	pr_devel("-->%s()\n", __func__);
@@ -1538,6 +1539,7 @@ continue_node:
 		goto descend;
 	}
 
+retry_compress:
 	pr_devel("-- compress node %p --\n", new_n);
 
 	/* Count up the number of empty slots in this node and work out the
@@ -1555,6 +1557,7 @@ continue_node:
 	pr_devel("free=%d, leaves=%lu\n", nr_free, new_n->nr_leaves_on_branch);
 
 	/* See what we can fold in */
+	retained = false;
 	next_slot = 0;
 	for (slot = 0; slot < ASSOC_ARRAY_FAN_OUT; slot++) {
 		struct assoc_array_shortcut *s;
@@ -1604,9 +1607,14 @@ continue_node:
 			pr_devel("[%d] retain node %lu/%d [nx %d]\n",
 				 slot, child->nr_leaves_on_branch, nr_free + 1,
 				 next_slot);
+			retained = true;
 		}
 	}
 
+	if (retained && new_n->nr_leaves_on_branch <= ASSOC_ARRAY_FAN_OUT) {
+		pr_devel("internal nodes remain despite enough space, retrying\n");
+		goto retry_compress;
+	}
 	pr_devel("after: %lu\n", new_n->nr_leaves_on_branch);
 
 	nr_leaves_on_tree = new_n->nr_leaves_on_branch;




* [PATCH 5.4 17/34] cfg80211: set custom regdomain after wiphy registration
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (15 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 16/34] assoc_array: Fix BUG_ON during garbage collect Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 18/34] drm/i915: Fix -Wstringop-overflow warning in call to intel_read_wm_latency() Greg Kroah-Hartman
                   ` (20 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Miri Korenblit, Luca Coelho, Johannes Berg

From: Miri Korenblit <miriam.rachel.korenblit@intel.com>

commit 1b7b3ac8ff3317cdcf07a1c413de9bdb68019c2b upstream.

We used to set the regulatory info before the device was registered,
so the info never got set: with no registered device, there was
nothing to set it for. Set the regulatory info after device
registration instead.
Also call reg_process_self_managed_hints() again after device
registration, because it does nothing when called before it.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20210618133832.c96eadcffe80.I86799c2c866b5610b4cf91115c21d8ceb525c5aa@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/wireless/core.c |    8 ++++----
 net/wireless/reg.c  |    1 +
 2 files changed, 5 insertions(+), 4 deletions(-)

--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -5,7 +5,7 @@
  * Copyright 2006-2010		Johannes Berg <johannes@sipsolutions.net>
  * Copyright 2013-2014  Intel Mobile Communications GmbH
  * Copyright 2015-2017	Intel Deutschland GmbH
- * Copyright (C) 2018-2019 Intel Corporation
+ * Copyright (C) 2018-2021 Intel Corporation
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -891,9 +891,6 @@ int wiphy_register(struct wiphy *wiphy)
 		return res;
 	}
 
-	/* set up regulatory info */
-	wiphy_regulatory_register(wiphy);
-
 	list_add_rcu(&rdev->list, &cfg80211_rdev_list);
 	cfg80211_rdev_list_generation++;
 
@@ -904,6 +901,9 @@ int wiphy_register(struct wiphy *wiphy)
 	cfg80211_debugfs_rdev_add(rdev);
 	nl80211_notify_wiphy(rdev, NL80211_CMD_NEW_WIPHY);
 
+	/* set up regulatory info */
+	wiphy_regulatory_register(wiphy);
+
 	if (wiphy->regulatory_flags & REGULATORY_CUSTOM_REG) {
 		struct regulatory_request request;
 
--- a/net/wireless/reg.c
+++ b/net/wireless/reg.c
@@ -3790,6 +3790,7 @@ void wiphy_regulatory_register(struct wi
 
 	wiphy_update_regulatory(wiphy, lr->initiator);
 	wiphy_all_share_dfs_chan_state(wiphy);
+	reg_process_self_managed_hints();
 }
 
 void wiphy_regulatory_deregister(struct wiphy *wiphy)




* [PATCH 5.4 18/34] drm/i915: Fix -Wstringop-overflow warning in call to intel_read_wm_latency()
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (16 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 17/34] cfg80211: set custom regdomain after wiphy registration Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 19/34] exec: Force single empty string when argv is empty Greg Kroah-Hartman
                   ` (19 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Gustavo A. R. Silva

From: Gustavo A. R. Silva <gustavoars@kernel.org>

commit 336feb502a715909a8136eb6a62a83d7268a353b upstream.

Fix the following -Wstringop-overflow warnings when building with GCC-11:

drivers/gpu/drm/i915/intel_pm.c:3106:9: warning: ‘intel_read_wm_latency’ accessing 16 bytes in a region of size 10 [-Wstringop-overflow=]
 3106 |         intel_read_wm_latency(dev_priv, dev_priv->wm.pri_latency);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/intel_pm.c:3106:9: note: referencing argument 2 of type ‘u16 *’ {aka ‘short unsigned int *’}
drivers/gpu/drm/i915/intel_pm.c:2861:13: note: in a call to function ‘intel_read_wm_latency’
 2861 | static void intel_read_wm_latency(struct drm_i915_private *dev_priv,
      |             ^~~~~~~~~~~~~~~~~~~~~

by removing the over-specified array size from the argument declarations.

It seems that this code is actually safe because the size of the
array depends on the hardware generation, and the function checks
for that.

Notice that wm can be an array of 5 elements:
drivers/gpu/drm/i915/intel_pm.c:3109:   intel_read_wm_latency(dev_priv, dev_priv->wm.pri_latency);

or an array of 8 elements:
drivers/gpu/drm/i915/intel_pm.c:3131:   intel_read_wm_latency(dev_priv, dev_priv->wm.skl_latency);

and the compiler legitimately complains about that.

This helps with the ongoing efforts to globally enable
-Wstringop-overflow.

Link: https://github.com/KSPP/linux/issues/181
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/gpu/drm/i915/intel_pm.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -2822,7 +2822,7 @@ hsw_compute_linetime_wm(const struct int
 }
 
 static void intel_read_wm_latency(struct drm_i915_private *dev_priv,
-				  u16 wm[8])
+				  u16 wm[])
 {
 	struct intel_uncore *uncore = &dev_priv->uncore;
 




* [PATCH 5.4 19/34] exec: Force single empty string when argv is empty
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (17 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 18/34] drm/i915: Fix -Wstringop-overflow warning in call to intel_read_wm_latency() Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 20/34] netfilter: conntrack: re-fetch conntrack after insertion Greg Kroah-Hartman
                   ` (18 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Ariadne Conill, Michael Kerrisk,
	Matthew Wilcox, Christian Brauner, Rich Felker, Eric Biederman,
	Alexander Viro, linux-fsdevel, Kees Cook, Andy Lutomirski,
	Vegard Nossum

From: Kees Cook <keescook@chromium.org>

commit dcd46d897adb70d63e025f175a00a89797d31a43 upstream.

Quoting[1] Ariadne Conill:

"In several other operating systems, it is a hard requirement that the
second argument to execve(2) be the name of a program, thus prohibiting
a scenario where argc < 1. POSIX 2017 also recommends this behaviour,
but it is not an explicit requirement[2]:

    The argument arg0 should point to a filename string that is
    associated with the process being started by one of the exec
    functions.
...
Interestingly, Michael Kerrisk opened an issue about this in 2008[3],
but there was no consensus to support fixing this issue then.
Hopefully now that CVE-2021-4034 shows practical exploitative use[4]
of this bug in a shellcode, we can reconsider.

This issue is being tracked in the KSPP issue tracker[5]."

While the initial code searches[6][7] turned up what appeared to be
mostly corner case tests, trying to just reject argv == NULL
(or an immediately terminated pointer list) quickly started tripping[8]
existing userspace programs.

The next best approach is forcing a single empty string into argv and
adjusting argc to match. The number of programs depending on argc == 0
seems a smaller set than those calling execve with a NULL argv.

Account for the additional stack space in bprm_stack_limits(). Inject an
empty string when argc == 0 (and set argc = 1). Warn about the case so
userspace has some notice about the change:

    process './argc0' launched './argc0' with NULL argv: empty string added

Additionally WARN() and reject NULL argv usage for kernel threads.

[1] https://lore.kernel.org/lkml/20220127000724.15106-1-ariadne@dereferenced.org/
[2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html
[3] https://bugzilla.kernel.org/show_bug.cgi?id=8408
[4] https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt
[5] https://github.com/KSPP/linux/issues/176
[6] https://codesearch.debian.net/search?q=execve%5C+*%5C%28%5B%5E%2C%5D%2B%2C+*NULL&literal=0
[7] https://codesearch.debian.net/search?q=execlp%3F%5Cs*%5C%28%5B%5E%2C%5D%2B%2C%5Cs*NULL&literal=0
[8] https://lore.kernel.org/lkml/20220131144352.GE16385@xsang-OptiPlex-9020/

Reported-by: Ariadne Conill <ariadne@dereferenced.org>
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Ariadne Conill <ariadne@dereferenced.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20220201000947.2453721-1-keescook@chromium.org
[vegard: fixed conflicts due to missing
 886d7de631da71e30909980fdbf318f7caade262^- and
 3950e975431bc914f7e81b8f2a2dbdf2064acb0f^-]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/exec.c |   25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

This has been tested in both argc == 0 and argc >= 1 cases, but I would
still appreciate a review given the differences with mainline. If it's
considered too risky I'm also fine with dropping it -- just wanted to
make sure this didn't fall through the cracks, as it does block a real
(albeit old by now) exploit.

--- a/fs/exec.c
+++ b/fs/exec.c
@@ -454,6 +454,9 @@ static int prepare_arg_pages(struct linu
 	unsigned long limit, ptr_size;
 
 	bprm->argc = count(argv, MAX_ARG_STRINGS);
+	if (bprm->argc == 0)
+		pr_warn_once("process '%s' launched '%s' with NULL argv: empty string added\n",
+			     current->comm, bprm->filename);
 	if (bprm->argc < 0)
 		return bprm->argc;
 
@@ -482,8 +485,14 @@ static int prepare_arg_pages(struct linu
 	 * the stack. They aren't stored until much later when we can't
 	 * signal to the parent that the child has run out of stack space.
 	 * Instead, calculate it here so it's possible to fail gracefully.
+	 *
+	 * In the case of argc = 0, make sure there is space for adding a
+	 * empty string (which will bump argc to 1), to ensure confused
+	 * userspace programs don't start processing from argv[1], thinking
+	 * argc can never be 0, to keep them from walking envp by accident.
+	 * See do_execveat_common().
 	 */
-	ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
+	ptr_size = (max(bprm->argc, 1) + bprm->envc) * sizeof(void *);
 	if (limit <= ptr_size)
 		return -E2BIG;
 	limit -= ptr_size;
@@ -1848,6 +1857,20 @@ static int __do_execve_file(int fd, stru
 	if (retval < 0)
 		goto out;
 
+	/*
+	 * When argv is empty, add an empty string ("") as argv[0] to
+	 * ensure confused userspace programs that start processing
+	 * from argv[1] won't end up walking envp. See also
+	 * bprm_stack_limits().
+	 */
+	if (bprm->argc == 0) {
+		const char *argv[] = { "", NULL };
+		retval = copy_strings_kernel(1, argv, bprm);
+		if (retval < 0)
+			goto out;
+		bprm->argc = 1;
+	}
+
 	retval = exec_binprm(bprm);
 	if (retval < 0)
 		goto out;
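
The hazard described in the commit message can be simulated in user space.
This is a minimal sketch, not kernel code: the two arrays are hypothetical
models of the pointer block placed on the initial stack (argv entries, a NULL
terminator, then envp entries), and naive_count() is a made-up example of a
program that starts at argv[1] without ever checking argc.

```c
#include <stddef.h>

/*
 * Model of the stack pointers when execve() is called with an empty argv on
 * an unpatched kernel: the argv terminator comes first, so "argv[1]" is
 * already the first environment string.
 */
static const char *const unpatched_layout[] = {
	NULL,				/* end of the (empty) argv */
	"PATH=/bin", "HOME=/root",	/* envp */
	NULL,				/* end of envp */
};

/* The same exec on a patched kernel: argv[0] is "" and argc is 1. */
static const char *const patched_layout[] = {
	"", NULL, "PATH=/bin", "HOME=/root", NULL,
};

/* Counts "arguments" the naive way: skip argv[0], stop at the first NULL. */
static int naive_count(const char *const argv[])
{
	int n = 0;
	int i;

	for (i = 1; argv[i] != NULL; i++)
		n++;
	return n;
}
```

With the unpatched layout the naive loop treats both environment strings as
arguments; with the patched layout it stops immediately at the argv
terminator.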




* [PATCH 5.4 20/34] netfilter: conntrack: re-fetch conntrack after insertion
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (18 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 19/34] exec: Force single empty string when argv is empty Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 21/34] crypto: ecrdsa - Fix incorrect use of vli_cmp Greg Kroah-Hartman
                   ` (17 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, syzbot+793a590957d9c1b96620,
	Florian Westphal, Pablo Neira Ayuso

From: Florian Westphal <fw@strlen.de>

commit 56b14ecec97f39118bf85c9ac2438c5a949509ed upstream.

In case the conntrack is clashing, insertion can free skb->_nfct and
set skb->_nfct to the already-confirmed entry.

This wasn't found before because the conntrack entry and the extension
space used to be freed after an rcu grace period, plus the race needs
events enabled to trigger.

Reported-by: <syzbot+793a590957d9c1b96620@syzkaller.appspotmail.com>
Fixes: 71d8c47fc653 ("netfilter: conntrack: introduce clash resolution on insertion race")
Fixes: 2ad9d7747c10 ("netfilter: conntrack: free extension area immediately")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/netfilter/nf_conntrack_core.h |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -59,8 +59,13 @@ static inline int nf_conntrack_confirm(s
 	int ret = NF_ACCEPT;
 
 	if (ct) {
-		if (!nf_ct_is_confirmed(ct))
+		if (!nf_ct_is_confirmed(ct)) {
 			ret = __nf_conntrack_confirm(skb);
+
+			if (ret == NF_ACCEPT)
+				ct = (struct nf_conn *)skb_nfct(skb);
+		}
+
 		if (likely(ret == NF_ACCEPT))
 			nf_ct_deliver_cached_events(ct);
 	}
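
The re-fetch pattern in this change can be sketched outside the kernel. The
types and functions below are hypothetical stand-ins, not the netfilter API:
confirm() plays the role of the insertion step that may swap the packet's
entry for an already-confirmed one, and confirm_and_deliver() shows why the
local pointer has to be reloaded afterwards.

```c
#include <stddef.h>

#define ACCEPT 1

struct conn { int confirmed; int cached_events; };
struct packet { struct conn *ct; };

/* Hypothetical already-confirmed entry that wins a clash. */
static struct conn winner = { 1, 42 };

/*
 * Hypothetical insertion step: on a clash the packet's own entry is
 * abandoned and the packet is pointed at the winning entry instead.
 */
static int confirm(struct packet *pkt, int clash)
{
	if (clash)
		pkt->ct = &winner;
	else
		pkt->ct->confirmed = 1;
	return ACCEPT;
}

/* Returns the events that would be delivered, or -1 if there is no entry. */
static int confirm_and_deliver(struct packet *pkt, int clash)
{
	struct conn *ct = pkt->ct;
	int ret = ACCEPT;

	if (!ct)
		return -1;
	if (!ct->confirmed) {
		ret = confirm(pkt, clash);
		if (ret == ACCEPT)
			ct = pkt->ct;	/* re-fetch: the old pointer may be stale */
	}
	return ret == ACCEPT ? ct->cached_events : -1;
}
```

In the clash case the events come from the winning entry; without the
re-fetch the function would keep using the entry the packet no longer owns.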




* [PATCH 5.4 21/34] crypto: ecrdsa - Fix incorrect use of vli_cmp
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (19 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 20/34] netfilter: conntrack: re-fetch conntrack after insertion Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 22/34] zsmalloc: fix races between asynchronous zspage free and page migration Greg Kroah-Hartman
                   ` (16 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Vitaly Chikunov, Herbert Xu

From: Vitaly Chikunov <vt@altlinux.org>

commit 7cc7ab73f83ee6d50dc9536bc3355495d8600fad upstream.

Correctly compare values that shall be greater-or-equal and not just
greater.

Fixes: 0d7a78643f69 ("crypto: ecrdsa - add EC-RDSA (GOST 34.10) algorithm")
Cc: <stable@vger.kernel.org>
Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 crypto/ecrdsa.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/crypto/ecrdsa.c
+++ b/crypto/ecrdsa.c
@@ -112,15 +112,15 @@ static int ecrdsa_verify(struct akcipher
 
 	/* Step 1: verify that 0 < r < q, 0 < s < q */
 	if (vli_is_zero(r, ndigits) ||
-	    vli_cmp(r, ctx->curve->n, ndigits) == 1 ||
+	    vli_cmp(r, ctx->curve->n, ndigits) >= 0 ||
 	    vli_is_zero(s, ndigits) ||
-	    vli_cmp(s, ctx->curve->n, ndigits) == 1)
+	    vli_cmp(s, ctx->curve->n, ndigits) >= 0)
 		return -EKEYREJECTED;
 
 	/* Step 2: calculate hash (h) of the message (passed as input) */
 	/* Step 3: calculate e = h \mod q */
 	vli_from_le64(e, digest, ndigits);
-	if (vli_cmp(e, ctx->curve->n, ndigits) == 1)
+	if (vli_cmp(e, ctx->curve->n, ndigits) >= 0)
 		vli_sub(e, e, ctx->curve->n, ndigits);
 	if (vli_is_zero(e, ndigits))
 		e[0] = 1;
@@ -136,7 +136,7 @@ static int ecrdsa_verify(struct akcipher
 	/* Step 6: calculate point C = z_1P + z_2Q, and R = x_c \mod q */
 	ecc_point_mult_shamir(&cc, z1, &ctx->curve->g, z2, &ctx->pub_key,
 			      ctx->curve);
-	if (vli_cmp(cc.x, ctx->curve->n, ndigits) == 1)
+	if (vli_cmp(cc.x, ctx->curve->n, ndigits) >= 0)
 		vli_sub(cc.x, cc.x, ctx->curve->n, ndigits);
 
 	/* Step 7: if R == r signature is valid */
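
The difference between "== 1" and ">= 0" only shows up when the two values
are equal. The sketch below illustrates that with a simplified stand-in for
vli_cmp(); it assumes digits are stored least significant first and that the
comparison returns 1, 0 or -1. The helper names are hypothetical.

```c
#include <stdint.h>

/* Simplified stand-in for vli_cmp(): sign of (left - right). */
static int vli_cmp_sketch(const uint64_t *left, const uint64_t *right,
			  unsigned int ndigits)
{
	int i;

	for (i = (int)ndigits - 1; i >= 0; i--) {
		if (left[i] > right[i])
			return 1;
		if (left[i] < right[i])
			return -1;
	}
	return 0;
}

static int is_zero_sketch(const uint64_t *v, unsigned int ndigits)
{
	unsigned int i;

	for (i = 0; i < ndigits; i++)
		if (v[i])
			return 0;
	return 1;
}

/* Old check: rejects only v > n, so v == n slips through. */
static int in_range_old(const uint64_t *v, const uint64_t *n,
			unsigned int ndigits)
{
	return !is_zero_sketch(v, ndigits) &&
	       vli_cmp_sketch(v, n, ndigits) != 1;
}

/* Fixed check: enforces 0 < v < n by rejecting v >= n. */
static int in_range_fixed(const uint64_t *v, const uint64_t *n,
			  unsigned int ndigits)
{
	return !is_zero_sketch(v, ndigits) &&
	       vli_cmp_sketch(v, n, ndigits) < 0;
}
```

For v equal to n, in_range_old() accepts the value while in_range_fixed()
rejects it; for every other input the two agree.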




* [PATCH 5.4 22/34] zsmalloc: fix races between asynchronous zspage free and page migration
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (20 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 21/34] crypto: ecrdsa - Fix incorrect use of vli_cmp Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 23/34] dm integrity: fix error code in dm_integrity_ctr() Greg Kroah-Hartman
                   ` (15 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Sultan Alsawaf, Minchan Kim,
	Nitin Gupta, Sergey Senozhatsky, Andrew Morton

From: Sultan Alsawaf <sultan@kerneltoast.com>

commit 2505a981114dcb715f8977b8433f7540854851d8 upstream.

The asynchronous zspage free worker tries to lock a zspage's entire page
list without defending against page migration.  Since pages which haven't
yet been locked can concurrently migrate off the zspage page list while
lock_zspage() churns away, lock_zspage() can suffer from a few different
lethal races.

It can lock a page which no longer belongs to the zspage and unsafely
dereference page_private(), it can unsafely dereference a torn pointer to
the next page (since there's a data race), and it can observe a spurious
NULL pointer to the next page and thus not lock all of the zspage's pages
(since a single page migration will reconstruct the entire page list, and
create_page_chain() unconditionally zeroes out each list pointer in the
process).

Fix the races by using migrate_read_lock() in lock_zspage() to synchronize
with page migration.

Link: https://lkml.kernel.org/r/20220509024703.243847-1-sultan@kerneltoast.com
Fixes: 77ff465799c602 ("zsmalloc: zs_page_migrate: skip unnecessary loops but not return -EBUSY if zspage is not inuse")
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/zsmalloc.c |   37 +++++++++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 4 deletions(-)

--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1748,11 +1748,40 @@ static enum fullness_group putback_zspag
  */
 static void lock_zspage(struct zspage *zspage)
 {
-	struct page *page = get_first_page(zspage);
+	struct page *curr_page, *page;
 
-	do {
-		lock_page(page);
-	} while ((page = get_next_page(page)) != NULL);
+	/*
+	 * Pages we haven't locked yet can be migrated off the list while we're
+	 * trying to lock them, so we need to be careful and only attempt to
+	 * lock each page under migrate_read_lock(). Otherwise, the page we lock
+	 * may no longer belong to the zspage. This means that we may wait for
+	 * the wrong page to unlock, so we must take a reference to the page
+	 * prior to waiting for it to unlock outside migrate_read_lock().
+	 */
+	while (1) {
+		migrate_read_lock(zspage);
+		page = get_first_page(zspage);
+		if (trylock_page(page))
+			break;
+		get_page(page);
+		migrate_read_unlock(zspage);
+		wait_on_page_locked(page);
+		put_page(page);
+	}
+
+	curr_page = page;
+	while ((page = get_next_page(curr_page))) {
+		if (trylock_page(page)) {
+			curr_page = page;
+		} else {
+			get_page(page);
+			migrate_read_unlock(zspage);
+			wait_on_page_locked(page);
+			put_page(page);
+			migrate_read_lock(zspage);
+		}
+	}
+	migrate_read_unlock(zspage);
 }
 
 static int zs_init_fs_context(struct fs_context *fc)




* [PATCH 5.4 23/34] dm integrity: fix error code in dm_integrity_ctr()
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (21 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 22/34] zsmalloc: fix races between asynchronous zspage free and page migration Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 24/34] dm crypt: make printing of the key constant-time Greg Kroah-Hartman
                   ` (14 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Dan Carpenter, Mikulas Patocka, Mike Snitzer

From: Dan Carpenter <dan.carpenter@oracle.com>

commit d3f2a14b8906df913cb04a706367b012db94a6e8 upstream.

The "r" variable shadows an earlier "r" that has function scope.  It
means that we accidentally return success instead of an error code.
Smatch has a warning for this:

	drivers/md/dm-integrity.c:4503 dm_integrity_ctr()
	warn: missing error code 'r'

Fixes: 7eada909bfd7 ("dm: add integrity target")
Cc: stable@vger.kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/md/dm-integrity.c |    2 --
 1 file changed, 2 deletions(-)

--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -4149,8 +4149,6 @@ try_smaller_buffer:
 	}
 
 	if (should_write_sb) {
-		int r;
-
 		init_journal(ic, 0, ic->journal_sections, 0);
 		r = dm_integrity_failed(ic);
 		if (unlikely(r)) {
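
The effect of the shadowed variable can be reproduced in a few lines. This is
a simplified sketch of the control flow only, assuming an error path that
returns the function-scope r; step_that_fails() and both constructor
functions are hypothetical names, not code from dm-integrity.

```c
#include <errno.h>

/* Hypothetical step that always reports an I/O error. */
static int step_that_fails(void)
{
	return -EIO;
}

/* Buggy shape: the inner r hides the function-scope r. */
static int ctr_shadowed(int should_write_sb)
{
	int r = 0;

	if (should_write_sb) {
		int r;	/* shadows the outer r */

		r = step_that_fails();
		if (r)
			goto bad;
	}
	return 0;
bad:
	return r;	/* outer r, still 0: the error is lost */
}

/* Fixed shape: the inner declaration is gone, so the error propagates. */
static int ctr_fixed(int should_write_sb)
{
	int r = 0;

	if (should_write_sb) {
		r = step_that_fails();
		if (r)
			goto bad;
	}
	return 0;
bad:
	return r;
}
```

Building such code with GCC's -Wshadow option flags the inner declaration,
which is one way to catch this class of bug without Smatch.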




* [PATCH 5.4 24/34] dm crypt: make printing of the key constant-time
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (22 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 23/34] dm integrity: fix error code in dm_integrity_ctr() Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 25/34] dm stats: add cond_resched when looping over entries Greg Kroah-Hartman
                   ` (13 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Mikulas Patocka, Milan Broz, Mike Snitzer

From: Mikulas Patocka <mpatocka@redhat.com>

commit 567dd8f34560fa221a6343729474536aa7ede4fd upstream.

The device mapper dm-crypt target is using scnprintf("%02x", cc->key[i]) to
report the current key to userspace. However, this is not a constant-time
operation and it may leak information about the key via timing, via cache
access patterns or via the branch predictor.

Change dm-crypt's key printing to use "%c" instead of "%02x". Also
introduce hex2asc() that carefully avoids any branching or memory
accesses when converting a number in the range 0 ... 15 to an ascii
character.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/md/dm-crypt.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -2817,6 +2817,11 @@ static int crypt_map(struct dm_target *t
 	return DM_MAPIO_SUBMITTED;
 }
 
+static char hex2asc(unsigned char c)
+{
+	return c + '0' + ((unsigned)(9 - c) >> 4 & 0x27);
+}
+
 static void crypt_status(struct dm_target *ti, status_type_t type,
 			 unsigned status_flags, char *result, unsigned maxlen)
 {
@@ -2835,9 +2840,12 @@ static void crypt_status(struct dm_targe
 		if (cc->key_size > 0) {
 			if (cc->key_string)
 				DMEMIT(":%u:%s", cc->key_size, cc->key_string);
-			else
-				for (i = 0; i < cc->key_size; i++)
-					DMEMIT("%02x", cc->key[i]);
+			else {
+				for (i = 0; i < cc->key_size; i++) {
+					DMEMIT("%c%c", hex2asc(cc->key[i] >> 4),
+					       hex2asc(cc->key[i] & 0xf));
+				}
+			}
 		} else
 			DMEMIT("-");
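
The hex2asc() expression can be checked in isolation. The sketch below copies
the expression from the patch into a user-space function and adds a
hypothetical key_to_hex() helper (not part of dm-crypt) that formats a byte
buffer the same way the new loop does.

```c
#include <stddef.h>

/*
 * Maps 0..15 to '0'..'9', 'a'..'f' without a branch or table lookup.
 * For c <= 9, (9 - c) is non-negative and the shift yields 0.
 * For c >= 10, (9 - c) is negative; as unsigned, the shifted value has its
 * low bits set, so the mask yields 0x27 (39), and c + '0' + 39 lands in
 * 'a'..'f'.
 */
static char hex2asc(unsigned char c)
{
	return c + '0' + ((unsigned)(9 - c) >> 4 & 0x27);
}

/* Hypothetical helper: writes 2 * len hex digits plus a NUL into out. */
static void key_to_hex(const unsigned char *key, size_t len, char *out)
{
	size_t i;

	for (i = 0; i < len; i++) {
		out[2 * i] = hex2asc(key[i] >> 4);
		out[2 * i + 1] = hex2asc(key[i] & 0xf);
	}
	out[2 * len] = '\0';
}
```

The caller must provide an output buffer of at least 2 * len + 1 bytes.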
 




* [PATCH 5.4 25/34] dm stats: add cond_resched when looping over entries
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (23 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 24/34] dm crypt: make printing of the key constant-time Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag Greg Kroah-Hartman
                   ` (12 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Mikulas Patocka, Mike Snitzer

From: Mikulas Patocka <mpatocka@redhat.com>

commit bfe2b0146c4d0230b68f5c71a64380ff8d361f8b upstream.

dm-stats can be used with a very large number of entries (it is only
limited by 1/4 of total system memory), so add rescheduling points to
the loops that iterate over the entries.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/md/dm-stats.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/drivers/md/dm-stats.c
+++ b/drivers/md/dm-stats.c
@@ -224,6 +224,7 @@ void dm_stats_cleanup(struct dm_stats *s
 				       atomic_read(&shared->in_flight[READ]),
 				       atomic_read(&shared->in_flight[WRITE]));
 			}
+			cond_resched();
 		}
 		dm_stat_free(&s->rcu_head);
 	}
@@ -313,6 +314,7 @@ static int dm_stats_create(struct dm_sta
 	for (ni = 0; ni < n_entries; ni++) {
 		atomic_set(&s->stat_shared[ni].in_flight[READ], 0);
 		atomic_set(&s->stat_shared[ni].in_flight[WRITE], 0);
+		cond_resched();
 	}
 
 	if (s->n_histogram_entries) {
@@ -325,6 +327,7 @@ static int dm_stats_create(struct dm_sta
 		for (ni = 0; ni < n_entries; ni++) {
 			s->stat_shared[ni].tmp.histogram = hi;
 			hi += s->n_histogram_entries + 1;
+			cond_resched();
 		}
 	}
 
@@ -345,6 +348,7 @@ static int dm_stats_create(struct dm_sta
 			for (ni = 0; ni < n_entries; ni++) {
 				p[ni].histogram = hi;
 				hi += s->n_histogram_entries + 1;
+				cond_resched();
 			}
 		}
 	}
@@ -474,6 +478,7 @@ static int dm_stats_list(struct dm_stats
 			}
 			DMEMIT("\n");
 		}
+		cond_resched();
 	}
 	mutex_unlock(&stats->mutex);
 
@@ -750,6 +755,7 @@ static void __dm_stat_clear(struct dm_st
 				local_irq_enable();
 			}
 		}
+		cond_resched();
 	}
 }
 
@@ -865,6 +871,8 @@ static int dm_stats_print(struct dm_stat
 
 		if (unlikely(sz + 1 >= maxlen))
 			goto buffer_overflow;
+
+		cond_resched();
 	}
 
 	if (clear)




* [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (24 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 25/34] dm stats: add cond_resched when looping over entries Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-10  4:22   ` Oleksandr Tymoshenko
  2022-06-03 17:43 ` [PATCH 5.4 27/34] raid5: introduce MD_BROKEN Greg Kroah-Hartman
                   ` (11 subsequent siblings)
  37 siblings, 1 reply; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Sarthak Kukreti, Kees Cook, Mike Snitzer

From: Sarthak Kukreti <sarthakkukreti@google.com>

commit 4caae58406f8ceb741603eee460d79bacca9b1b5 upstream.

The device-mapper framework provides a mechanism to mark targets as
immutable (and hence fail table reloads that try to change the target
type). Add the DM_TARGET_IMMUTABLE flag to the dm-verity target's
feature flags to prevent switching the verity target with a different
target type.

Fixes: a4ffc152198e ("dm: add verity target")
Cc: stable@vger.kernel.org
Signed-off-by: Sarthak Kukreti <sarthakkukreti@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/md/dm-verity-target.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/md/dm-verity-target.c
+++ b/drivers/md/dm-verity-target.c
@@ -1217,6 +1217,7 @@ bad:
 
 static struct target_type verity_target = {
 	.name		= "verity",
+	.features	= DM_TARGET_IMMUTABLE,
 	.version	= {1, 5, 0},
 	.module		= THIS_MODULE,
 	.ctr		= verity_ctr,




* [PATCH 5.4 27/34] raid5: introduce MD_BROKEN
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (25 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 28/34] HID: multitouch: Add support for Google Whiskers Touchpad Greg Kroah-Hartman
                   ` (10 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Mariusz Tkaczyk, Song Liu, Xiao Ni

From: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>

commit 57668f0a4cc4083a120cc8c517ca0055c4543b59 upstream.

The raid456 module used to allow an array to reach the failed state.
That was changed by
fb73b357fb9 ("raid5: block failing device if raid will be failed").
That fix introduced a bug: if raid5 fails during IO, the result may be
a hung task that never completes. The Faulty flag on the device is
necessary to process all requests and is checked many times, mainly in
analyze_stripe().
Allow Faulty to be set on the drive again, and set MD_BROKEN if the
raid has failed.

As a result, this level is allowed to reach the failed state again, but
communication with userspace (via -EBUSY status) is preserved.

This restores the possibility of failing the array via the
mdadm --set-faulty command; that will be addressed by additional
verification on the mdadm side.

Reproduction steps:
 mdadm -CR imsm -e imsm -n 3 /dev/nvme[0-2]n1
 mdadm -CR r5 -e imsm -l5 -n3 /dev/nvme[0-2]n1 --assume-clean
 mkfs.xfs /dev/md126 -f
 mount /dev/md126 /mnt/root/

 fio --filename=/mnt/root/file --size=5GB --direct=1 --rw=randrw
--bs=64k --ioengine=libaio --iodepth=64 --runtime=240 --numjobs=4
--time_based --group_reporting --name=throughput-test-job
--eta-newline=1 &

 echo 1 > /sys/block/nvme2n1/device/device/remove
 echo 1 > /sys/block/nvme1n1/device/device/remove

 [ 1475.787779] Call Trace:
 [ 1475.793111] __schedule+0x2a6/0x700
 [ 1475.799460] schedule+0x38/0xa0
 [ 1475.805454] raid5_get_active_stripe+0x469/0x5f0 [raid456]
 [ 1475.813856] ? finish_wait+0x80/0x80
 [ 1475.820332] raid5_make_request+0x180/0xb40 [raid456]
 [ 1475.828281] ? finish_wait+0x80/0x80
 [ 1475.834727] ? finish_wait+0x80/0x80
 [ 1475.841127] ? finish_wait+0x80/0x80
 [ 1475.847480] md_handle_request+0x119/0x190
 [ 1475.854390] md_make_request+0x8a/0x190
 [ 1475.861041] generic_make_request+0xcf/0x310
 [ 1475.868145] submit_bio+0x3c/0x160
 [ 1475.874355] iomap_dio_submit_bio.isra.20+0x51/0x60
 [ 1475.882070] iomap_dio_bio_actor+0x175/0x390
 [ 1475.889149] iomap_apply+0xff/0x310
 [ 1475.895447] ? iomap_dio_bio_actor+0x390/0x390
 [ 1475.902736] ? iomap_dio_bio_actor+0x390/0x390
 [ 1475.909974] iomap_dio_rw+0x2f2/0x490
 [ 1475.916415] ? iomap_dio_bio_actor+0x390/0x390
 [ 1475.923680] ? atime_needs_update+0x77/0xe0
 [ 1475.930674] ? xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
 [ 1475.938455] xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
 [ 1475.946084] xfs_file_read_iter+0xba/0xd0 [xfs]
 [ 1475.953403] aio_read+0xd5/0x180
 [ 1475.959395] ? _cond_resched+0x15/0x30
 [ 1475.965907] io_submit_one+0x20b/0x3c0
 [ 1475.972398] __x64_sys_io_submit+0xa2/0x180
 [ 1475.979335] ? do_io_getevents+0x7c/0xc0
 [ 1475.986009] do_syscall_64+0x5b/0x1a0
 [ 1475.992419] entry_SYSCALL_64_after_hwframe+0x65/0xca
 [ 1476.000255] RIP: 0033:0x7f11fc27978d
 [ 1476.006631] Code: Bad RIP value.
 [ 1476.073251] INFO: task fio:3877 blocked for more than 120 seconds.

Cc: stable@vger.kernel.org
Fixes: fb73b357fb9 ("raid5: block failing device if raid will be failed")
Reviewed-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/md/raid5.c |   47 ++++++++++++++++++++++-------------------------
 1 file changed, 22 insertions(+), 25 deletions(-)

--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -609,17 +609,17 @@ int raid5_calc_degraded(struct r5conf *c
 	return degraded;
 }
 
-static int has_failed(struct r5conf *conf)
+static bool has_failed(struct r5conf *conf)
 {
-	int degraded;
+	int degraded = conf->mddev->degraded;
 
-	if (conf->mddev->reshape_position == MaxSector)
-		return conf->mddev->degraded > conf->max_degraded;
+	if (test_bit(MD_BROKEN, &conf->mddev->flags))
+		return true;
 
-	degraded = raid5_calc_degraded(conf);
-	if (degraded > conf->max_degraded)
-		return 1;
-	return 0;
+	if (conf->mddev->reshape_position != MaxSector)
+		degraded = raid5_calc_degraded(conf);
+
+	return degraded > conf->max_degraded;
 }
 
 struct stripe_head *
@@ -2679,34 +2679,31 @@ static void raid5_error(struct mddev *md
 	unsigned long flags;
 	pr_debug("raid456: error called\n");
 
+	pr_crit("md/raid:%s: Disk failure on %s, disabling device.\n",
+		mdname(mddev), bdevname(rdev->bdev, b));
+
 	spin_lock_irqsave(&conf->device_lock, flags);
+	set_bit(Faulty, &rdev->flags);
+	clear_bit(In_sync, &rdev->flags);
+	mddev->degraded = raid5_calc_degraded(conf);
 
-	if (test_bit(In_sync, &rdev->flags) &&
-	    mddev->degraded == conf->max_degraded) {
-		/*
-		 * Don't allow to achieve failed state
-		 * Don't try to recover this device
-		 */
+	if (has_failed(conf)) {
+		set_bit(MD_BROKEN, &conf->mddev->flags);
 		conf->recovery_disabled = mddev->recovery_disabled;
-		spin_unlock_irqrestore(&conf->device_lock, flags);
-		return;
+
+		pr_crit("md/raid:%s: Cannot continue operation (%d/%d failed).\n",
+			mdname(mddev), mddev->degraded, conf->raid_disks);
+	} else {
+		pr_crit("md/raid:%s: Operation continuing on %d devices.\n",
+			mdname(mddev), conf->raid_disks - mddev->degraded);
 	}
 
-	set_bit(Faulty, &rdev->flags);
-	clear_bit(In_sync, &rdev->flags);
-	mddev->degraded = raid5_calc_degraded(conf);
 	spin_unlock_irqrestore(&conf->device_lock, flags);
 	set_bit(MD_RECOVERY_INTR, &mddev->recovery);
 
 	set_bit(Blocked, &rdev->flags);
 	set_mask_bits(&mddev->sb_flags, 0,
 		      BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING));
-	pr_crit("md/raid:%s: Disk failure on %s, disabling device.\n"
-		"md/raid:%s: Operation continuing on %d devices.\n",
-		mdname(mddev),
-		bdevname(rdev->bdev, b),
-		mdname(mddev),
-		conf->raid_disks - mddev->degraded);
 	r5c_update_on_rdev_error(mddev, rdev);
 }
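
The state logic in this patch can be illustrated outside the kernel. The
following is a minimal userspace sketch, not the raid5.c code: struct
array_state and both function names are hypothetical stand-ins for the
r5conf/mddev fields and for has_failed()/raid5_error(). It assumes that a
failed member always raises the degraded count by one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the r5conf/mddev fields used by has_failed(). */
struct array_state {
	bool broken;         /* models the MD_BROKEN bit */
	bool reshaping;      /* models reshape_position != MaxSector */
	int degraded;        /* cached count, like mddev->degraded */
	int recalc_degraded; /* what raid5_calc_degraded() would return */
	int max_degraded;    /* 1 for RAID5, 2 for RAID6 */
};

/* Mirrors the patched has_failed(): the broken flag wins, otherwise
 * compare the degraded count with the tolerated maximum. */
static bool array_has_failed(const struct array_state *s)
{
	int degraded;

	assert(s != NULL);
	if (s->broken)
		return true;

	degraded = s->reshaping ? s->recalc_degraded : s->degraded;
	return degraded > s->max_degraded;
}

/* Mirrors the patched raid5_error(): always mark the member failed
 * first, then latch the broken flag if the array can no longer run. */
static bool array_fail_member(struct array_state *s)
{
	assert(s != NULL);
	s->degraded++;
	s->recalc_degraded = s->degraded;
	if (array_has_failed(s))
		s->broken = true;
	return s->broken;
}
```

With max_degraded = 1, the first call to array_fail_member() leaves the
array running degraded, and the second one latches the broken flag, which
then stays set regardless of later changes to the degraded count.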
 



^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 5.4 28/34] HID: multitouch: Add support for Google Whiskers Touchpad
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (26 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 27/34] raid5: introduce MD_BROKEN Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 29/34] tpm: Fix buffer access in tpm2_get_tpm_pt() Greg Kroah-Hartman
                   ` (9 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Marek Maslanka, Benjamin Tissoires,
	Jiri Kosina

From: Marek Maślanka <mm@semihalf.com>

commit 1d07cef7fd7599450b3d03e1915efc2a96e1f03f upstream.

The Google Whiskers touchpad does not work properly with the default
multitouch configuration. Instead, use the same configuration as Google
Rose.

Signed-off-by: Marek Maslanka <mm@semihalf.com>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/hid/hid-multitouch.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/hid/hid-multitouch.c
+++ b/drivers/hid/hid-multitouch.c
@@ -2158,6 +2158,9 @@ static const struct hid_device_id mt_dev
 	{ .driver_data = MT_CLS_GOOGLE,
 		HID_DEVICE(HID_BUS_ANY, HID_GROUP_ANY, USB_VENDOR_ID_GOOGLE,
 			USB_DEVICE_ID_GOOGLE_TOUCH_ROSE) },
+	{ .driver_data = MT_CLS_GOOGLE,
+		HID_DEVICE(BUS_USB, HID_GROUP_MULTITOUCH_WIN_8, USB_VENDOR_ID_GOOGLE,
+			USB_DEVICE_ID_GOOGLE_WHISKERS) },
 
 	/* Generic MT device */
 	{ HID_DEVICE(HID_BUS_ANY, HID_GROUP_MULTITOUCH, HID_ANY_ID, HID_ANY_ID) },



^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 5.4 29/34] tpm: Fix buffer access in tpm2_get_tpm_pt()
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (27 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 28/34] HID: multitouch: Add support for Google Whiskers Touchpad Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 30/34] tpm: ibmvtpm: Correct the return value in tpm_ibmvtpm_probe() Greg Kroah-Hartman
                   ` (8 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Stefan Mahnke-Hartmann, Jarkko Sakkinen

From: Stefan Mahnke-Hartmann <stefan.mahnke-hartmann@infineon.com>

commit e57b2523bd37e6434f4e64c7a685e3715ad21e9a upstream.

Under certain conditions uninitialized memory will be accessed.
As described by TCG Trusted Platform Module Library Specification,
rev. 1.59 (Part 3: Commands), if a TPM2_GetCapability is received,
requesting a capability, the TPM in field upgrade mode may return a
zero length list.
Check the property count in tpm2_get_tpm_pt().

Fixes: 2ab3241161b3 ("tpm: migrate tpm2_get_tpm_pt() to use struct tpm_buf")
Cc: stable@vger.kernel.org
Signed-off-by: Stefan Mahnke-Hartmann <stefan.mahnke-hartmann@infineon.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/char/tpm/tpm2-cmd.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/drivers/char/tpm/tpm2-cmd.c
+++ b/drivers/char/tpm/tpm2-cmd.c
@@ -706,7 +706,16 @@ ssize_t tpm2_get_tpm_pt(struct tpm_chip
 	if (!rc) {
 		out = (struct tpm2_get_cap_out *)
 			&buf.data[TPM_HEADER_SIZE];
-		*value = be32_to_cpu(out->value);
+		/*
+		 * To prevent failing boot up of some systems, Infineon TPM2.0
+		 * returns SUCCESS on TPM2_Startup in field upgrade mode. Also
+		 * the TPM2_Getcapability command returns a zero length list
+		 * in field upgrade mode.
+		 */
+		if (be32_to_cpu(out->property_cnt) > 0)
+			*value = be32_to_cpu(out->value);
+		else
+			rc = -ENODATA;
 	}
 	tpm_buf_destroy(&buf);
 	return rc;
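
The check added here can be sketched in plain C. This is an illustration
only: the payload layout (a big-endian property count at offset 0, a
property id at offset 4 and a value at offset 8) and the names load_be32()
and read_first_property() are assumptions made for the example, not the
layout of struct tpm2_get_cap_out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit big-endian field from a byte buffer. */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical payload layout (assumed for this sketch):
 *   offset 0: property count, offset 4: property id, offset 8: value.
 * Returns 0 and stores the value, or a negative errno without
 * touching *value.
 */
static int read_first_property(const uint8_t *payload, size_t len,
			       uint32_t *value)
{
	assert(payload != NULL && value != NULL);

	if (len < 4)
		return -EINVAL;
	/* A zero-length list carries no value field to read. */
	if (load_be32(payload) == 0)
		return -ENODATA;
	if (len < 12)
		return -EINVAL;

	*value = load_be32(payload + 8);
	return 0;
}
```

The point of the sketch is the ordering: the count is inspected before the
value field is dereferenced, so an empty list yields -ENODATA instead of
whatever bytes happen to follow the header.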



^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 5.4 30/34] tpm: ibmvtpm: Correct the return value in tpm_ibmvtpm_probe()
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (28 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 29/34] tpm: Fix buffer access in tpm2_get_tpm_pt() Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 31/34] docs: submitting-patches: Fix crossref to The canonical patch format Greg Kroah-Hartman
                   ` (7 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Xiu Jianfeng, Stefan Berger, Jarkko Sakkinen

From: Xiu Jianfeng <xiujianfeng@huawei.com>

commit d0dc1a7100f19121f6e7450f9cdda11926aa3838 upstream.

Currently it returns zero when the CRQ response times out; it should return
an error code instead.

Fixes: d8d74ea3c002 ("tpm: ibmvtpm: Wait for buffer to be set before proceeding")
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/char/tpm/tpm_ibmvtpm.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/char/tpm/tpm_ibmvtpm.c
+++ b/drivers/char/tpm/tpm_ibmvtpm.c
@@ -685,6 +685,7 @@ static int tpm_ibmvtpm_probe(struct vio_
 	if (!wait_event_timeout(ibmvtpm->crq_queue.wq,
 				ibmvtpm->rtce_buf != NULL,
 				HZ)) {
+		rc = -ENODEV;
 		dev_err(dev, "CRQ response timed out\n");
 		goto init_irq_cleanup;
 	}



^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 5.4 31/34] docs: submitting-patches: Fix crossref to The canonical patch format
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (29 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 30/34] tpm: ibmvtpm: Correct the return value in tpm_ibmvtpm_probe() Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 32/34] NFS: Memory allocation failures are not server fatal errors Greg Kroah-Hartman
                   ` (6 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Akira Yokosawa, Jonathan Corbet,
	Mauro Carvalho Chehab

From: Akira Yokosawa <akiyks@gmail.com>

commit 6d5aa418b3bd42cdccc36e94ee199af423ef7c84 upstream.

The reference to `explicit_in_reply_to` is pointless, because when the
reference was added in the form of "#15" [1], Section 15) was "The
canonical patch format".
The reference of "#15" was not properly updated during a couple of
reorganizations in the plain-text SubmittingPatches era.

Fix it by using `the_canonical_patch_format`.

[1]: 2ae19acaa50a ("Documentation: Add "how to write a good patch summary" to SubmittingPatches")

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Fixes: 5903019b2a5e ("Documentation/SubmittingPatches: convert it to ReST markup")
Fixes: 9b2c76777acc ("Documentation/SubmittingPatches: enrich the Sphinx output")
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: stable@vger.kernel.org # v4.9+
Link: https://lore.kernel.org/r/64e105a5-50be-23f2-6cae-903a2ea98e18@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/process/submitting-patches.rst |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/Documentation/process/submitting-patches.rst
+++ b/Documentation/process/submitting-patches.rst
@@ -133,7 +133,7 @@ as you intend it to.
 
 The maintainer will thank you if you write your patch description in a
 form which can be easily pulled into Linux's source code management
-system, ``git``, as a "commit log".  See :ref:`explicit_in_reply_to`.
+system, ``git``, as a "commit log".  See :ref:`the_canonical_patch_format`.
 
 Solve only one problem per patch.  If your description starts to get
 long, that's a sign that you probably need to split up your patch.



^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 5.4 32/34] NFS: Memory allocation failures are not server fatal errors
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (30 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 31/34] docs: submitting-patches: Fix crossref to The canonical patch format Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 33/34] NFSD: Fix possible sleep during nfsd4_release_lockowner() Greg Kroah-Hartman
                   ` (5 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Olga Kornievskaia, Trond Myklebust,
	Anna Schumaker

From: Trond Myklebust <trond.myklebust@hammerspace.com>

commit 452284407c18d8a522c3039339b1860afa0025a8 upstream.

We need to filter out ENOMEM in nfs_error_is_fatal_on_server(), because
running out of memory on our client is not a server error.

Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes: 2dc23afffbca ("NFS: ENOMEM should also be a fatal error.")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/nfs/internal.h |    1 +
 1 file changed, 1 insertion(+)

--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -775,6 +775,7 @@ static inline bool nfs_error_is_fatal_on
 	case 0:
 	case -ERESTARTSYS:
 	case -EINTR:
+	case -ENOMEM:
 		return false;
 	}
 	return nfs_error_is_fatal(err);
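
The filtering pattern can be shown with a small userspace sketch. Both
functions below are hypothetical, simplified stand-ins: the list of errors
in error_is_fatal() is shortened for the example and is not the kernel's
list, and -ERESTARTSYS is omitted because it is not available in userspace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-in for the generic "is this error fatal" check. */
static bool error_is_fatal(int err)
{
	switch (err) {
	case -EINTR:
	case -EIO:
	case -ENOSPC:
	case -ENOMEM:
		return true;
	default:
		return false;
	}
}

/* Errors that are fatal locally but say nothing about the server are
 * filtered out before falling back to the generic check. */
static bool error_is_fatal_on_server(int err)
{
	/* errno values are passed negated, as in the kernel */
	assert(err <= 0);

	switch (err) {
	case 0:
	case -EINTR:
	case -ENOMEM:	/* local allocation failure, not a server error */
		return false;
	}
	return error_is_fatal(err);
}
```

In this sketch -ENOMEM remains fatal for the generic check but is no longer
attributed to the server, while -EIO is treated as fatal by both.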



^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 5.4 33/34] NFSD: Fix possible sleep during nfsd4_release_lockowner()
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (31 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 32/34] NFS: Memory allocation failures are not server fatal errors Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-03 17:43 ` [PATCH 5.4 34/34] bpf: Enlarge offset check value to INT_MAX in bpf_skb_{load,store}_bytes Greg Kroah-Hartman
                   ` (4 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg Kroah-Hartman, stable, Dai Ngo, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

commit ce3c4ad7f4ce5db7b4f08a1e237d8dd94b39180b upstream.

nfsd4_release_lockowner() holds clp->cl_lock when it calls
check_for_locks(). However, check_for_locks() calls nfsd_file_get()
/ nfsd_file_put() to access the backing inode's flc_posix list, and
nfsd_file_put() can sleep if the inode was recently removed.

Let's instead rely on the stateowner's reference count to gate
whether the release is permitted. This should be a reliable
indication of locks-in-use since file lock operations and
->lm_get_owner take appropriate references, which are released
appropriately when file locks are removed.

Reported-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/nfsd/nfs4state.c |   12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -6894,16 +6894,12 @@ nfsd4_release_lockowner(struct svc_rqst
 		if (sop->so_is_open_owner || !same_owner_str(sop, owner))
 			continue;
 
-		/* see if there are still any locks associated with it */
-		lo = lockowner(sop);
-		list_for_each_entry(stp, &sop->so_stateids, st_perstateowner) {
-			if (check_for_locks(stp->st_stid.sc_file, lo)) {
-				status = nfserr_locks_held;
-				spin_unlock(&clp->cl_lock);
-				return status;
-			}
+		if (atomic_read(&sop->so_count) != 1) {
+			spin_unlock(&clp->cl_lock);
+			return nfserr_locks_held;
 		}
 
+		lo = lockowner(sop);
 		nfs4_get_stateowner(sop);
 		break;
 	}
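
The reference-count gate can be sketched in userspace with C11 atomics.
This is an illustration under assumptions, not the nfsd code: struct
state_owner and the three helpers are hypothetical, and the sketch assumes
that every lock in use holds exactly one extra reference on the owner.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a stateowner with a reference count. */
struct state_owner {
	atomic_int so_count;
};

/* Taken whenever a lock starts using the owner. */
static void owner_get(struct state_owner *so)
{
	assert(so != NULL);
	atomic_fetch_add(&so->so_count, 1);
}

/* Dropped when that lock is removed. */
static void owner_put(struct state_owner *so)
{
	assert(so != NULL);
	atomic_fetch_sub(&so->so_count, 1);
}

/* Release is permitted only when the sole remaining reference is the
 * one held by the owner table itself; no sleeping call is needed. */
static bool owner_release_permitted(struct state_owner *so)
{
	assert(so != NULL);
	return atomic_load(&so->so_count) == 1;
}
```

Reading a counter is safe under a spinlock, which is the property the patch
relies on when it replaces the check_for_locks() walk.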



^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 5.4 34/34] bpf: Enlarge offset check value to INT_MAX in bpf_skb_{load,store}_bytes
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (32 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 33/34] NFSD: Fix possible sleep during nfsd4_release_lockowner() Greg Kroah-Hartman
@ 2022-06-03 17:43 ` Greg Kroah-Hartman
  2022-06-04 12:21 ` [PATCH 5.4 00/34] 5.4.197-rc1 review Sudip Mukherjee
                   ` (3 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-06-03 17:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Liu Jian, Daniel Borkmann, Song Liu

From: Liu Jian <liujian56@huawei.com>

commit 45969b4152c1752089351cd6836a42a566d49bcf upstream.

The data length of skb frags + frag_list may be greater than 0xffff, and
skb_header_pointer cannot handle a negative offset. So, INT_MAX is used here
to check the validity of offset. Make the same change in the related function
skb_store_bytes.

Fixes: 05c74e5e53f6 ("bpf: add bpf_skb_load_bytes helper")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220416105801.88708-2-liujian56@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/filter.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1668,7 +1668,7 @@ BPF_CALL_5(bpf_skb_store_bytes, struct s
 
 	if (unlikely(flags & ~(BPF_F_RECOMPUTE_CSUM | BPF_F_INVALIDATE_HASH)))
 		return -EINVAL;
-	if (unlikely(offset > 0xffff))
+	if (unlikely(offset > INT_MAX))
 		return -EFAULT;
 	if (unlikely(bpf_try_make_writable(skb, offset + len)))
 		return -EFAULT;
@@ -1703,7 +1703,7 @@ BPF_CALL_4(bpf_skb_load_bytes, const str
 {
 	void *ptr;
 
-	if (unlikely(offset > 0xffff))
+	if (unlikely(offset > INT_MAX))
 		goto err_clear;
 
 	ptr = skb_header_pointer(skb, offset, len, to);
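
Why a single unsigned comparison against INT_MAX is enough can be seen in a
small userspace sketch. The function load_bytes() below is hypothetical and
works on a flat buffer rather than an skb; it only assumes, as in the
helpers above, that the offset arrives as an unsigned 32-bit value.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy len bytes starting at offset out of a flat buffer.
 * A negative offset passed by the caller shows up here as a value
 * above INT_MAX, so one unsigned comparison rejects it, while
 * offsets beyond 0xffff remain usable.
 */
static int load_bytes(const uint8_t *data, size_t data_len,
		      uint32_t offset, void *to, uint32_t len)
{
	assert(data != NULL && to != NULL);

	if (offset > (uint32_t)INT_MAX)
		goto err_clear;
	/* 64-bit sum so that offset + len cannot wrap around */
	if ((uint64_t)offset + len > (uint64_t)data_len)
		goto err_clear;

	memcpy(to, data + offset, len);
	return 0;

err_clear:
	/* never leave stale bytes in the caller's buffer */
	memset(to, 0, len);
	return -EFAULT;
}
```

Calling load_bytes() with an offset of (uint32_t)-4 fails with -EFAULT and
zeroes the destination, whereas an in-range offset copies the bytes.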



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5.4 00/34] 5.4.197-rc1 review
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (33 preceding siblings ...)
  2022-06-03 17:43 ` [PATCH 5.4 34/34] bpf: Enlarge offset check value to INT_MAX in bpf_skb_{load,store}_bytes Greg Kroah-Hartman
@ 2022-06-04 12:21 ` Sudip Mukherjee
  2022-06-04 17:31 ` Naresh Kamboju
                   ` (2 subsequent siblings)
  37 siblings, 0 replies; 68+ messages in thread
From: Sudip Mukherjee @ 2022-06-04 12:21 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, stable, torvalds, akpm, linux, shuah, patches,
	lkft-triage, pavel, jonathanh, f.fainelli, slade

Hi Greg,

On Fri, Jun 03, 2022 at 07:42:56PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.4.197 release.
> There are 34 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000.
> Anything received after that time might be too late.

Build test:
mips (gcc version 11.3.1 20220531): 65 configs -> no failure
arm (gcc version 11.3.1 20220531): 106 configs -> no failure
arm64 (gcc version 11.3.1 20220531): 2 configs -> no failure
x86_64 (gcc version 11.3.1 20220531): 4 configs -> no failure

Boot test:
x86_64: Booted on my test laptop. No regression.
x86_64: Booted on qemu. No regression. [1]

[1]. https://openqa.qa.codethink.co.uk/tests/1264


Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>

--
Regards
Sudip


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5.4 00/34] 5.4.197-rc1 review
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (34 preceding siblings ...)
  2022-06-04 12:21 ` [PATCH 5.4 00/34] 5.4.197-rc1 review Sudip Mukherjee
@ 2022-06-04 17:31 ` Naresh Kamboju
  2022-06-04 18:54 ` Guenter Roeck
  2022-06-06  1:08 ` Samuel Zou
  37 siblings, 0 replies; 68+ messages in thread
From: Naresh Kamboju @ 2022-06-04 17:31 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, stable, torvalds, akpm, linux, shuah, patches,
	lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee,
	slade

On Fri, 3 Jun 2022 at 23:14, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> This is the start of the stable review cycle for the 5.4.197 release.
> There are 34 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>         https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.197-rc1.gz
> or in the git tree and branch at:
>         git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>

## Build
* kernel: 5.4.197-rc1
* git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
* git branch: linux-5.4.y
* git commit: 2b69e7392fd9509c34f22e22898d4fd8de4bac19
* git describe: v5.4.196-35-g2b69e7392fd9
* test details:
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.4.y/build/v5.4.196-35-g2b69e7392fd9

## Test Regressions (compared to v5.4.196-11-g04a2bb5e4a0b)
No test regressions found.

## Metric Regressions (compared to v5.4.196-11-g04a2bb5e4a0b)
No metric regressions found.

## Test Fixes (compared to v5.4.196-11-g04a2bb5e4a0b)
No test fixes found.

## Metric Fixes (compared to v5.4.196-11-g04a2bb5e4a0b)
No metric fixes found.

## Test result summary
total: 130079, pass: 116477, fail: 185, skip: 12140, xfail: 1277

## Build Summary
* arc: 10 total, 10 passed, 0 failed
* arm: 313 total, 313 passed, 0 failed
* arm64: 57 total, 53 passed, 4 failed
* i386: 28 total, 25 passed, 3 failed
* mips: 37 total, 37 passed, 0 failed
* parisc: 12 total, 12 passed, 0 failed
* powerpc: 54 total, 54 passed, 0 failed
* riscv: 27 total, 27 passed, 0 failed
* s390: 12 total, 12 passed, 0 failed
* sh: 24 total, 24 passed, 0 failed
* sparc: 12 total, 12 passed, 0 failed
* x86_64: 55 total, 54 passed, 1 failed

## Test suites summary
* fwts
* kunit
* kvm-unit-tests
* libgpiod
* libhugetlbfs
* log-parser-boot
* log-parser-test
* ltp-cap_bounds
* ltp-cap_bounds-tests
* ltp-commands
* ltp-commands-tests
* ltp-containers
* ltp-containers-tests
* ltp-controllers-tests
* ltp-cpuhotplug-tests
* ltp-crypto
* ltp-crypto-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests
* ltp-fcntl-locktests-tests
* ltp-filecaps
* ltp-filecaps-tests
* ltp-fs
* ltp-fs-tests
* ltp-fs_bind
* ltp-fs_bind-tests
* ltp-fs_perms_simple
* ltp-fs_perms_simple-tests
* ltp-fsx
* ltp-fsx-tests
* ltp-hugetlb
* ltp-hugetlb-tests
* ltp-io
* ltp-io-tests
* ltp-ipc
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl
* ltp-nptl-tests
* ltp-open-posix-tests
* ltp-pty
* ltp-pty-tests
* ltp-sched
* ltp-sched-tests
* ltp-securebits
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-tracing-tests
* network-basic-tests
* packetdrill
* perf
* perf/Zstd-perf.data-compression
* rcutorture
* ssuite
* v4l2-compliance
* vdso

--
Linaro LKFT
https://lkft.linaro.org

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5.4 00/34] 5.4.197-rc1 review
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (35 preceding siblings ...)
  2022-06-04 17:31 ` Naresh Kamboju
@ 2022-06-04 18:54 ` Guenter Roeck
  2022-06-06  1:08 ` Samuel Zou
  37 siblings, 0 replies; 68+ messages in thread
From: Guenter Roeck @ 2022-06-04 18:54 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, stable, torvalds, akpm, shuah, patches,
	lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee,
	slade

On Fri, Jun 03, 2022 at 07:42:56PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.4.197 release.
> There are 34 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000.
> Anything received after that time might be too late.
> 

Build results:
	total: 160 pass: 160 fail: 0
Qemu test results:
	total: 449 pass: 449 fail: 0

Tested-by: Guenter Roeck <linux@roeck-us.net>

Guenter

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5.4 00/34] 5.4.197-rc1 review
  2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
                   ` (36 preceding siblings ...)
  2022-06-04 18:54 ` Guenter Roeck
@ 2022-06-06  1:08 ` Samuel Zou
  37 siblings, 0 replies; 68+ messages in thread
From: Samuel Zou @ 2022-06-06  1:08 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel
  Cc: stable, torvalds, akpm, linux, shuah, patches, lkft-triage,
	pavel, jonathanh, f.fainelli, sudipm.mukherjee, slade



On 2022/6/4 1:42, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.4.197 release.
> There are 34 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
> 	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.197-rc1.gz
> or in the git tree and branch at:
> 	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

Tested on arm64 and x86 for 5.4.197-rc1:

Kernel repo:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Branch: linux-5.4.y
Version: 5.4.197-rc1
Commit: 2b69e7392fd9509c34f22e22898d4fd8de4bac19
Compiler: gcc version 7.3.0 (GCC)

arm64:
--------------------------------------------------------------------
Testcase Result Summary:
total: 9030
passed: 9030
failed: 0
timeout: 0
--------------------------------------------------------------------

x86:
--------------------------------------------------------------------
Testcase Result Summary:
total: 9030
passed: 9030
failed: 0
timeout: 0
--------------------------------------------------------------------

Tested-by: Hulk Robot <hulkrobot@huawei.com>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag
  2022-06-03 17:43 ` [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag Greg Kroah-Hartman
@ 2022-06-10  4:22   ` Oleksandr Tymoshenko
  2022-06-10  5:15     ` Greg KH
  0 siblings, 1 reply; 68+ messages in thread
From: Oleksandr Tymoshenko @ 2022-06-10  4:22 UTC (permalink / raw)
  To: gregkh; +Cc: keescook, sarthakkukreti, snitzer, stable, regressions

I believe this commit introduced a regression in dm verity on systems
where data device is an NVME one. Loading table fails with the
following diagnostics:

device-mapper: table: table load rejected: including non-request-stackable devices

The same kernel works with the same data drive on the SCSI interface.
NVME-backed dm verity works with just this commit reverted.

I believe the presence of the immutable partition is used as an indicator
of special case NVME configuration and if the data device's name starts
with "nvme" the code tries to switch the target type to
DM_TYPE_NVME_BIO_BASED (drivers/md/dm-table.c lines 1003-1010).

The special NVME optimization case was removed in
5.10 by commit 9c37de297f6590937f95a28bec1b7ac68a38618f, so only 5.4 is
affected.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag
  2022-06-10  4:22   ` Oleksandr Tymoshenko
@ 2022-06-10  5:15     ` Greg KH
  2022-06-10  8:10       ` Oleksandr Tymoshenko
  2022-06-10 15:11         ` [dm-devel] " Mike Snitzer
  0 siblings, 2 replies; 68+ messages in thread
From: Greg KH @ 2022-06-10  5:15 UTC (permalink / raw)
  To: Oleksandr Tymoshenko
  Cc: keescook, sarthakkukreti, snitzer, stable, regressions

On Fri, Jun 10, 2022 at 04:22:00AM +0000, Oleksandr Tymoshenko wrote:
> I believe this commit introduced a regression in dm verity on systems
> where data device is an NVME one. Loading table fails with the
> following diagnostics:
> 
> device-mapper: table: table load rejected: including non-request-stackable devices
> 
> The same kernel works with the same data drive on the SCSI interface.
> NVME-backed dm verity works with just this commit reverted.
> 
> I believe the presence of the immutable partition is used as an indicator
> of special case NVME configuration and if the data device's name starts
> with "nvme" the code tries to switch the target type to
> DM_TYPE_NVME_BIO_BASED (drivers/md/dm-table.c lines 1003-1010).
> 
> The special NVME optimization case was removed in
> 5.10 by commit 9c37de297f6590937f95a28bec1b7ac68a38618f, so only 5.4 is
> affected.
> 

Why wouldn't 4.9, 4.14, and 4.19 also be affected here?  Should I also
just queue up 9c37de297f65 ("dm: remove special-casing of bio-based
immutable singleton target on NVMe") to those older kernels?  If so,
have you tested this and verified that it worked?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag
  2022-06-10  5:15     ` Greg KH
@ 2022-06-10  8:10       ` Oleksandr Tymoshenko
  2022-06-10 15:11         ` [dm-devel] " Mike Snitzer
  1 sibling, 0 replies; 68+ messages in thread
From: Oleksandr Tymoshenko @ 2022-06-10  8:10 UTC (permalink / raw)
  To: Greg KH; +Cc: keescook, sarthakkukreti, snitzer, stable, regressions

On Thu, Jun 9, 2022 at 10:15 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Fri, Jun 10, 2022 at 04:22:00AM +0000, Oleksandr Tymoshenko wrote:
> > I believe this commit introduced a regression in dm verity on systems
> > where data device is an NVME one. Loading table fails with the
> > following diagnostics:
> >
> > device-mapper: table: table load rejected: including non-request-stackable devices
> >
> > The same kernel works with the same data drive on the SCSI interface.
> > NVME-backed dm verity works with just this commit reverted.
> >
> > I believe the presence of the immutable partition is used as an indicator
> > of special case NVME configuration and if the data device's name starts
> > with "nvme" the code tries to switch the target type to
> > DM_TYPE_NVME_BIO_BASED (drivers/md/dm-table.c lines 1003-1010).
> >
> > The special NVME optimization case was removed in
> > 5.10 by commit 9c37de297f6590937f95a28bec1b7ac68a38618f, so only 5.4 is
> > affected.
> >
>
> Why wouldn't 4.9, 4.14, and 4.19 also be affected here?

Just a bad choice of words on my side: we use only 5.x branches and
it slipped my mind to verify all actively supported branches. 4.19 is likely
to be affected; it has the same code for the NVME optimization as 5.4.
4.9 and 4.14 don't have this code, so they are probably not affected.

> Should I also
> just queue up 9c37de297f65 ("dm: remove special-casing of bio-based
> immutable singleton target on NVMe") to those older kernels?  If so,
> have you tested this and verified that it worked?

I don't have enough expertise in this domain to recommend a solution, that's
why I reported the problem instead of sending a patch. I did take a quick look
though: it doesn't apply cleanly and it seems that the 9c37de297f65 was removed
as a result of some other refactoring, so I think it's more complex than
backporting a single commit.

>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag
  2022-06-10  5:15     ` Greg KH
@ 2022-06-10 15:11         ` Mike Snitzer
  2022-06-10 15:11         ` [dm-devel] " Mike Snitzer
  1 sibling, 0 replies; 68+ messages in thread
From: Mike Snitzer @ 2022-06-10 15:11 UTC (permalink / raw)
  To: Greg KH
  Cc: Oleksandr Tymoshenko, keescook, sarthakkukreti, stable,
	regressions, dm-devel

On Fri, Jun 10 2022 at  1:15P -0400,
Greg KH <gregkh@linuxfoundation.org> wrote:

> On Fri, Jun 10, 2022 at 04:22:00AM +0000, Oleksandr Tymoshenko wrote:
> > I believe this commit introduced a regression in dm verity on systems
> > where data device is an NVME one. Loading table fails with the
> > following diagnostics:
> > 
> > device-mapper: table: table load rejected: including non-request-stackable devices
> > 
> > The same kernel works with the same data drive on the SCSI interface.
> > NVME-backed dm verity works with just this commit reverted.
> > 
> > I believe the presence of the immutable partition is used as an indicator
> > of special case NVME configuration and if the data device's name starts
> > with "nvme" the code tries to switch the target type to
> > DM_TYPE_NVME_BIO_BASED (drivers/md/dm-table.c lines 1003-1010).
> > 
> > The special NVME optimization case was removed in
> > 5.10 by commit 9c37de297f6590937f95a28bec1b7ac68a38618f, so only 5.4 is
> > affected.
> > 
> 
> Why wouldn't 4.9, 4.14, and 4.19 also be affected here?  Should I also
> just queue up 9c37de297f65 ("dm: remove special-casing of bio-based
> immutable singleton target on NVMe") to those older kernels?  If so,
> have you tested this and verified that it worked?

Sorry for the unforeseen stable@ troubles here!

In general we'd be fine to apply commit 9c37de297f65 but to do it
properly would require also making sure commits that remove
"DM_TYPE_NVME_BIO_BASED", like 8d47e65948dd ("dm mpath: remove
unnecessary NVMe branching in favor of scsi_dh checks") are applied --
basically any lingering references to DM_TYPE_NVME_BIO_BASED need to
be removed.

The commit header for 8d47e65948dd documents what
DM_TYPE_NVME_BIO_BASED was used for.. it was dm-mpath specific and
"nvme" mode really never got used by any userspace that I'm aware of.

Sadly I currently don't have the time to do this backport for all N
stable kernels... :(

But if that backport gets out of control: A simpler, albeit stable@
unicorn, way to resolve this is to simply revert 9c37de297f65 and make
it so that DM-mpath and DM core just use bio-based if "nvme" is
requested by dm-mpath, so also in drivers/md/dm-mpath.c e.g.:

@@ -1091,8 +1088,6 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)

                        if (!strcasecmp(queue_mode_name, "bio"))
                                m->queue_mode = DM_TYPE_BIO_BASED;
                        else if (!strcasecmp(queue_mode_name, "nvme"))
-                               m->queue_mode = DM_TYPE_NVME_BIO_BASED;
+                               m->queue_mode = DM_TYPE_BIO_BASED;
                        else if (!strcasecmp(queue_mode_name, "rq"))
                                m->queue_mode = DM_TYPE_REQUEST_BASED;
                        else if (!strcasecmp(queue_mode_name, "mq"))
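
The effect of that hunk can be sketched as a stand-alone user-space function,
assuming nothing beyond what the diff shows. The enum and function names here
are hypothetical, not the kernel's; the "mq" branch is left out because the
hunk is cut off before its assignment, and returning a "none" value for
unknown names is an assumption of this sketch.

```c
#include <strings.h>  /* strcasecmp (POSIX) */

/* Hypothetical stand-ins for the kernel's DM_TYPE_* values. */
enum toy_queue_mode {
	TOY_TYPE_NONE,
	TOY_TYPE_BIO_BASED,
	TOY_TYPE_REQUEST_BASED,
};

/* Maps a queue_mode feature string to a mode, case-insensitively. */
static enum toy_queue_mode toy_parse_queue_mode(const char *name)
{
	if (!strcasecmp(name, "bio"))
		return TOY_TYPE_BIO_BASED;
	if (!strcasecmp(name, "nvme"))
		return TOY_TYPE_BIO_BASED;      /* "nvme" becomes an alias for "bio" */
	if (!strcasecmp(name, "rq"))
		return TOY_TYPE_REQUEST_BASED;
	return TOY_TYPE_NONE;                   /* unknown name (sketch-only choice) */
}
```

With this mapping, a table that still asks for "nvme" keeps loading, but
nothing ever selects the NVMe-specific type.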

Mike

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag
  2022-06-10 15:11         ` [dm-devel] " Mike Snitzer
@ 2022-06-13  9:13           ` Greg KH
  -1 siblings, 0 replies; 68+ messages in thread
From: Greg KH @ 2022-06-13  9:13 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Oleksandr Tymoshenko, keescook, sarthakkukreti, stable,
	regressions, dm-devel

On Fri, Jun 10, 2022 at 11:11:00AM -0400, Mike Snitzer wrote:
> On Fri, Jun 10 2022 at  1:15P -0400,
> Greg KH <gregkh@linuxfoundation.org> wrote:
> 
> > On Fri, Jun 10, 2022 at 04:22:00AM +0000, Oleksandr Tymoshenko wrote:
> > > I believe this commit introduced a regression in dm verity on systems
> > > where data device is an NVME one. Loading table fails with the
> > > following diagnostics:
> > > 
> > > device-mapper: table: table load rejected: including non-request-stackable devices
> > > 
> > > The same kernel works with the same data drive on the SCSI interface.
> > > NVME-backed dm verity works with just this commit reverted.
> > > 
> > > I believe the presence of the immutable partition is used as an indicator
> > > of special case NVME configuration and if the data device's name starts
> > > with "nvme" the code tries to switch the target type to
> > > DM_TYPE_NVME_BIO_BASED (drivers/md/dm-table.c lines 1003-1010).
> > > 
> > > The special NVME optimization case was removed in
> > > 5.10 by commit 9c37de297f6590937f95a28bec1b7ac68a38618f, so only 5.4 is
> > > affected.
> > > 
> > 
> > Why wouldn't 4.9, 4.14, and 4.19 also be affected here?  Should I also
> > just queue up 9c37de297f65 ("dm: remove special-casing of bio-based
> > immutable singleton target on NVMe") to those older kernels?  If so,
> > have you tested this and verified that it worked?
> 
> Sorry for the unforeseen stable@ troubles here!
> 
> In general we'd be fine to apply commit 9c37de297f65 but to do it
> properly would require also making sure commits that remove
> "DM_TYPE_NVME_BIO_BASED", like 8d47e65948dd ("dm mpath: remove
> unnecessary NVMe branching in favor of scsi_dh checks") are applied --
> basically any lingering references to DM_TYPE_NVME_BIO_BASED need to
> be removed.
> 
> The commit header for 8d47e65948dd documents what
> DM_TYPE_NVME_BIO_BASED was used for.. it was dm-mpath specific and
> "nvme" mode really never got used by any userspace that I'm aware of.
> 
> Sadly I currently don't have the time to do this backport for all N
> stable kernels... :(
> 
> But if that backport gets out of control: A simpler, albeit stable@
> unicorn, way to resolve this is to simply revert 9c37de297f65 and make
> it so that DM-mpath and DM core just used bio-based if "nvme" is
> requested by dm-mpath, so also in drivers/md/dm-mpath.c e.g.:
> 
> @@ -1091,8 +1088,6 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
> 
>                         if (!strcasecmp(queue_mode_name, "bio"))
>                                 m->queue_mode = DM_TYPE_BIO_BASED;
> 			else if (!strcasecmp(queue_mode_name, "nvme"))
> -                               m->queue_mode = DM_TYPE_NVME_BIO_BASED;
> +                               m->queue_mode = DM_TYPE_BIO_BASED;
>                         else if (!strcasecmp(queue_mode_name, "rq"))
>                                 m->queue_mode = DM_TYPE_REQUEST_BASED;
>                         else if (!strcasecmp(queue_mode_name, "mq"))
> 
> Mike
> 

Ok, please submit a working patch for the kernels that need it so that
we can review and apply it to solve this regression.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag
  2022-06-13  9:13           ` [dm-devel] " Greg KH
@ 2022-06-15 14:36             ` Guenter Roeck
  -1 siblings, 0 replies; 68+ messages in thread
From: Guenter Roeck @ 2022-06-15 14:36 UTC (permalink / raw)
  To: Greg KH
  Cc: Mike Snitzer, Oleksandr Tymoshenko, keescook, sarthakkukreti,
	stable, regressions, dm-devel

On Mon, Jun 13, 2022 at 11:13:21AM +0200, Greg KH wrote:
> On Fri, Jun 10, 2022 at 11:11:00AM -0400, Mike Snitzer wrote:
> > On Fri, Jun 10 2022 at  1:15P -0400,
> > Greg KH <gregkh@linuxfoundation.org> wrote:
> > 
> > > On Fri, Jun 10, 2022 at 04:22:00AM +0000, Oleksandr Tymoshenko wrote:
> > > > I believe this commit introduced a regression in dm verity on systems
> > > > where data device is an NVME one. Loading table fails with the
> > > > following diagnostics:
> > > > 
> > > > device-mapper: table: table load rejected: including non-request-stackable devices
> > > > 
> > > > The same kernel works with the same data drive on the SCSI interface.
> > > > NVME-backed dm verity works with just this commit reverted.
> > > > 
> > > > I believe the presence of the immutable partition is used as an indicator
> > > > of special case NVME configuration and if the data device's name starts
> > > > with "nvme" the code tries to switch the target type to
> > > > DM_TYPE_NVME_BIO_BASED (drivers/md/dm-table.c lines 1003-1010).
> > > > 
> > > > The special NVME optimization case was removed in
> > > > 5.10 by commit 9c37de297f6590937f95a28bec1b7ac68a38618f, so only 5.4 is
> > > > affected.
> > > > 
> > > 
> > > Why wouldn't 4.9, 4.14, and 4.19 also be affected here?  Should I also
> > > just queue up 9c37de297f65 ("dm: remove special-casing of bio-based
> > > immutable singleton target on NVMe") to those older kernels?  If so,
> > > have you tested this and verified that it worked?
> > 
> > Sorry for the unforeseen stable@ troubles here!
> > 
> > In general we'd be fine to apply commit 9c37de297f65 but to do it
> > properly would require also making sure commits that remove
> > "DM_TYPE_NVME_BIO_BASED", like 8d47e65948dd ("dm mpath: remove
> > unnecessary NVMe branching in favor of scsi_dh checks") are applied --
> > basically any lingering references to DM_TYPE_NVME_BIO_BASED need to
> > be removed.
> > 
> > The commit header for 8d47e65948dd documents what
> > DM_TYPE_NVME_BIO_BASED was used for.. it was dm-mpath specific and
> > "nvme" mode really never got used by any userspace that I'm aware of.
> > 
> > Sadly I currently don't have the time to do this backport for all N
> > stable kernels... :(
> > 
> > But if that backport gets out of control: A simpler, albeit stable@
> > unicorn, way to resolve this is to simply revert 9c37de297f65 and make

9c37de297f65 can not be reverted in 5.4 and older because it isn't there,
and trying to apply it results in conflicts which at least I can not
resolve.

> > it so that DM-mpath and DM core just used bio-based if "nvme" is
> > requested by dm-mpath, so also in drivers/md/dm-mpath.c e.g.:
> > 
> > @@ -1091,8 +1088,6 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
> > 
> >                         if (!strcasecmp(queue_mode_name, "bio"))
> >                                 m->queue_mode = DM_TYPE_BIO_BASED;
> > 			else if (!strcasecmp(queue_mode_name, "nvme"))
> > -                               m->queue_mode = DM_TYPE_NVME_BIO_BASED;
> > +                               m->queue_mode = DM_TYPE_BIO_BASED;
> >                         else if (!strcasecmp(queue_mode_name, "rq"))
> >                                 m->queue_mode = DM_TYPE_REQUEST_BASED;
> >                         else if (!strcasecmp(queue_mode_name, "mq"))
> > 
> > Mike
> > 
> 
> Ok, please submit a working patch for the kernels that need it so that
> we can review and apply it to solve this regression.
> 

So, effectively, v5.4.y and older are broken right now for use cases
with dm on NVME drives.

Given that the regression does affect older branches, and given that we
have to revert this patch to avoid regressions in ChromeOS, would it be
possible to revert it from v5.4.y and older until a fix is found?

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag
  2022-06-15 14:36             ` [dm-devel] " Guenter Roeck
@ 2022-06-15 15:29               ` Mike Snitzer
  -1 siblings, 0 replies; 68+ messages in thread
From: Mike Snitzer @ 2022-06-15 15:29 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Greg KH, keescook, sarthakkukreti, Mike Snitzer, stable,
	Oleksandr Tymoshenko, dm-devel, regressions

On Wed, Jun 15 2022 at 10:36P -0400,
Guenter Roeck <linux@roeck-us.net> wrote:

> On Mon, Jun 13, 2022 at 11:13:21AM +0200, Greg KH wrote:
> > On Fri, Jun 10, 2022 at 11:11:00AM -0400, Mike Snitzer wrote:
> > > On Fri, Jun 10 2022 at  1:15P -0400,
> > > Greg KH <gregkh@linuxfoundation.org> wrote:
> > > 
> > > > On Fri, Jun 10, 2022 at 04:22:00AM +0000, Oleksandr Tymoshenko wrote:
> > > > > I believe this commit introduced a regression in dm verity on systems
> > > > > where data device is an NVME one. Loading table fails with the
> > > > > following diagnostics:
> > > > > 
> > > > > device-mapper: table: table load rejected: including non-request-stackable devices
> > > > > 
> > > > > The same kernel works with the same data drive on the SCSI interface.
> > > > > NVME-backed dm verity works with just this commit reverted.
> > > > > 
> > > > > I believe the presence of the immutable partition is used as an indicator
> > > > > of special case NVME configuration and if the data device's name starts
> > > > > with "nvme" the code tries to switch the target type to
> > > > > DM_TYPE_NVME_BIO_BASED (drivers/md/dm-table.c lines 1003-1010).
> > > > > 
> > > > > The special NVME optimization case was removed in
> > > > > 5.10 by commit 9c37de297f6590937f95a28bec1b7ac68a38618f, so only 5.4 is
> > > > > affected.
> > > > > 
> > > > 
> > > > Why wouldn't 4.9, 4.14, and 4.19 also be affected here?  Should I also
> > > > just queue up 9c37de297f65 ("dm: remove special-casing of bio-based
> > > > immutable singleton target on NVMe") to those older kernels?  If so,
> > > > have you tested this and verified that it worked?
> > > 
> > > Sorry for the unforeseen stable@ troubles here!
> > > 
> > > In general we'd be fine to apply commit 9c37de297f65 but to do it
> > > properly would require also making sure commits that remove
> > > "DM_TYPE_NVME_BIO_BASED", like 8d47e65948dd ("dm mpath: remove
> > > unnecessary NVMe branching in favor of scsi_dh checks") are applied --
> > > basically any lingering references to DM_TYPE_NVME_BIO_BASED need to
> > > be removed.
> > > 
> > > The commit header for 8d47e65948dd documents what
> > > DM_TYPE_NVME_BIO_BASED was used for.. it was dm-mpath specific and
> > > "nvme" mode really never got used by any userspace that I'm aware of.
> > > 
> > > Sadly I currently don't have the time to do this backport for all N
> > > stable kernels... :(
> > > 
> > > But if that backport gets out of control: A simpler, albeit stable@
> > > unicorn, way to resolve this is to simply revert 9c37de297f65 and make
> 
> 9c37de297f65 can not be reverted in 5.4 and older because it isn't there,
> and trying to apply it results in conflicts which at least I can not
> resolve.
> 
> > > it so that DM-mpath and DM core just used bio-based if "nvme" is
> > > requested by dm-mpath, so also in drivers/md/dm-mpath.c e.g.:
> > > 
> > > @@ -1091,8 +1088,6 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
> > > 
> > >                         if (!strcasecmp(queue_mode_name, "bio"))
> > >                                 m->queue_mode = DM_TYPE_BIO_BASED;
> > > 			else if (!strcasecmp(queue_mode_name, "nvme"))
> > > -                               m->queue_mode = DM_TYPE_NVME_BIO_BASED;
> > > +                               m->queue_mode = DM_TYPE_BIO_BASED;
> > >                         else if (!strcasecmp(queue_mode_name, "rq"))
> > >                                 m->queue_mode = DM_TYPE_REQUEST_BASED;
> > >                         else if (!strcasecmp(queue_mode_name, "mq"))
> > > 
> > > Mike
> > > 
> > 
> > Ok, please submit a working patch for the kernels that need it so that
> > we can review and apply it to solve this regression.
> > 
> 
> So, effectively, v5.4.y and older are broken right now for use cases
> with dm on NVME drives.
> 
> Given that the regression does affect older branches, and given that we
> have to revert this patch to avoid regressions in ChromeOS, would it be
> possible to revert it from v5.4.y and older until a fix is found ?

I obviously would prefer to not have this false-start.

I'll look at latest 5.4.y _now_ and see what can be done.

Should hopefully be pretty straightforward.

Mike

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag
  2022-06-15 15:29               ` [dm-devel] " Mike Snitzer
@ 2022-06-15 17:50                 ` Guenter Roeck
  -1 siblings, 0 replies; 68+ messages in thread
From: Guenter Roeck @ 2022-06-15 17:50 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Greg KH, keescook, sarthakkukreti, stable, Oleksandr Tymoshenko,
	dm-devel, regressions

On 6/15/22 08:29, Mike Snitzer wrote:
> On Wed, Jun 15 2022 at 10:36P -0400,
> Guenter Roeck <linux@roeck-us.net> wrote:
> 
>> On Mon, Jun 13, 2022 at 11:13:21AM +0200, Greg KH wrote:
>>> On Fri, Jun 10, 2022 at 11:11:00AM -0400, Mike Snitzer wrote:
>>>> On Fri, Jun 10 2022 at  1:15P -0400,
>>>> Greg KH <gregkh@linuxfoundation.org> wrote:
>>>>
>>>>> On Fri, Jun 10, 2022 at 04:22:00AM +0000, Oleksandr Tymoshenko wrote:
>>>>>> I believe this commit introduced a regression in dm verity on systems
>>>>>> where data device is an NVME one. Loading table fails with the
>>>>>> following diagnostics:
>>>>>>
>>>>>> device-mapper: table: table load rejected: including non-request-stackable devices
>>>>>>
>>>>>> The same kernel works with the same data drive on the SCSI interface.
>>>>>> NVME-backed dm verity works with just this commit reverted.
>>>>>>
>>>>>> I believe the presence of the immutable partition is used as an indicator
>>>>>> of special case NVME configuration and if the data device's name starts
>>>>>> with "nvme" the code tries to switch the target type to
>>>>>> DM_TYPE_NVME_BIO_BASED (drivers/md/dm-table.c lines 1003-1010).
>>>>>>
>>>>>> The special NVME optimization case was removed in
>>>>>> 5.10 by commit 9c37de297f6590937f95a28bec1b7ac68a38618f, so only 5.4 is
>>>>>> affected.
>>>>>>
>>>>>
>>>>> Why wouldn't 4.9, 4.14, and 4.19 also be affected here?  Should I also
>>>>> just queue up 9c37de297f65 ("dm: remove special-casing of bio-based
>>>>> immutable singleton target on NVMe") to those older kernels?  If so,
>>>>> have you tested this and verified that it worked?
>>>>
>>>> Sorry for the unforeseen stable@ troubles here!
>>>>
>>>> In general we'd be fine to apply commit 9c37de297f65 but to do it
>>>> properly would require also making sure commits that remove
>>>> "DM_TYPE_NVME_BIO_BASED", like 8d47e65948dd ("dm mpath: remove
>>>> unnecessary NVMe branching in favor of scsi_dh checks") are applied --
>>>> basically any lingering references to DM_TYPE_NVME_BIO_BASED need to
>>>> be removed.
>>>>
>>>> The commit header for 8d47e65948dd documents what
>>>> DM_TYPE_NVME_BIO_BASED was used for.. it was dm-mpath specific and
>>>> "nvme" mode really never got used by any userspace that I'm aware of.
>>>>
>>>> Sadly I currently don't have the time to do this backport for all N
>>>> stable kernels... :(
>>>>
>>>> But if that backport gets out of control: A simpler, albeit stable@
>>>> unicorn, way to resolve this is to simply revert 9c37de297f65 and make
>>
>> 9c37de297f65 can not be reverted in 5.4 and older because it isn't there,
>> and trying to apply it results in conflicts which at least I can not
>> resolve.
>>
>>>> it so that DM-mpath and DM core just used bio-based if "nvme" is
>>>> requested by dm-mpath, so also in drivers/md/dm-mpath.c e.g.:
>>>>
>>>> @@ -1091,8 +1088,6 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
>>>>
>>>>                          if (!strcasecmp(queue_mode_name, "bio"))
>>>>                                  m->queue_mode = DM_TYPE_BIO_BASED;
>>>> 			else if (!strcasecmp(queue_mode_name, "nvme"))
>>>> -                               m->queue_mode = DM_TYPE_NVME_BIO_BASED;
>>>> +                               m->queue_mode = DM_TYPE_BIO_BASED;
>>>>                          else if (!strcasecmp(queue_mode_name, "rq"))
>>>>                                  m->queue_mode = DM_TYPE_REQUEST_BASED;
>>>>                          else if (!strcasecmp(queue_mode_name, "mq"))
>>>>
>>>> Mike
>>>>
>>>
>>> Ok, please submit a working patch for the kernels that need it so that
>>> we can review and apply it to solve this regression.
>>>
>>
>> So, effectively, v5.4.y and older are broken right now for use cases
>> with dm on NVME drives.
>>
>> Given that the regression does affect older branches, and given that we
>> have to revert this patch to avoid regressions in ChromeOS, would it be
>> possible to revert it from v5.4.y and older until a fix is found ?
> 
> I obviously would prefer to not have this false-start.
> 
The false start has already happened since we had to revert the patch
from chromeos-5.4 and older branches.

Guenter

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag
  2022-06-15 17:50                 ` [dm-devel] " Guenter Roeck
@ 2022-06-15 20:02                   ` Mike Snitzer
  -1 siblings, 0 replies; 68+ messages in thread
From: Mike Snitzer @ 2022-06-15 20:02 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Mike Snitzer, keescook, sarthakkukreti, Greg KH, stable,
	Oleksandr Tymoshenko, dm-devel, regressions

On Wed, Jun 15 2022 at  1:50P -0400,
Guenter Roeck <linux@roeck-us.net> wrote:

> On 6/15/22 08:29, Mike Snitzer wrote:
> > On Wed, Jun 15 2022 at 10:36P -0400,
> > Guenter Roeck <linux@roeck-us.net> wrote:
> > 
> > > On Mon, Jun 13, 2022 at 11:13:21AM +0200, Greg KH wrote:
> > > > On Fri, Jun 10, 2022 at 11:11:00AM -0400, Mike Snitzer wrote:
> > > > > On Fri, Jun 10 2022 at  1:15P -0400,
> > > > > Greg KH <gregkh@linuxfoundation.org> wrote:
> > > > > 
> > > > > > On Fri, Jun 10, 2022 at 04:22:00AM +0000, Oleksandr Tymoshenko wrote:
> > > > > > > I believe this commit introduced a regression in dm verity on systems
> > > > > > > where data device is an NVME one. Loading table fails with the
> > > > > > > following diagnostics:
> > > > > > > 
> > > > > > > device-mapper: table: table load rejected: including non-request-stackable devices
> > > > > > > 
> > > > > > > The same kernel works with the same data drive on the SCSI interface.
> > > > > > > NVME-backed dm verity works with just this commit reverted.
> > > > > > > 
> > > > > > > I believe the presence of the immutable partition is used as an indicator
> > > > > > > of special case NVME configuration and if the data device's name starts
> > > > > > > with "nvme" the code tries to switch the target type to
> > > > > > > DM_TYPE_NVME_BIO_BASED (drivers/md/dm-table.c lines 1003-1010).
> > > > > > > 
> > > > > > > The special NVME optimization case was removed in
> > > > > > > 5.10 by commit 9c37de297f6590937f95a28bec1b7ac68a38618f, so only 5.4 is
> > > > > > > affected.
> > > > > > > 
> > > > > > 
> > > > > > Why wouldn't 4.9, 4.14, and 4.19 also be affected here?  Should I also
> > > > > > just queue up 9c37de297f65 ("dm: remove special-casing of bio-based
> > > > > > immutable singleton target on NVMe") to those older kernels?  If so,
> > > > > > have you tested this and verified that it worked?
> > > > > 
> > > > > Sorry for the unforeseen stable@ troubles here!
> > > > > 
> > > > > In general we'd be fine to apply commit 9c37de297f65 but to do it
> > > > > properly would require also making sure commits that remove
> > > > > "DM_TYPE_NVME_BIO_BASED", like 8d47e65948dd ("dm mpath: remove
> > > > > unnecessary NVMe branching in favor of scsi_dh checks") are applied --
> > > > > basically any lingering references to DM_TYPE_NVME_BIO_BASED need to
> > > > > be removed.
> > > > > 
> > > > > The commit header for 8d47e65948dd documents what
> > > > > DM_TYPE_NVME_BIO_BASED was used for.. it was dm-mpath specific and
> > > > > "nvme" mode really never got used by any userspace that I'm aware of.
> > > > > 
> > > > > Sadly I currently don't have the time to do this backport for all N
> > > > > stable kernels... :(
> > > > > 
> > > > > But if that backport gets out of control: A simpler, albeit stable@
> > > > > unicorn, way to resolve this is to simply revert 9c37de297f65 and make
> > > 
> > > 9c37de297f65 can not be reverted in 5.4 and older because it isn't there,
> > > and trying to apply it results in conflicts which at least I can not
> > > resolve.
> > > 
> > > > > it so that DM-mpath and DM core just used bio-based if "nvme" is
> > > > > requested by dm-mpath, so also in drivers/md/dm-mpath.c e.g.:
> > > > > 
> > > > > @@ -1091,8 +1088,6 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
> > > > > 
> > > > >                          if (!strcasecmp(queue_mode_name, "bio"))
> > > > >                                  m->queue_mode = DM_TYPE_BIO_BASED;
> > > > > 			else if (!strcasecmp(queue_mode_name, "nvme"))
> > > > > -                               m->queue_mode = DM_TYPE_NVME_BIO_BASED;
> > > > > +                               m->queue_mode = DM_TYPE_BIO_BASED;
> > > > >                          else if (!strcasecmp(queue_mode_name, "rq"))
> > > > >                                  m->queue_mode = DM_TYPE_REQUEST_BASED;
> > > > >                          else if (!strcasecmp(queue_mode_name, "mq"))
> > > > > 
> > > > > Mike
> > > > > 
> > > > 
> > > > Ok, please submit a working patch for the kernels that need it so that
> > > > we can review and apply it to solve this regression.
> > > > 
> > > 
> > > So, effectively, v5.4.y and older are broken right now for use cases
> > > with dm on NVME drives.
> > > 
> > > Given that the regression does affect older branches, and given that we
> > > have to revert this patch to avoid regressions in ChromeOS, would it be
> > > possible to revert it from v5.4.y and older until a fix is found ?
> > 
> > I obviously would prefer to not have this false-start.
> > 
> The false start has already happened since we had to revert the patch
> from chromeos-5.4 and older branches.

OK, well, this is pretty easy to fix in general.  If there are slight
differences across older trees, they are easily resolved.  The fact that
stable@ couldn't cope with backporting 9c37de297f65 is... what it is.

But this will fix the issue on 5.4.y:

From: Mike Snitzer <snitzer@kernel.org>
Date: Wed, 15 Jun 2022 14:07:09 -0400
Subject: [5.4.y PATCH] dm: remove special-casing of bio-based immutable singleton target on NVMe

Commit 9c37de297f6590937f95a28bec1b7ac68a38618f upstream.

There is no benefit to DM special-casing NVMe. Remove all code used to
establish DM_TYPE_NVME_BIO_BASED.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm-table.c         | 32 ++----------------
 drivers/md/dm.c               | 64 +++--------------------------------
 include/linux/device-mapper.h |  1 -
 3 files changed, 7 insertions(+), 90 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 06b382304d92..81bc36a43b32 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -872,8 +872,7 @@ EXPORT_SYMBOL(dm_consume_args);
 static bool __table_type_bio_based(enum dm_queue_mode table_type)
 {
 	return (table_type == DM_TYPE_BIO_BASED ||
-		table_type == DM_TYPE_DAX_BIO_BASED ||
-		table_type == DM_TYPE_NVME_BIO_BASED);
+		table_type == DM_TYPE_DAX_BIO_BASED);
 }
 
 static bool __table_type_request_based(enum dm_queue_mode table_type)
@@ -929,8 +928,6 @@ bool dm_table_supports_dax(struct dm_table *t,
 	return true;
 }
 
-static bool dm_table_does_not_support_partial_completion(struct dm_table *t);
-
 static int device_is_rq_stackable(struct dm_target *ti, struct dm_dev *dev,
 				  sector_t start, sector_t len, void *data)
 {
@@ -960,7 +957,6 @@ static int dm_table_determine_type(struct dm_table *t)
 			goto verify_bio_based;
 		}
 		BUG_ON(t->type == DM_TYPE_DAX_BIO_BASED);
-		BUG_ON(t->type == DM_TYPE_NVME_BIO_BASED);
 		goto verify_rq_based;
 	}
 
@@ -999,15 +995,6 @@ static int dm_table_determine_type(struct dm_table *t)
 		if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) ||
 		    (list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
 			t->type = DM_TYPE_DAX_BIO_BASED;
-		} else {
-			/* Check if upgrading to NVMe bio-based is valid or required */
-			tgt = dm_table_get_immutable_target(t);
-			if (tgt && !tgt->max_io_len && dm_table_does_not_support_partial_completion(t)) {
-				t->type = DM_TYPE_NVME_BIO_BASED;
-				goto verify_rq_based; /* must be stacked directly on NVMe (blk-mq) */
-			} else if (list_empty(devices) && live_md_type == DM_TYPE_NVME_BIO_BASED) {
-				t->type = DM_TYPE_NVME_BIO_BASED;
-			}
 		}
 		return 0;
 	}
@@ -1024,8 +1011,7 @@ static int dm_table_determine_type(struct dm_table *t)
 	 * (e.g. request completion process for partial completion.)
 	 */
 	if (t->num_targets > 1) {
-		DMERR("%s DM doesn't support multiple targets",
-		      t->type == DM_TYPE_NVME_BIO_BASED ? "nvme bio-based" : "request-based");
+		DMERR("request-based DM doesn't support multiple targets");
 		return -EINVAL;
 	}
 
@@ -1714,20 +1700,6 @@ static int device_is_not_random(struct dm_target *ti, struct dm_dev *dev,
 	return q && !blk_queue_add_random(q);
 }
 
-static int device_is_partial_completion(struct dm_target *ti, struct dm_dev *dev,
-					sector_t start, sector_t len, void *data)
-{
-	char b[BDEVNAME_SIZE];
-
-	/* For now, NVMe devices are the only devices of this class */
-	return (strncmp(bdevname(dev->bdev, b), "nvme", 4) != 0);
-}
-
-static bool dm_table_does_not_support_partial_completion(struct dm_table *t)
-{
-	return !dm_table_any_dev_attr(t, device_is_partial_completion, NULL);
-}
-
 static int device_not_write_same_capable(struct dm_target *ti, struct dm_dev *dev,
 					 sector_t start, sector_t len, void *data)
 {
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 37b8bb4d80f0..3c45c389ded9 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1000,7 +1000,7 @@ static void clone_endio(struct bio *bio)
 	struct mapped_device *md = tio->io->md;
 	dm_endio_fn endio = tio->ti->type->end_io;
 
-	if (unlikely(error == BLK_STS_TARGET) && md->type != DM_TYPE_NVME_BIO_BASED) {
+	if (unlikely(error == BLK_STS_TARGET)) {
 		if (bio_op(bio) == REQ_OP_DISCARD &&
 		    !bio->bi_disk->queue->limits.max_discard_sectors)
 			disable_discard(md);
@@ -1340,10 +1340,7 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
 		/* the bio has been remapped so dispatch it */
 		trace_block_bio_remap(clone->bi_disk->queue, clone,
 				      bio_dev(io->orig_bio), sector);
-		if (md->type == DM_TYPE_NVME_BIO_BASED)
-			ret = direct_make_request(clone);
-		else
-			ret = generic_make_request(clone);
+		ret = generic_make_request(clone);
 		break;
 	case DM_MAPIO_KILL:
 		if (unlikely(swap_bios_limit(ti, clone))) {
@@ -1732,51 +1729,6 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
 	return ret;
 }
 
-/*
- * Optimized variant of __split_and_process_bio that leverages the
- * fact that targets that use it do _not_ have a need to split bios.
- */
-static blk_qc_t __process_bio(struct mapped_device *md, struct dm_table *map,
-			      struct bio *bio, struct dm_target *ti)
-{
-	struct clone_info ci;
-	blk_qc_t ret = BLK_QC_T_NONE;
-	int error = 0;
-
-	init_clone_info(&ci, md, map, bio);
-
-	if (bio->bi_opf & REQ_PREFLUSH) {
-		struct bio flush_bio;
-
-		/*
-		 * Use an on-stack bio for this, it's safe since we don't
-		 * need to reference it after submit. It's just used as
-		 * the basis for the clone(s).
-		 */
-		bio_init(&flush_bio, NULL, 0);
-		flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;
-		ci.bio = &flush_bio;
-		ci.sector_count = 0;
-		error = __send_empty_flush(&ci);
-		bio_uninit(ci.bio);
-		/* dec_pending submits any data associated with flush */
-	} else {
-		struct dm_target_io *tio;
-
-		ci.bio = bio;
-		ci.sector_count = bio_sectors(bio);
-		if (__process_abnormal_io(&ci, ti, &error))
-			goto out;
-
-		tio = alloc_tio(&ci, ti, 0, GFP_NOIO);
-		ret = __clone_and_map_simple_bio(&ci, tio, NULL);
-	}
-out:
-	/* drop the extra reference count */
-	dec_pending(ci.io, errno_to_blk_status(error));
-	return ret;
-}
-
 static blk_qc_t dm_process_bio(struct mapped_device *md,
 			       struct dm_table *map, struct bio *bio)
 {
@@ -1807,8 +1759,6 @@ static blk_qc_t dm_process_bio(struct mapped_device *md,
 		/* regular IO is split by __split_and_process_bio */
 	}
 
-	if (dm_get_md_type(md) == DM_TYPE_NVME_BIO_BASED)
-		return __process_bio(md, map, bio, ti);
 	return __split_and_process_bio(md, map, bio);
 }
 
@@ -2200,12 +2150,10 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
 	if (request_based)
 		dm_stop_queue(q);
 
-	if (request_based || md->type == DM_TYPE_NVME_BIO_BASED) {
+	if (request_based) {
 		/*
-		 * Leverage the fact that request-based DM targets and
-		 * NVMe bio based targets are immutable singletons
-		 * - used to optimize both dm_request_fn and dm_mq_queue_rq;
-		 *   and __process_bio.
+		 * Leverage the fact that request-based DM targets are
+		 * immutable singletons - used to optimize dm_mq_queue_rq.
 		 */
 		md->immutable_target = dm_table_get_immutable_target(t);
 	}
@@ -2334,7 +2282,6 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
 		break;
 	case DM_TYPE_BIO_BASED:
 	case DM_TYPE_DAX_BIO_BASED:
-	case DM_TYPE_NVME_BIO_BASED:
 		dm_init_congested_fn(md);
 		break;
 	case DM_TYPE_NONE:
@@ -3070,7 +3017,6 @@ struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, enum dm_qu
 	switch (type) {
 	case DM_TYPE_BIO_BASED:
 	case DM_TYPE_DAX_BIO_BASED:
-	case DM_TYPE_NVME_BIO_BASED:
 		pool_size = max(dm_get_reserved_bio_based_ios(), min_pool_size);
 		front_pad = roundup(per_io_data_size, __alignof__(struct dm_target_io)) + offsetof(struct dm_target_io, clone);
 		io_front_pad = roundup(front_pad,  __alignof__(struct dm_io)) + offsetof(struct dm_io, tio);
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index a53d7d2c2d95..60631f3abddb 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -28,7 +28,6 @@ enum dm_queue_mode {
 	DM_TYPE_BIO_BASED	 = 1,
 	DM_TYPE_REQUEST_BASED	 = 2,
 	DM_TYPE_DAX_BIO_BASED	 = 3,
-	DM_TYPE_NVME_BIO_BASED	 = 4,
 };
 
 typedef enum { STATUSTYPE_INFO, STATUSTYPE_TABLE } status_type_t;
-- 
2.30.0
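
Note for readers of the patch above: the device-name heuristic it removes
can be modelled in user space. The following is a minimal sketch, not
kernel code. It assumes the logic is exactly what the deleted hunks in
drivers/md/dm-table.c show: a bio-based table with an immutable target,
a zero max_io_len, and only devices whose names begin with "nvme" was
switched to DM_TYPE_NVME_BIO_BASED. All names prefixed with model_ or
MODEL_ are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the two enum dm_queue_mode values involved. */
enum model_queue_mode {
	MODEL_BIO_BASED      = 1,
	MODEL_NVME_BIO_BASED = 4,
};

/*
 * Models the removed device_is_partial_completion(): returns nonzero
 * for every device whose name does NOT start with "nvme".
 */
static int model_device_is_partial_completion(const char *bdev_name)
{
	assert(bdev_name != NULL);
	return strncmp(bdev_name, "nvme", 4) != 0;
}

/*
 * Models the removed "else" branch of dm_table_determine_type().
 * With patched == true the branch is gone and the table stays
 * plain bio-based regardless of the device names.
 */
static enum model_queue_mode
model_bio_table_type(bool has_immutable_target, unsigned int max_io_len,
		     const char *const *dev_names, size_t ndevs, bool patched)
{
	size_t i;

	if (patched || !has_immutable_target || max_io_len != 0)
		return MODEL_BIO_BASED;

	/* Any non-"nvme" device blocks the upgrade. */
	for (i = 0; i < ndevs; i++)
		if (model_device_is_partial_completion(dev_names[i]))
			return MODEL_BIO_BASED;

	return MODEL_NVME_BIO_BASED;
}
```

Under these assumptions, a table whose target gained DM_TARGET_IMMUTABLE
and whose devices are all named "nvme..." takes the upgrade path before
the patch, while the same table on a device named "sda..." does not. That
would be consistent with the report earlier in the thread that the same
drive works on the SCSI interface.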


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [dm-devel] [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag
@ 2022-06-15 20:02                   ` Mike Snitzer
  0 siblings, 0 replies; 68+ messages in thread
From: Mike Snitzer @ 2022-06-15 20:02 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: keescook, sarthakkukreti, Greg KH, Mike Snitzer, stable,
	Oleksandr Tymoshenko, dm-devel, regressions

On Wed, Jun 15 2022 at  1:50P -0400,
Guenter Roeck <linux@roeck-us.net> wrote:

> On 6/15/22 08:29, Mike Snitzer wrote:
> > On Wed, Jun 15 2022 at 10:36P -0400,
> > Guenter Roeck <linux@roeck-us.net> wrote:
> > 
> > > On Mon, Jun 13, 2022 at 11:13:21AM +0200, Greg KH wrote:
> > > > On Fri, Jun 10, 2022 at 11:11:00AM -0400, Mike Snitzer wrote:
> > > > > On Fri, Jun 10 2022 at  1:15P -0400,
> > > > > Greg KH <gregkh@linuxfoundation.org> wrote:
> > > > > 
> > > > > > On Fri, Jun 10, 2022 at 04:22:00AM +0000, Oleksandr Tymoshenko wrote:
> > > > > > > I believe this commit introduced a regression in dm verity on systems
> > > > > > > where data device is an NVME one. Loading table fails with the
> > > > > > > following diagnostics:
> > > > > > > 
> > > > > > > device-mapper: table: table load rejected: including non-request-stackable devices
> > > > > > > 
> > > > > > > The same kernel works with the same data drive on the SCSI interface.
> > > > > > > NVME-backed dm verity works with just this commit reverted.
> > > > > > > 
> > > > > > > I believe the presence of the immutable partition is used as an indicator
> > > > > > > of special case NVME configuration and if the data device's name starts
> > > > > > > with "nvme" the code tries to switch the target type to
> > > > > > > DM_TYPE_NVME_BIO_BASED (drivers/md/dm-table.c lines 1003-1010).
> > > > > > > 
> > > > > > > The special NVME optimization case was removed in
> > > > > > > 5.10 by commit 9c37de297f6590937f95a28bec1b7ac68a38618f, so only 5.4 is
> > > > > > > affected.
> > > > > > > 
> > > > > > 
> > > > > > Why wouldn't 4.9, 4.14, and 4.19 also be affected here?  Should I also
> > > > > > just queue up 9c37de297f65 ("dm: remove special-casing of bio-based
> > > > > > immutable singleton target on NVMe") to those older kernels?  If so,
> > > > > > have you tested this and verified that it worked?
> > > > > 
> > > > > Sorry for the unforeseen stable@ troubles here!
> > > > > 
> > > > > In general we'd be fine to apply commit 9c37de297f65 but to do it
> > > > > properly would require also making sure commits that remove
> > > > > "DM_TYPE_NVME_BIO_BASED", like 8d47e65948dd ("dm mpath: remove
> > > > > unnecessary NVMe branching in favor of scsi_dh checks") are applied --
> > > > > basically any lingering references to DM_TYPE_NVME_BIO_BASED need to
> > > > > be removed.
> > > > > 
> > > > > The commit header for 8d47e65948dd documents what
> > > > > DM_TYPE_NVME_BIO_BASED was used for.. it was dm-mpath specific and
> > > > > "nvme" mode really never got used by any userspace that I'm aware of.
> > > > > 
> > > > > Sadly I currently don't have the time to do this backport for all N
> > > > > stable kernels... :(
> > > > > 
> > > > > But if that backport gets out of control: A simpler, albeit stable@
> > > > > unicorn, way to resolve this is to simply revert 9c37de297f65 and make
> > > 
> > > 9c37de297f65 can not be reverted in 5.4 and older because it isn't there,
> > > and trying to apply it results in conflicts which at least I can not
> > > resolve.
> > > 
> > > > > it so that DM-mpath and DM core just used bio-based if "nvme" is
> > > > > requested by dm-mpath, so also in drivers/md/dm-mpath.c e.g.:
> > > > > 
> > > > > @@ -1091,8 +1088,6 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
> > > > > 
> > > > >                          if (!strcasecmp(queue_mode_name, "bio"))
> > > > >                                  m->queue_mode = DM_TYPE_BIO_BASED;
> > > > > 			else if (!strcasecmp(queue_mode_name, "nvme"))
> > > > > -                               m->queue_mode = DM_TYPE_NVME_BIO_BASED;
> > > > > +                               m->queue_mode = DM_TYPE_BIO_BASED;
> > > > >                          else if (!strcasecmp(queue_mode_name, "rq"))
> > > > >                                  m->queue_mode = DM_TYPE_REQUEST_BASED;
> > > > >                          else if (!strcasecmp(queue_mode_name, "mq"))
> > > > > 
> > > > > Mike
> > > > > 
> > > > 
> > > > Ok, please submit a working patch for the kernels that need it so that
> > > > we can review and apply it to solve this regression.
> > > > 
> > > 
> > > So, effectively, v5.4.y and older are broken right now for use cases
> > > with dm on NVME drives.
> > > 
> > > Given that the regression does affect older branches, and given that we
> > > have to revert this patch to avoid regressions in ChromeOS, would it be
> > > possible to revert it from v5.4.y and older until a fix is found ?
> > 
> > I obviously would prefer to not have this false-start.
> > 
> The false start has already happened since we had to revert the patch
> from chromeos-5.4 and older branches.

OK, well, this is pretty easy to fix in general.  If there are slight
differences across older trees, they are easily resolved.  The fact that
stable@ couldn't cope with backporting 9c37de297f65 is... what it is.

But this will fix the issue on 5.4.y:

From: Mike Snitzer <snitzer@kernel.org>
Date: Wed, 15 Jun 2022 14:07:09 -0400
Subject: [5.4.y PATCH] dm: remove special-casing of bio-based immutable singleton target on NVMe

Commit 9c37de297f6590937f95a28bec1b7ac68a38618f upstream.

There is no benefit to DM special-casing NVMe. Remove all code used to
establish DM_TYPE_NVME_BIO_BASED.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm-table.c         | 32 ++----------------
 drivers/md/dm.c               | 64 +++--------------------------------
 include/linux/device-mapper.h |  1 -
 3 files changed, 7 insertions(+), 90 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 06b382304d92..81bc36a43b32 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -872,8 +872,7 @@ EXPORT_SYMBOL(dm_consume_args);
 static bool __table_type_bio_based(enum dm_queue_mode table_type)
 {
 	return (table_type == DM_TYPE_BIO_BASED ||
-		table_type == DM_TYPE_DAX_BIO_BASED ||
-		table_type == DM_TYPE_NVME_BIO_BASED);
+		table_type == DM_TYPE_DAX_BIO_BASED);
 }
 
 static bool __table_type_request_based(enum dm_queue_mode table_type)
@@ -929,8 +928,6 @@ bool dm_table_supports_dax(struct dm_table *t,
 	return true;
 }
 
-static bool dm_table_does_not_support_partial_completion(struct dm_table *t);
-
 static int device_is_rq_stackable(struct dm_target *ti, struct dm_dev *dev,
 				  sector_t start, sector_t len, void *data)
 {
@@ -960,7 +957,6 @@ static int dm_table_determine_type(struct dm_table *t)
 			goto verify_bio_based;
 		}
 		BUG_ON(t->type == DM_TYPE_DAX_BIO_BASED);
-		BUG_ON(t->type == DM_TYPE_NVME_BIO_BASED);
 		goto verify_rq_based;
 	}
 
@@ -999,15 +995,6 @@ static int dm_table_determine_type(struct dm_table *t)
 		if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) ||
 		    (list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
 			t->type = DM_TYPE_DAX_BIO_BASED;
-		} else {
-			/* Check if upgrading to NVMe bio-based is valid or required */
-			tgt = dm_table_get_immutable_target(t);
-			if (tgt && !tgt->max_io_len && dm_table_does_not_support_partial_completion(t)) {
-				t->type = DM_TYPE_NVME_BIO_BASED;
-				goto verify_rq_based; /* must be stacked directly on NVMe (blk-mq) */
-			} else if (list_empty(devices) && live_md_type == DM_TYPE_NVME_BIO_BASED) {
-				t->type = DM_TYPE_NVME_BIO_BASED;
-			}
 		}
 		return 0;
 	}
@@ -1024,8 +1011,7 @@ static int dm_table_determine_type(struct dm_table *t)
 	 * (e.g. request completion process for partial completion.)
 	 */
 	if (t->num_targets > 1) {
-		DMERR("%s DM doesn't support multiple targets",
-		      t->type == DM_TYPE_NVME_BIO_BASED ? "nvme bio-based" : "request-based");
+		DMERR("request-based DM doesn't support multiple targets");
 		return -EINVAL;
 	}
 
@@ -1714,20 +1700,6 @@ static int device_is_not_random(struct dm_target *ti, struct dm_dev *dev,
 	return q && !blk_queue_add_random(q);
 }
 
-static int device_is_partial_completion(struct dm_target *ti, struct dm_dev *dev,
-					sector_t start, sector_t len, void *data)
-{
-	char b[BDEVNAME_SIZE];
-
-	/* For now, NVMe devices are the only devices of this class */
-	return (strncmp(bdevname(dev->bdev, b), "nvme", 4) != 0);
-}
-
-static bool dm_table_does_not_support_partial_completion(struct dm_table *t)
-{
-	return !dm_table_any_dev_attr(t, device_is_partial_completion, NULL);
-}
-
 static int device_not_write_same_capable(struct dm_target *ti, struct dm_dev *dev,
 					 sector_t start, sector_t len, void *data)
 {
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 37b8bb4d80f0..3c45c389ded9 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1000,7 +1000,7 @@ static void clone_endio(struct bio *bio)
 	struct mapped_device *md = tio->io->md;
 	dm_endio_fn endio = tio->ti->type->end_io;
 
-	if (unlikely(error == BLK_STS_TARGET) && md->type != DM_TYPE_NVME_BIO_BASED) {
+	if (unlikely(error == BLK_STS_TARGET)) {
 		if (bio_op(bio) == REQ_OP_DISCARD &&
 		    !bio->bi_disk->queue->limits.max_discard_sectors)
 			disable_discard(md);
@@ -1340,10 +1340,7 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
 		/* the bio has been remapped so dispatch it */
 		trace_block_bio_remap(clone->bi_disk->queue, clone,
 				      bio_dev(io->orig_bio), sector);
-		if (md->type == DM_TYPE_NVME_BIO_BASED)
-			ret = direct_make_request(clone);
-		else
-			ret = generic_make_request(clone);
+		ret = generic_make_request(clone);
 		break;
 	case DM_MAPIO_KILL:
 		if (unlikely(swap_bios_limit(ti, clone))) {
@@ -1732,51 +1729,6 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
 	return ret;
 }
 
-/*
- * Optimized variant of __split_and_process_bio that leverages the
- * fact that targets that use it do _not_ have a need to split bios.
- */
-static blk_qc_t __process_bio(struct mapped_device *md, struct dm_table *map,
-			      struct bio *bio, struct dm_target *ti)
-{
-	struct clone_info ci;
-	blk_qc_t ret = BLK_QC_T_NONE;
-	int error = 0;
-
-	init_clone_info(&ci, md, map, bio);
-
-	if (bio->bi_opf & REQ_PREFLUSH) {
-		struct bio flush_bio;
-
-		/*
-		 * Use an on-stack bio for this, it's safe since we don't
-		 * need to reference it after submit. It's just used as
-		 * the basis for the clone(s).
-		 */
-		bio_init(&flush_bio, NULL, 0);
-		flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;
-		ci.bio = &flush_bio;
-		ci.sector_count = 0;
-		error = __send_empty_flush(&ci);
-		bio_uninit(ci.bio);
-		/* dec_pending submits any data associated with flush */
-	} else {
-		struct dm_target_io *tio;
-
-		ci.bio = bio;
-		ci.sector_count = bio_sectors(bio);
-		if (__process_abnormal_io(&ci, ti, &error))
-			goto out;
-
-		tio = alloc_tio(&ci, ti, 0, GFP_NOIO);
-		ret = __clone_and_map_simple_bio(&ci, tio, NULL);
-	}
-out:
-	/* drop the extra reference count */
-	dec_pending(ci.io, errno_to_blk_status(error));
-	return ret;
-}
-
 static blk_qc_t dm_process_bio(struct mapped_device *md,
 			       struct dm_table *map, struct bio *bio)
 {
@@ -1807,8 +1759,6 @@ static blk_qc_t dm_process_bio(struct mapped_device *md,
 		/* regular IO is split by __split_and_process_bio */
 	}
 
-	if (dm_get_md_type(md) == DM_TYPE_NVME_BIO_BASED)
-		return __process_bio(md, map, bio, ti);
 	return __split_and_process_bio(md, map, bio);
 }
 
@@ -2200,12 +2150,10 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
 	if (request_based)
 		dm_stop_queue(q);
 
-	if (request_based || md->type == DM_TYPE_NVME_BIO_BASED) {
+	if (request_based) {
 		/*
-		 * Leverage the fact that request-based DM targets and
-		 * NVMe bio based targets are immutable singletons
-		 * - used to optimize both dm_request_fn and dm_mq_queue_rq;
-		 *   and __process_bio.
+		 * Leverage the fact that request-based DM targets are
+		 * immutable singletons - used to optimize dm_mq_queue_rq.
 		 */
 		md->immutable_target = dm_table_get_immutable_target(t);
 	}
@@ -2334,7 +2282,6 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
 		break;
 	case DM_TYPE_BIO_BASED:
 	case DM_TYPE_DAX_BIO_BASED:
-	case DM_TYPE_NVME_BIO_BASED:
 		dm_init_congested_fn(md);
 		break;
 	case DM_TYPE_NONE:
@@ -3070,7 +3017,6 @@ struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, enum dm_qu
 	switch (type) {
 	case DM_TYPE_BIO_BASED:
 	case DM_TYPE_DAX_BIO_BASED:
-	case DM_TYPE_NVME_BIO_BASED:
 		pool_size = max(dm_get_reserved_bio_based_ios(), min_pool_size);
 		front_pad = roundup(per_io_data_size, __alignof__(struct dm_target_io)) + offsetof(struct dm_target_io, clone);
 		io_front_pad = roundup(front_pad,  __alignof__(struct dm_io)) + offsetof(struct dm_io, tio);
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index a53d7d2c2d95..60631f3abddb 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -28,7 +28,6 @@ enum dm_queue_mode {
 	DM_TYPE_BIO_BASED	 = 1,
 	DM_TYPE_REQUEST_BASED	 = 2,
 	DM_TYPE_DAX_BIO_BASED	 = 3,
-	DM_TYPE_NVME_BIO_BASED	 = 4,
 };
 
 typedef enum { STATUSTYPE_INFO, STATUSTYPE_TABLE } status_type_t;
-- 
2.30.0

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 68+ messages in thread
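
The name-based check that the patch above removes is easy to misread
because of its double negation. Below is a minimal user-space sketch of
that logic, not kernel code: any_dev_attr() is a hypothetical stand-in
for dm_table_any_dev_attr() and is assumed to return true when the
predicate holds for at least one device; the other two function names
are also invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Mirrors the removed device_is_partial_completion(): returns nonzero
 * when the device name does NOT start with "nvme".
 */
static int name_is_partial_completion(const char *name)
{
	return strncmp(name, "nvme", 4) != 0;
}

/*
 * Hypothetical stand-in for dm_table_any_dev_attr(): assumed to be true
 * if the predicate holds for at least one device name in the array.
 */
static bool any_dev_attr(const char *const *names, size_t n,
			 int (*fn)(const char *))
{
	for (size_t i = 0; i < n; i++)
		if (fn(names[i]))
			return true;
	return false;
}

/*
 * Mirrors the removed dm_table_does_not_support_partial_completion():
 * true only when every device name starts with "nvme".
 */
static bool table_is_all_nvme(const char *const *names, size_t n)
{
	return !any_dev_attr(names, n, name_is_partial_completion);
}
```

Under these assumptions, a table backed only by devices named nvme* was
a candidate for DM_TYPE_NVME_BIO_BASED, while a single device with any
other name kept it on the plain bio-based path.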

* Re: [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag
  2022-06-15 20:02                   ` [dm-devel] " Mike Snitzer
@ 2022-06-15 20:40                     ` Guenter Roeck
  -1 siblings, 0 replies; 68+ messages in thread
From: Guenter Roeck @ 2022-06-15 20:40 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Mike Snitzer, keescook, sarthakkukreti, Greg KH, stable,
	Oleksandr Tymoshenko, dm-devel, regressions

On Wed, Jun 15, 2022 at 04:02:36PM -0400, Mike Snitzer wrote:
[ ... ]
> 
> OK, well this is pretty easy to fix in general.  If there are slight
> differences across older trees they are easily resolved.  Fact that
> stable@ couldn't cope with backporting 9c37de297f65 is.. what it is.
> 
> But this will fix the issue on 5.4.y:
> 
> From: Mike Snitzer <snitzer@kernel.org>
> Date: Wed, 15 Jun 2022 14:07:09 -0400
> Subject: [5.4.y PATCH] dm: remove special-casing of bio-based immutable singleton target on NVMe
> 
> Commit 9c37de297f6590937f95a28bec1b7ac68a38618f upstream.
> 
> There is no benefit to DM special-casing NVMe. Remove all code used to
> establish DM_TYPE_NVME_BIO_BASED.
> 
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>

I'll give it a try.

Thanks,
Guenter

> ---
>  drivers/md/dm-table.c         | 32 ++----------------
>  drivers/md/dm.c               | 64 +++--------------------------------
>  include/linux/device-mapper.h |  1 -
>  3 files changed, 7 insertions(+), 90 deletions(-)
> 
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 06b382304d92..81bc36a43b32 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -872,8 +872,7 @@ EXPORT_SYMBOL(dm_consume_args);
>  static bool __table_type_bio_based(enum dm_queue_mode table_type)
>  {
>  	return (table_type == DM_TYPE_BIO_BASED ||
> -		table_type == DM_TYPE_DAX_BIO_BASED ||
> -		table_type == DM_TYPE_NVME_BIO_BASED);
> +		table_type == DM_TYPE_DAX_BIO_BASED);
>  }
>  
>  static bool __table_type_request_based(enum dm_queue_mode table_type)
> @@ -929,8 +928,6 @@ bool dm_table_supports_dax(struct dm_table *t,
>  	return true;
>  }
>  
> -static bool dm_table_does_not_support_partial_completion(struct dm_table *t);
> -
>  static int device_is_rq_stackable(struct dm_target *ti, struct dm_dev *dev,
>  				  sector_t start, sector_t len, void *data)
>  {
> @@ -960,7 +957,6 @@ static int dm_table_determine_type(struct dm_table *t)
>  			goto verify_bio_based;
>  		}
>  		BUG_ON(t->type == DM_TYPE_DAX_BIO_BASED);
> -		BUG_ON(t->type == DM_TYPE_NVME_BIO_BASED);
>  		goto verify_rq_based;
>  	}
>  
> @@ -999,15 +995,6 @@ static int dm_table_determine_type(struct dm_table *t)
>  		if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) ||
>  		    (list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
>  			t->type = DM_TYPE_DAX_BIO_BASED;
> -		} else {
> -			/* Check if upgrading to NVMe bio-based is valid or required */
> -			tgt = dm_table_get_immutable_target(t);
> -			if (tgt && !tgt->max_io_len && dm_table_does_not_support_partial_completion(t)) {
> -				t->type = DM_TYPE_NVME_BIO_BASED;
> -				goto verify_rq_based; /* must be stacked directly on NVMe (blk-mq) */
> -			} else if (list_empty(devices) && live_md_type == DM_TYPE_NVME_BIO_BASED) {
> -				t->type = DM_TYPE_NVME_BIO_BASED;
> -			}
>  		}
>  		return 0;
>  	}
> @@ -1024,8 +1011,7 @@ static int dm_table_determine_type(struct dm_table *t)
>  	 * (e.g. request completion process for partial completion.)
>  	 */
>  	if (t->num_targets > 1) {
> -		DMERR("%s DM doesn't support multiple targets",
> -		      t->type == DM_TYPE_NVME_BIO_BASED ? "nvme bio-based" : "request-based");
> +		DMERR("request-based DM doesn't support multiple targets");
>  		return -EINVAL;
>  	}
>  
> @@ -1714,20 +1700,6 @@ static int device_is_not_random(struct dm_target *ti, struct dm_dev *dev,
>  	return q && !blk_queue_add_random(q);
>  }
>  
> -static int device_is_partial_completion(struct dm_target *ti, struct dm_dev *dev,
> -					sector_t start, sector_t len, void *data)
> -{
> -	char b[BDEVNAME_SIZE];
> -
> -	/* For now, NVMe devices are the only devices of this class */
> -	return (strncmp(bdevname(dev->bdev, b), "nvme", 4) != 0);
> -}
> -
> -static bool dm_table_does_not_support_partial_completion(struct dm_table *t)
> -{
> -	return !dm_table_any_dev_attr(t, device_is_partial_completion, NULL);
> -}
> -
>  static int device_not_write_same_capable(struct dm_target *ti, struct dm_dev *dev,
>  					 sector_t start, sector_t len, void *data)
>  {
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 37b8bb4d80f0..3c45c389ded9 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1000,7 +1000,7 @@ static void clone_endio(struct bio *bio)
>  	struct mapped_device *md = tio->io->md;
>  	dm_endio_fn endio = tio->ti->type->end_io;
>  
> -	if (unlikely(error == BLK_STS_TARGET) && md->type != DM_TYPE_NVME_BIO_BASED) {
> +	if (unlikely(error == BLK_STS_TARGET)) {
>  		if (bio_op(bio) == REQ_OP_DISCARD &&
>  		    !bio->bi_disk->queue->limits.max_discard_sectors)
>  			disable_discard(md);
> @@ -1340,10 +1340,7 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
>  		/* the bio has been remapped so dispatch it */
>  		trace_block_bio_remap(clone->bi_disk->queue, clone,
>  				      bio_dev(io->orig_bio), sector);
> -		if (md->type == DM_TYPE_NVME_BIO_BASED)
> -			ret = direct_make_request(clone);
> -		else
> -			ret = generic_make_request(clone);
> +		ret = generic_make_request(clone);
>  		break;
>  	case DM_MAPIO_KILL:
>  		if (unlikely(swap_bios_limit(ti, clone))) {
> @@ -1732,51 +1729,6 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
>  	return ret;
>  }
>  
> -/*
> - * Optimized variant of __split_and_process_bio that leverages the
> - * fact that targets that use it do _not_ have a need to split bios.
> - */
> -static blk_qc_t __process_bio(struct mapped_device *md, struct dm_table *map,
> -			      struct bio *bio, struct dm_target *ti)
> -{
> -	struct clone_info ci;
> -	blk_qc_t ret = BLK_QC_T_NONE;
> -	int error = 0;
> -
> -	init_clone_info(&ci, md, map, bio);
> -
> -	if (bio->bi_opf & REQ_PREFLUSH) {
> -		struct bio flush_bio;
> -
> -		/*
> -		 * Use an on-stack bio for this, it's safe since we don't
> -		 * need to reference it after submit. It's just used as
> -		 * the basis for the clone(s).
> -		 */
> -		bio_init(&flush_bio, NULL, 0);
> -		flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;
> -		ci.bio = &flush_bio;
> -		ci.sector_count = 0;
> -		error = __send_empty_flush(&ci);
> -		bio_uninit(ci.bio);
> -		/* dec_pending submits any data associated with flush */
> -	} else {
> -		struct dm_target_io *tio;
> -
> -		ci.bio = bio;
> -		ci.sector_count = bio_sectors(bio);
> -		if (__process_abnormal_io(&ci, ti, &error))
> -			goto out;
> -
> -		tio = alloc_tio(&ci, ti, 0, GFP_NOIO);
> -		ret = __clone_and_map_simple_bio(&ci, tio, NULL);
> -	}
> -out:
> -	/* drop the extra reference count */
> -	dec_pending(ci.io, errno_to_blk_status(error));
> -	return ret;
> -}
> -
>  static blk_qc_t dm_process_bio(struct mapped_device *md,
>  			       struct dm_table *map, struct bio *bio)
>  {
> @@ -1807,8 +1759,6 @@ static blk_qc_t dm_process_bio(struct mapped_device *md,
>  		/* regular IO is split by __split_and_process_bio */
>  	}
>  
> -	if (dm_get_md_type(md) == DM_TYPE_NVME_BIO_BASED)
> -		return __process_bio(md, map, bio, ti);
>  	return __split_and_process_bio(md, map, bio);
>  }
>  
> @@ -2200,12 +2150,10 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
>  	if (request_based)
>  		dm_stop_queue(q);
>  
> -	if (request_based || md->type == DM_TYPE_NVME_BIO_BASED) {
> +	if (request_based) {
>  		/*
> -		 * Leverage the fact that request-based DM targets and
> -		 * NVMe bio based targets are immutable singletons
> -		 * - used to optimize both dm_request_fn and dm_mq_queue_rq;
> -		 *   and __process_bio.
> +		 * Leverage the fact that request-based DM targets are
> +		 * immutable singletons - used to optimize dm_mq_queue_rq.
>  		 */
>  		md->immutable_target = dm_table_get_immutable_target(t);
>  	}
> @@ -2334,7 +2282,6 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
>  		break;
>  	case DM_TYPE_BIO_BASED:
>  	case DM_TYPE_DAX_BIO_BASED:
> -	case DM_TYPE_NVME_BIO_BASED:
>  		dm_init_congested_fn(md);
>  		break;
>  	case DM_TYPE_NONE:
> @@ -3070,7 +3017,6 @@ struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, enum dm_qu
>  	switch (type) {
>  	case DM_TYPE_BIO_BASED:
>  	case DM_TYPE_DAX_BIO_BASED:
> -	case DM_TYPE_NVME_BIO_BASED:
>  		pool_size = max(dm_get_reserved_bio_based_ios(), min_pool_size);
>  		front_pad = roundup(per_io_data_size, __alignof__(struct dm_target_io)) + offsetof(struct dm_target_io, clone);
>  		io_front_pad = roundup(front_pad,  __alignof__(struct dm_io)) + offsetof(struct dm_io, tio);
> diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
> index a53d7d2c2d95..60631f3abddb 100644
> --- a/include/linux/device-mapper.h
> +++ b/include/linux/device-mapper.h
> @@ -28,7 +28,6 @@ enum dm_queue_mode {
>  	DM_TYPE_BIO_BASED	 = 1,
>  	DM_TYPE_REQUEST_BASED	 = 2,
>  	DM_TYPE_DAX_BIO_BASED	 = 3,
> -	DM_TYPE_NVME_BIO_BASED	 = 4,
>  };
>  
>  typedef enum { STATUSTYPE_INFO, STATUSTYPE_TABLE } status_type_t;
> -- 
> 2.30.0
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread
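
The thread quoted in the next message floats a simpler fallback for
dm-mpath: keep accepting the "nvme" queue mode name but treat it as
plain bio-based. A minimal user-space sketch of that mapping follows. The
enum and function names are hypothetical, and only the branches whose
bodies are visible in the quoted hunk are modelled ("mq" is left out
because its assignment is not shown there).

```c
#include <strings.h>	/* strcasecmp() (POSIX) */

/* Hypothetical stand-in for the relevant enum dm_queue_mode values. */
enum queue_mode {
	MODE_NONE = 0,		/* name not recognized by this sketch */
	MODE_BIO_BASED = 1,
	MODE_REQUEST_BASED = 2,
};

/* Case-insensitive mapping from a queue mode name to a queue mode. */
static enum queue_mode parse_queue_mode(const char *name)
{
	if (!strcasecmp(name, "bio"))
		return MODE_BIO_BASED;
	if (!strcasecmp(name, "nvme"))
		return MODE_BIO_BASED;	/* was DM_TYPE_NVME_BIO_BASED */
	if (!strcasecmp(name, "rq"))
		return MODE_REQUEST_BASED;
	return MODE_NONE;
}
```

Keeping the "nvme" keyword but folding it into bio-based would let
existing table lines continue to load while the separate queue mode
goes away.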

* Re: [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag
  2022-06-15 20:02                   ` [dm-devel] " Mike Snitzer
@ 2022-06-15 23:59                     ` Guenter Roeck
  -1 siblings, 0 replies; 68+ messages in thread
From: Guenter Roeck @ 2022-06-15 23:59 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Mike Snitzer, keescook, sarthakkukreti, Greg KH, stable,
	Oleksandr Tymoshenko, dm-devel, regressions

On Wed, Jun 15, 2022 at 04:02:36PM -0400, Mike Snitzer wrote:
> On Wed, Jun 15 2022 at  1:50P -0400,
> Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > On 6/15/22 08:29, Mike Snitzer wrote:
> > > On Wed, Jun 15 2022 at 10:36P -0400,
> > > Guenter Roeck <linux@roeck-us.net> wrote:
> > > 
> > > > On Mon, Jun 13, 2022 at 11:13:21AM +0200, Greg KH wrote:
> > > > > On Fri, Jun 10, 2022 at 11:11:00AM -0400, Mike Snitzer wrote:
> > > > > > On Fri, Jun 10 2022 at  1:15P -0400,
> > > > > > Greg KH <gregkh@linuxfoundation.org> wrote:
> > > > > > 
> > > > > > > On Fri, Jun 10, 2022 at 04:22:00AM +0000, Oleksandr Tymoshenko wrote:
> > > > > > > > I believe this commit introduced a regression in dm verity on systems
> > > > > > > > where the data device is an NVMe device. Loading the table fails with
> > > > > > > > the following diagnostic:
> > > > > > > > 
> > > > > > > > device-mapper: table: table load rejected: including non-request-stackable devices
> > > > > > > > 
> > > > > > > > The same kernel works with the same data drive on the SCSI interface.
> > > > > > > > NVMe-backed dm verity works with just this commit reverted.
> > > > > > > > 
> > > > > > > > I believe the presence of an immutable target is used as an indicator
> > > > > > > > of the special-case NVMe configuration: if the data device's name starts
> > > > > > > > with "nvme", the code tries to switch the table type to
> > > > > > > > DM_TYPE_NVME_BIO_BASED (drivers/md/dm-table.c lines 1003-1010).
> > > > > > > > 
> > > > > > > > The special NVMe optimization case was removed in
> > > > > > > > 5.10 by commit 9c37de297f6590937f95a28bec1b7ac68a38618f, so only 5.4 is
> > > > > > > > affected.
> > > > > > > > 
> > > > > > > 
> > > > > > > Why wouldn't 4.9, 4.14, and 4.19 also be affected here?  Should I also
> > > > > > > just queue up 9c37de297f65 ("dm: remove special-casing of bio-based
> > > > > > > immutable singleton target on NVMe") to those older kernels?  If so,
> > > > > > > have you tested this and verified that it worked?
> > > > > > 
> > > > > > Sorry for the unforeseen stable@ troubles here!
> > > > > > 
> > > > > > In general we'd be fine to apply commit 9c37de297f65 but to do it
> > > > > > properly would require also making sure commits that remove
> > > > > > "DM_TYPE_NVME_BIO_BASED", like 8d47e65948dd ("dm mpath: remove
> > > > > > unnecessary NVMe branching in favor of scsi_dh checks") are applied --
> > > > > > basically any lingering references to DM_TYPE_NVME_BIO_BASED need to
> > > > > > be removed.
> > > > > > 
> > > > > > The commit header for 8d47e65948dd documents what
> > > > > > DM_TYPE_NVME_BIO_BASED was used for.. it was dm-mpath specific and
> > > > > > "nvme" mode really never got used by any userspace that I'm aware of.
> > > > > > 
> > > > > > Sadly I currently don't have the time to do this backport for all N
> > > > > > stable kernels... :(
> > > > > > 
> > > > > > But if that backport gets out of control: A simpler, albeit stable@
> > > > > > unicorn, way to resolve this is to simply revert 9c37de297f65 and make
> > > > 
> > > > 9c37de297f65 cannot be reverted in 5.4 and older because it isn't there,
> > > > and trying to apply it results in conflicts which I, at least, cannot
> > > > resolve.
> > > > 
> > > > > > it so that DM-mpath and DM core just used bio-based if "nvme" is
> > > > > > requested by dm-mpath, so also in drivers/md/dm-mpath.c e.g.:
> > > > > > 
> > > > > > @@ -1091,8 +1088,6 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
> > > > > > 
> > > > > >                          if (!strcasecmp(queue_mode_name, "bio"))
> > > > > >                                  m->queue_mode = DM_TYPE_BIO_BASED;
> > > > > > 			else if (!strcasecmp(queue_mode_name, "nvme"))
> > > > > > -                               m->queue_mode = DM_TYPE_NVME_BIO_BASED;
> > > > > > +                               m->queue_mode = DM_TYPE_BIO_BASED;
> > > > > >                          else if (!strcasecmp(queue_mode_name, "rq"))
> > > > > >                                  m->queue_mode = DM_TYPE_REQUEST_BASED;
> > > > > >                          else if (!strcasecmp(queue_mode_name, "mq"))
> > > > > > 
> > > > > > Mike
> > > > > > 
> > > > > 
> > > > > Ok, please submit a working patch for the kernels that need it so that
> > > > > we can review and apply it to solve this regression.
> > > > > 
> > > > 
> > > > So, effectively, v5.4.y and older are broken right now for use cases
> > > > with dm on NVMe drives.
> > > > 
> > > > Given that the regression does affect older branches, and given that we
> > > > have to revert this patch to avoid regressions in ChromeOS, would it be
> > > > possible to revert it from v5.4.y and older until a fix is found?
> > > 
> > > I obviously would prefer to not have this false-start.
> > > 
> > The false start has already happened since we had to revert the patch
> > from chromeos-5.4 and older branches.
> 
> OK, well this is pretty easy to fix in general.  If there are slight
> differences across older trees they are easily resolved.  Fact that
> stable@ couldn't cope with backporting 9c37de297f65 is.. what it is.
> 
> But this will fix the issue on 5.4.y:
> 
> From: Mike Snitzer <snitzer@kernel.org>
> Date: Wed, 15 Jun 2022 14:07:09 -0400
> Subject: [5.4.y PATCH] dm: remove special-casing of bio-based immutable singleton target on NVMe
> 
> Commit 9c37de297f6590937f95a28bec1b7ac68a38618f upstream.
> 
> There is no benefit to DM special-casing NVMe. Remove all code used to
> establish DM_TYPE_NVME_BIO_BASED.
> 
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  drivers/md/dm-table.c         | 32 ++----------------
>  drivers/md/dm.c               | 64 +++--------------------------------
>  include/linux/device-mapper.h |  1 -
>  3 files changed, 7 insertions(+), 90 deletions(-)
> 
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 06b382304d92..81bc36a43b32 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -872,8 +872,7 @@ EXPORT_SYMBOL(dm_consume_args);
>  static bool __table_type_bio_based(enum dm_queue_mode table_type)
>  {
>  	return (table_type == DM_TYPE_BIO_BASED ||
> -		table_type == DM_TYPE_DAX_BIO_BASED ||
> -		table_type == DM_TYPE_NVME_BIO_BASED);
> +		table_type == DM_TYPE_DAX_BIO_BASED);
>  }
>  
>  static bool __table_type_request_based(enum dm_queue_mode table_type)
> @@ -929,8 +928,6 @@ bool dm_table_supports_dax(struct dm_table *t,
>  	return true;
>  }
>  
> -static bool dm_table_does_not_support_partial_completion(struct dm_table *t);
> -
>  static int device_is_rq_stackable(struct dm_target *ti, struct dm_dev *dev,
>  				  sector_t start, sector_t len, void *data)
>  {
> @@ -960,7 +957,6 @@ static int dm_table_determine_type(struct dm_table *t)
>  			goto verify_bio_based;
>  		}
>  		BUG_ON(t->type == DM_TYPE_DAX_BIO_BASED);
> -		BUG_ON(t->type == DM_TYPE_NVME_BIO_BASED);
>  		goto verify_rq_based;
>  	}
>  
> @@ -999,15 +995,6 @@ static int dm_table_determine_type(struct dm_table *t)
>  		if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) ||
>  		    (list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
>  			t->type = DM_TYPE_DAX_BIO_BASED;
> -		} else {
> -			/* Check if upgrading to NVMe bio-based is valid or required */
> -			tgt = dm_table_get_immutable_target(t);
> -			if (tgt && !tgt->max_io_len && dm_table_does_not_support_partial_completion(t)) {
> -				t->type = DM_TYPE_NVME_BIO_BASED;
> -				goto verify_rq_based; /* must be stacked directly on NVMe (blk-mq) */
> -			} else if (list_empty(devices) && live_md_type == DM_TYPE_NVME_BIO_BASED) {
> -				t->type = DM_TYPE_NVME_BIO_BASED;
> -			}
>  		}
>  		return 0;
>  	}
> @@ -1024,8 +1011,7 @@ static int dm_table_determine_type(struct dm_table *t)
>  	 * (e.g. request completion process for partial completion.)
>  	 */
>  	if (t->num_targets > 1) {
> -		DMERR("%s DM doesn't support multiple targets",
> -		      t->type == DM_TYPE_NVME_BIO_BASED ? "nvme bio-based" : "request-based");
> +		DMERR("request-based DM doesn't support multiple targets");
>  		return -EINVAL;
>  	}
>  
> @@ -1714,20 +1700,6 @@ static int device_is_not_random(struct dm_target *ti, struct dm_dev *dev,
>  	return q && !blk_queue_add_random(q);
>  }
>  
> -static int device_is_partial_completion(struct dm_target *ti, struct dm_dev *dev,
> -					sector_t start, sector_t len, void *data)
> -{
> -	char b[BDEVNAME_SIZE];
> -
> -	/* For now, NVMe devices are the only devices of this class */
> -	return (strncmp(bdevname(dev->bdev, b), "nvme", 4) != 0);
> -}
> -
> -static bool dm_table_does_not_support_partial_completion(struct dm_table *t)
> -{
> -	return !dm_table_any_dev_attr(t, device_is_partial_completion, NULL);
> -}
> -
>  static int device_not_write_same_capable(struct dm_target *ti, struct dm_dev *dev,
>  					 sector_t start, sector_t len, void *data)
>  {
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 37b8bb4d80f0..3c45c389ded9 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1000,7 +1000,7 @@ static void clone_endio(struct bio *bio)
>  	struct mapped_device *md = tio->io->md;
>  	dm_endio_fn endio = tio->ti->type->end_io;
>  
> -	if (unlikely(error == BLK_STS_TARGET) && md->type != DM_TYPE_NVME_BIO_BASED) {
> +	if (unlikely(error == BLK_STS_TARGET)) {
>  		if (bio_op(bio) == REQ_OP_DISCARD &&
>  		    !bio->bi_disk->queue->limits.max_discard_sectors)
>  			disable_discard(md);
> @@ -1340,10 +1340,7 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
>  		/* the bio has been remapped so dispatch it */
>  		trace_block_bio_remap(clone->bi_disk->queue, clone,
>  				      bio_dev(io->orig_bio), sector);
> -		if (md->type == DM_TYPE_NVME_BIO_BASED)
> -			ret = direct_make_request(clone);
> -		else
> -			ret = generic_make_request(clone);

drivers/md/dm.c:1340:24: error: unused variable 'md'

I'll try again with this fixed.

Guenter
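
For anyone reproducing the build failure, here is a minimal standalone sketch of the situation. It assumes a build with -Werror, which is what turns -Wunused-variable into an error. The types and the `_sketch` names are hypothetical stand-ins, not the dm.c definitions. The only point is that once the last read of `md` is removed, its declaration has to go as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real dm.c layout. */
enum queue_mode_sketch { SKETCH_BIO_BASED = 1, SKETCH_NVME_BIO_BASED = 4 };

struct md_sketch { enum queue_mode_sketch type; };
struct tio_sketch { struct md_sketch *md; int dispatched; };

/* Stand-in for generic_make_request(): it only records the dispatch. */
static int generic_dispatch_sketch(struct tio_sketch *tio)
{
	tio->dispatched++;
	return 0;
}

/*
 * Before the backport, the function read md->type to pick a dispatch path:
 *
 *	struct md_sketch *md = tio->md;
 *	if (md->type == SKETCH_NVME_BIO_BASED) ... else ...
 *
 * With that branch gone nothing reads 'md' any more, so the declaration is
 * dropped too. Leaving it in place triggers -Wunused-variable.
 */
int map_bio_sketch(struct tio_sketch *tio)
{
	assert(tio != NULL);
	return generic_dispatch_sketch(tio);
}
```

Whether the real fix drops the declaration or handles it some other way is not shown in this thread; the sketch only illustrates the simplest option.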

> +		ret = generic_make_request(clone);
>  		break;
>  	case DM_MAPIO_KILL:
>  		if (unlikely(swap_bios_limit(ti, clone))) {
> @@ -1732,51 +1729,6 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
>  	return ret;
>  }
>  
> -/*
> - * Optimized variant of __split_and_process_bio that leverages the
> - * fact that targets that use it do _not_ have a need to split bios.
> - */
> -static blk_qc_t __process_bio(struct mapped_device *md, struct dm_table *map,
> -			      struct bio *bio, struct dm_target *ti)
> -{
> -	struct clone_info ci;
> -	blk_qc_t ret = BLK_QC_T_NONE;
> -	int error = 0;
> -
> -	init_clone_info(&ci, md, map, bio);
> -
> -	if (bio->bi_opf & REQ_PREFLUSH) {
> -		struct bio flush_bio;
> -
> -		/*
> -		 * Use an on-stack bio for this, it's safe since we don't
> -		 * need to reference it after submit. It's just used as
> -		 * the basis for the clone(s).
> -		 */
> -		bio_init(&flush_bio, NULL, 0);
> -		flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;
> -		ci.bio = &flush_bio;
> -		ci.sector_count = 0;
> -		error = __send_empty_flush(&ci);
> -		bio_uninit(ci.bio);
> -		/* dec_pending submits any data associated with flush */
> -	} else {
> -		struct dm_target_io *tio;
> -
> -		ci.bio = bio;
> -		ci.sector_count = bio_sectors(bio);
> -		if (__process_abnormal_io(&ci, ti, &error))
> -			goto out;
> -
> -		tio = alloc_tio(&ci, ti, 0, GFP_NOIO);
> -		ret = __clone_and_map_simple_bio(&ci, tio, NULL);
> -	}
> -out:
> -	/* drop the extra reference count */
> -	dec_pending(ci.io, errno_to_blk_status(error));
> -	return ret;
> -}
> -
>  static blk_qc_t dm_process_bio(struct mapped_device *md,
>  			       struct dm_table *map, struct bio *bio)
>  {
> @@ -1807,8 +1759,6 @@ static blk_qc_t dm_process_bio(struct mapped_device *md,
>  		/* regular IO is split by __split_and_process_bio */
>  	}
>  
> -	if (dm_get_md_type(md) == DM_TYPE_NVME_BIO_BASED)
> -		return __process_bio(md, map, bio, ti);
>  	return __split_and_process_bio(md, map, bio);
>  }
>  
> @@ -2200,12 +2150,10 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
>  	if (request_based)
>  		dm_stop_queue(q);
>  
> -	if (request_based || md->type == DM_TYPE_NVME_BIO_BASED) {
> +	if (request_based) {
>  		/*
> -		 * Leverage the fact that request-based DM targets and
> -		 * NVMe bio based targets are immutable singletons
> -		 * - used to optimize both dm_request_fn and dm_mq_queue_rq;
> -		 *   and __process_bio.
> +		 * Leverage the fact that request-based DM targets are
> +		 * immutable singletons - used to optimize dm_mq_queue_rq.
>  		 */
>  		md->immutable_target = dm_table_get_immutable_target(t);
>  	}
> @@ -2334,7 +2282,6 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
>  		break;
>  	case DM_TYPE_BIO_BASED:
>  	case DM_TYPE_DAX_BIO_BASED:
> -	case DM_TYPE_NVME_BIO_BASED:
>  		dm_init_congested_fn(md);
>  		break;
>  	case DM_TYPE_NONE:
> @@ -3070,7 +3017,6 @@ struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, enum dm_qu
>  	switch (type) {
>  	case DM_TYPE_BIO_BASED:
>  	case DM_TYPE_DAX_BIO_BASED:
> -	case DM_TYPE_NVME_BIO_BASED:
>  		pool_size = max(dm_get_reserved_bio_based_ios(), min_pool_size);
>  		front_pad = roundup(per_io_data_size, __alignof__(struct dm_target_io)) + offsetof(struct dm_target_io, clone);
>  		io_front_pad = roundup(front_pad,  __alignof__(struct dm_io)) + offsetof(struct dm_io, tio);
> diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
> index a53d7d2c2d95..60631f3abddb 100644
> --- a/include/linux/device-mapper.h
> +++ b/include/linux/device-mapper.h
> @@ -28,7 +28,6 @@ enum dm_queue_mode {
>  	DM_TYPE_BIO_BASED	 = 1,
>  	DM_TYPE_REQUEST_BASED	 = 2,
>  	DM_TYPE_DAX_BIO_BASED	 = 3,
> -	DM_TYPE_NVME_BIO_BASED	 = 4,
>  };
>  
>  typedef enum { STATUSTYPE_INFO, STATUSTYPE_TABLE } status_type_t;
> -- 
> 2.30.0
> 

* Re: [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag
  2022-06-15 20:02                   ` [dm-devel] " Mike Snitzer
@ 2022-06-16 23:22                     ` Guenter Roeck
  -1 siblings, 0 replies; 68+ messages in thread
From: Guenter Roeck @ 2022-06-16 23:22 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Mike Snitzer, keescook, sarthakkukreti, Greg KH, stable,
	Oleksandr Tymoshenko, dm-devel, regressions

On 6/15/22 13:02, Mike Snitzer wrote:
> On Wed, Jun 15 2022 at  1:50P -0400,
> Guenter Roeck <linux@roeck-us.net> wrote:
> 
>> On 6/15/22 08:29, Mike Snitzer wrote:
>>> On Wed, Jun 15 2022 at 10:36P -0400,
>>> Guenter Roeck <linux@roeck-us.net> wrote:
>>>
>>>> On Mon, Jun 13, 2022 at 11:13:21AM +0200, Greg KH wrote:
>>>>> On Fri, Jun 10, 2022 at 11:11:00AM -0400, Mike Snitzer wrote:
>>>>>> On Fri, Jun 10 2022 at  1:15P -0400,
>>>>>> Greg KH <gregkh@linuxfoundation.org> wrote:
>>>>>>
>>>>>>> On Fri, Jun 10, 2022 at 04:22:00AM +0000, Oleksandr Tymoshenko wrote:
>>>>>>>> I believe this commit introduced a regression in dm verity on systems
>>>>>>>> where data device is an NVME one. Loading table fails with the
>>>>>>>> following diagnostics:
>>>>>>>>
>>>>>>>> device-mapper: table: table load rejected: including non-request-stackable devices
>>>>>>>>
>>>>>>>> The same kernel works with the same data drive on the SCSI interface.
>>>>>>>> NVME-backed dm verity works with just this commit reverted.
>>>>>>>>
>>>>>>>> I believe the presence of the immutable partition is used as an indicator
>>>>>>>> of special case NVME configuration and if the data device's name starts
>>>>>>>> with "nvme" the code tries to switch the target type to
>>>>>>>> DM_TYPE_NVME_BIO_BASED (drivers/md/dm-table.c lines 1003-1010).
>>>>>>>>
>>>>>>>> The special NVME optimization case was removed in
>>>>>>>> 5.10 by commit 9c37de297f6590937f95a28bec1b7ac68a38618f, so only 5.4 is
>>>>>>>> affected.
>>>>>>>>
>>>>>>>
>>>>>>> Why wouldn't 4.9, 4.14, and 4.19 also be affected here?  Should I also
>>>>>>> just queue up 9c37de297f65 ("dm: remove special-casing of bio-based
>>>>>>> immutable singleton target on NVMe") to those older kernels?  If so,
>>>>>>> have you tested this and verified that it worked?
>>>>>>
>>>>>> Sorry for the unforeseen stable@ troubles here!
>>>>>>
>>>>>> In general we'd be fine to apply commit 9c37de297f65 but to do it
>>>>>> properly would require also making sure commits that remove
>>>>>> "DM_TYPE_NVME_BIO_BASED", like 8d47e65948dd ("dm mpath: remove
>>>>>> unnecessary NVMe branching in favor of scsi_dh checks") are applied --
>>>>>> basically any lingering references to DM_TYPE_NVME_BIO_BASED need to
>>>>>> be removed.
>>>>>>
>>>>>> The commit header for 8d47e65948dd documents what
>>>>>> DM_TYPE_NVME_BIO_BASED was used for.. it was dm-mpath specific and
>>>>>> "nvme" mode really never got used by any userspace that I'm aware of.
>>>>>>
>>>>>> Sadly I currently don't have the time to do this backport for all N
>>>>>> stable kernels... :(
>>>>>>
>>>>>> But if that backport gets out of control: A simpler, albeit stable@
>>>>>> unicorn, way to resolve this is to simply revert 9c37de297f65 and make
>>>>
>>>> 9c37de297f65 can not be reverted in 5.4 and older because it isn't there,
>>>> and trying to apply it results in conflicts which at least I can not
>>>> resolve.
>>>>
>>>>>> it so that DM-mpath and DM core just used bio-based if "nvme" is
>>>>>> requested by dm-mpath, so also in drivers/md/dm-mpath.c e.g.:
>>>>>>
>>>>>> @@ -1091,8 +1088,6 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
>>>>>>
>>>>>>                           if (!strcasecmp(queue_mode_name, "bio"))
>>>>>>                                   m->queue_mode = DM_TYPE_BIO_BASED;
>>>>>> 			else if (!strcasecmp(queue_mode_name, "nvme"))
>>>>>> -                               m->queue_mode = DM_TYPE_NVME_BIO_BASED;
>>>>>> +                               m->queue_mode = DM_TYPE_BIO_BASED;
>>>>>>                           else if (!strcasecmp(queue_mode_name, "rq"))
>>>>>>                                   m->queue_mode = DM_TYPE_REQUEST_BASED;
>>>>>>                           else if (!strcasecmp(queue_mode_name, "mq"))
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>
>>>>> Ok, please submit a working patch for the kernels that need it so that
>>>>> we can review and apply it to solve this regression.
>>>>>
>>>>
>>>> So, effectively, v5.4.y and older are broken right now for use cases
>>>> with dm on NVME drives.
>>>>
>>>> Given that the regression does affect older branches, and given that we
>>>> have to revert this patch to avoid regressions in ChromeOS, would it be
>>>> possible to revert it from v5.4.y and older until a fix is found ?
>>>
>>> I obviously would prefer to not have this false-start.
>>>
>> The false start has already happened since we had to revert the patch
>> from chromeos-5.4 and older branches.
> 
> OK, well this is pretty easy to fix in general.  If there are slight
> differences across older trees they are easily resolved.  Fact that
> stable@ couldn't cope with backporting 9c37de297f65 is.. what it is.
> 
> But this will fix the issue on 5.4.y:
> 
> From: Mike Snitzer <snitzer@kernel.org>
> Date: Wed, 15 Jun 2022 14:07:09 -0400
> Subject: [5.4.y PATCH] dm: remove special-casing of bio-based immutable singleton target on NVMe
> 
> Commit 9c37de297f6590937f95a28bec1b7ac68a38618f upstream.
> 
> There is no benefit to DM special-casing NVMe. Remove all code used to
> establish DM_TYPE_NVME_BIO_BASED.
> 
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>

This patch passes our tests after I removed the unused variable.

Tested-by: Guenter Roeck <linux@roeck-us.net>

Thanks a lot for the backport!

Guenter

> ---
>   drivers/md/dm-table.c         | 32 ++----------------
>   drivers/md/dm.c               | 64 +++--------------------------------
>   include/linux/device-mapper.h |  1 -
>   3 files changed, 7 insertions(+), 90 deletions(-)
> 
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 06b382304d92..81bc36a43b32 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -872,8 +872,7 @@ EXPORT_SYMBOL(dm_consume_args);
>   static bool __table_type_bio_based(enum dm_queue_mode table_type)
>   {
>   	return (table_type == DM_TYPE_BIO_BASED ||
> -		table_type == DM_TYPE_DAX_BIO_BASED ||
> -		table_type == DM_TYPE_NVME_BIO_BASED);
> +		table_type == DM_TYPE_DAX_BIO_BASED);
>   }
>   
>   static bool __table_type_request_based(enum dm_queue_mode table_type)
> @@ -929,8 +928,6 @@ bool dm_table_supports_dax(struct dm_table *t,
>   	return true;
>   }
>   
> -static bool dm_table_does_not_support_partial_completion(struct dm_table *t);
> -
>   static int device_is_rq_stackable(struct dm_target *ti, struct dm_dev *dev,
>   				  sector_t start, sector_t len, void *data)
>   {
> @@ -960,7 +957,6 @@ static int dm_table_determine_type(struct dm_table *t)
>   			goto verify_bio_based;
>   		}
>   		BUG_ON(t->type == DM_TYPE_DAX_BIO_BASED);
> -		BUG_ON(t->type == DM_TYPE_NVME_BIO_BASED);
>   		goto verify_rq_based;
>   	}
>   
> @@ -999,15 +995,6 @@ static int dm_table_determine_type(struct dm_table *t)
>   		if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) ||
>   		    (list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
>   			t->type = DM_TYPE_DAX_BIO_BASED;
> -		} else {
> -			/* Check if upgrading to NVMe bio-based is valid or required */
> -			tgt = dm_table_get_immutable_target(t);
> -			if (tgt && !tgt->max_io_len && dm_table_does_not_support_partial_completion(t)) {
> -				t->type = DM_TYPE_NVME_BIO_BASED;
> -				goto verify_rq_based; /* must be stacked directly on NVMe (blk-mq) */
> -			} else if (list_empty(devices) && live_md_type == DM_TYPE_NVME_BIO_BASED) {
> -				t->type = DM_TYPE_NVME_BIO_BASED;
> -			}
>   		}
>   		return 0;
>   	}
> @@ -1024,8 +1011,7 @@ static int dm_table_determine_type(struct dm_table *t)
>   	 * (e.g. request completion process for partial completion.)
>   	 */
>   	if (t->num_targets > 1) {
> -		DMERR("%s DM doesn't support multiple targets",
> -		      t->type == DM_TYPE_NVME_BIO_BASED ? "nvme bio-based" : "request-based");
> +		DMERR("request-based DM doesn't support multiple targets");
>   		return -EINVAL;
>   	}
>   
> @@ -1714,20 +1700,6 @@ static int device_is_not_random(struct dm_target *ti, struct dm_dev *dev,
>   	return q && !blk_queue_add_random(q);
>   }
>   
> -static int device_is_partial_completion(struct dm_target *ti, struct dm_dev *dev,
> -					sector_t start, sector_t len, void *data)
> -{
> -	char b[BDEVNAME_SIZE];
> -
> -	/* For now, NVMe devices are the only devices of this class */
> -	return (strncmp(bdevname(dev->bdev, b), "nvme", 4) != 0);
> -}
> -
> -static bool dm_table_does_not_support_partial_completion(struct dm_table *t)
> -{
> -	return !dm_table_any_dev_attr(t, device_is_partial_completion, NULL);
> -}
> -
>   static int device_not_write_same_capable(struct dm_target *ti, struct dm_dev *dev,
>   					 sector_t start, sector_t len, void *data)
>   {
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 37b8bb4d80f0..3c45c389ded9 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1000,7 +1000,7 @@ static void clone_endio(struct bio *bio)
>   	struct mapped_device *md = tio->io->md;
>   	dm_endio_fn endio = tio->ti->type->end_io;
>   
> -	if (unlikely(error == BLK_STS_TARGET) && md->type != DM_TYPE_NVME_BIO_BASED) {
> +	if (unlikely(error == BLK_STS_TARGET)) {
>   		if (bio_op(bio) == REQ_OP_DISCARD &&
>   		    !bio->bi_disk->queue->limits.max_discard_sectors)
>   			disable_discard(md);
> @@ -1340,10 +1340,7 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
>   		/* the bio has been remapped so dispatch it */
>   		trace_block_bio_remap(clone->bi_disk->queue, clone,
>   				      bio_dev(io->orig_bio), sector);
> -		if (md->type == DM_TYPE_NVME_BIO_BASED)
> -			ret = direct_make_request(clone);
> -		else
> -			ret = generic_make_request(clone);
> +		ret = generic_make_request(clone);
>   		break;
>   	case DM_MAPIO_KILL:
>   		if (unlikely(swap_bios_limit(ti, clone))) {
> @@ -1732,51 +1729,6 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
>   	return ret;
>   }
>   
> -/*
> - * Optimized variant of __split_and_process_bio that leverages the
> - * fact that targets that use it do _not_ have a need to split bios.
> - */
> -static blk_qc_t __process_bio(struct mapped_device *md, struct dm_table *map,
> -			      struct bio *bio, struct dm_target *ti)
> -{
> -	struct clone_info ci;
> -	blk_qc_t ret = BLK_QC_T_NONE;
> -	int error = 0;
> -
> -	init_clone_info(&ci, md, map, bio);
> -
> -	if (bio->bi_opf & REQ_PREFLUSH) {
> -		struct bio flush_bio;
> -
> -		/*
> -		 * Use an on-stack bio for this, it's safe since we don't
> -		 * need to reference it after submit. It's just used as
> -		 * the basis for the clone(s).
> -		 */
> -		bio_init(&flush_bio, NULL, 0);
> -		flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;
> -		ci.bio = &flush_bio;
> -		ci.sector_count = 0;
> -		error = __send_empty_flush(&ci);
> -		bio_uninit(ci.bio);
> -		/* dec_pending submits any data associated with flush */
> -	} else {
> -		struct dm_target_io *tio;
> -
> -		ci.bio = bio;
> -		ci.sector_count = bio_sectors(bio);
> -		if (__process_abnormal_io(&ci, ti, &error))
> -			goto out;
> -
> -		tio = alloc_tio(&ci, ti, 0, GFP_NOIO);
> -		ret = __clone_and_map_simple_bio(&ci, tio, NULL);
> -	}
> -out:
> -	/* drop the extra reference count */
> -	dec_pending(ci.io, errno_to_blk_status(error));
> -	return ret;
> -}
> -
>   static blk_qc_t dm_process_bio(struct mapped_device *md,
>   			       struct dm_table *map, struct bio *bio)
>   {
> @@ -1807,8 +1759,6 @@ static blk_qc_t dm_process_bio(struct mapped_device *md,
>   		/* regular IO is split by __split_and_process_bio */
>   	}
>   
> -	if (dm_get_md_type(md) == DM_TYPE_NVME_BIO_BASED)
> -		return __process_bio(md, map, bio, ti);
>   	return __split_and_process_bio(md, map, bio);
>   }
>   
> @@ -2200,12 +2150,10 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
>   	if (request_based)
>   		dm_stop_queue(q);
>   
> -	if (request_based || md->type == DM_TYPE_NVME_BIO_BASED) {
> +	if (request_based) {
>   		/*
> -		 * Leverage the fact that request-based DM targets and
> -		 * NVMe bio based targets are immutable singletons
> -		 * - used to optimize both dm_request_fn and dm_mq_queue_rq;
> -		 *   and __process_bio.
> +		 * Leverage the fact that request-based DM targets are
> +		 * immutable singletons - used to optimize dm_mq_queue_rq.
>   		 */
>   		md->immutable_target = dm_table_get_immutable_target(t);
>   	}
> @@ -2334,7 +2282,6 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
>   		break;
>   	case DM_TYPE_BIO_BASED:
>   	case DM_TYPE_DAX_BIO_BASED:
> -	case DM_TYPE_NVME_BIO_BASED:
>   		dm_init_congested_fn(md);
>   		break;
>   	case DM_TYPE_NONE:
> @@ -3070,7 +3017,6 @@ struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, enum dm_qu
>   	switch (type) {
>   	case DM_TYPE_BIO_BASED:
>   	case DM_TYPE_DAX_BIO_BASED:
> -	case DM_TYPE_NVME_BIO_BASED:
>   		pool_size = max(dm_get_reserved_bio_based_ios(), min_pool_size);
>   		front_pad = roundup(per_io_data_size, __alignof__(struct dm_target_io)) + offsetof(struct dm_target_io, clone);
>   		io_front_pad = roundup(front_pad,  __alignof__(struct dm_io)) + offsetof(struct dm_io, tio);
> diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
> index a53d7d2c2d95..60631f3abddb 100644
> --- a/include/linux/device-mapper.h
> +++ b/include/linux/device-mapper.h
> @@ -28,7 +28,6 @@ enum dm_queue_mode {
>   	DM_TYPE_BIO_BASED	 = 1,
>   	DM_TYPE_REQUEST_BASED	 = 2,
>   	DM_TYPE_DAX_BIO_BASED	 = 3,
> -	DM_TYPE_NVME_BIO_BASED	 = 4,
>   };
>   
>   typedef enum { STATUSTYPE_INFO, STATUSTYPE_TABLE } status_type_t;



* Re: [dm-devel] [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag
@ 2022-06-16 23:22                     ` Guenter Roeck
  0 siblings, 0 replies; 68+ messages in thread
From: Guenter Roeck @ 2022-06-16 23:22 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: keescook, sarthakkukreti, Greg KH, Mike Snitzer, stable,
	Oleksandr Tymoshenko, dm-devel, regressions

On 6/15/22 13:02, Mike Snitzer wrote:
> On Wed, Jun 15 2022 at  1:50P -0400,
> Guenter Roeck <linux@roeck-us.net> wrote:
> 
>> On 6/15/22 08:29, Mike Snitzer wrote:
>>> On Wed, Jun 15 2022 at 10:36P -0400,
>>> Guenter Roeck <linux@roeck-us.net> wrote:
>>>
>>>> On Mon, Jun 13, 2022 at 11:13:21AM +0200, Greg KH wrote:
>>>>> On Fri, Jun 10, 2022 at 11:11:00AM -0400, Mike Snitzer wrote:
>>>>>> On Fri, Jun 10 2022 at  1:15P -0400,
>>>>>> Greg KH <gregkh@linuxfoundation.org> wrote:
>>>>>>
>>>>>>> On Fri, Jun 10, 2022 at 04:22:00AM +0000, Oleksandr Tymoshenko wrote:
>>>>>>>> I believe this commit introduced a regression in dm verity on systems
>>>>>>>> where data device is an NVME one. Loading table fails with the
>>>>>>>> following diagnostics:
>>>>>>>>
>>>>>>>> device-mapper: table: table load rejected: including non-request-stackable devices
>>>>>>>>
>>>>>>>> The same kernel works with the same data drive on the SCSI interface.
>>>>>>>> NVME-backed dm verity works with just this commit reverted.
>>>>>>>>
>>>>>>>> I believe the presence of the immutable partition is used as an indicator
>>>>>>>> of special case NVME configuration and if the data device's name starts
>>>>>>>> with "nvme" the code tries to switch the target type to
>>>>>>>> DM_TYPE_NVME_BIO_BASED (drivers/md/dm-table.c lines 1003-1010).
>>>>>>>>
>>>>>>>> The special NVME optimization case was removed in
>>>>>>>> 5.10 by commit 9c37de297f6590937f95a28bec1b7ac68a38618f, so only 5.4 is
>>>>>>>> affected.
>>>>>>>>
>>>>>>>
>>>>>>> Why wouldn't 4.9, 4.14, and 4.19 also be affected here?  Should I also
>>>>>>> just queue up 9c37de297f65 ("dm: remove special-casing of bio-based
>>>>>>> immutable singleton target on NVMe") to those older kernels?  If so,
>>>>>>> have you tested this and verified that it worked?
>>>>>>
>>>>>> Sorry for the unforeseen stable@ troubles here!
>>>>>>
>>>>>> In general we'd be fine to apply commit 9c37de297f65 but to do it
>>>>>> properly would require also making sure commits that remove
>>>>>> "DM_TYPE_NVME_BIO_BASED", like 8d47e65948dd ("dm mpath: remove
>>>>>> unnecessary NVMe branching in favor of scsi_dh checks") are applied --
>>>>>> basically any lingering references to DM_TYPE_NVME_BIO_BASED need to
>>>>>> be removed.
>>>>>>
>>>>>> The commit header for 8d47e65948dd documents what
>>>>>> DM_TYPE_NVME_BIO_BASED was used for.. it was dm-mpath specific and
>>>>>> "nvme" mode really never got used by any userspace that I'm aware of.
>>>>>>
>>>>>> Sadly I currently don't have the time to do this backport for all N
>>>>>> stable kernels... :(
>>>>>>
>>>>>> But if that backport gets out of control: A simpler, albeit stable@
>>>>>> unicorn, way to resolve this is to simply revert 9c37de297f65 and make
>>>>
>>>> 9c37de297f65 can not be reverted in 5.4 and older because it isn't there,
>>>> and trying to apply it results in conflicts which at least I can not
>>>> resolve.
>>>>
>>>>>> it so that DM-mpath and DM core just used bio-based if "nvme" is
>>>>>> requested by dm-mpath, so also in drivers/md/dm-mpath.c e.g.:
>>>>>>
>>>>>> @@ -1091,8 +1088,6 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
>>>>>>
>>>>>>                           if (!strcasecmp(queue_mode_name, "bio"))
>>>>>>                                   m->queue_mode = DM_TYPE_BIO_BASED;
>>>>>> 			else if (!strcasecmp(queue_mode_name, "nvme"))
>>>>>> -                               m->queue_mode = DM_TYPE_NVME_BIO_BASED;
>>>>>> +                               m->queue_mode = DM_TYPE_BIO_BASED;
>>>>>>                           else if (!strcasecmp(queue_mode_name, "rq"))
>>>>>>                                   m->queue_mode = DM_TYPE_REQUEST_BASED;
>>>>>>                           else if (!strcasecmp(queue_mode_name, "mq"))
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>
>>>>> Ok, please submit a working patch for the kernels that need it so that
>>>>> we can review and apply it to solve this regression.
>>>>>
>>>>
>>>> So, effectively, v5.4.y and older are broken right now for use cases
>>>> with dm on NVME drives.
>>>>
>>>> Given that the regression does affect older branches, and given that we
>>>> have to revert this patch to avoid regressions in ChromeOS, would it be
>>>> possible to revert it from v5.4.y and older until a fix is found ?
>>>
>>> I obviously would prefer to not have this false-start.
>>>
>> The false start has already happened since we had to revert the patch
>> from chromeos-5.4 and older branches.
> 
> OK, well this is pretty easy to fix in general.  If there are slight
> differences across older trees they are easily resolved.  Fact that
> stable@ couldn't cope with backporting 9c37de297f65 is.. what it is.
> 
> But this will fix the issue on 5.4.y:
> 
> From: Mike Snitzer <snitzer@kernel.org>
> Date: Wed, 15 Jun 2022 14:07:09 -0400
> Subject: [5.4.y PATCH] dm: remove special-casing of bio-based immutable singleton target on NVMe
> 
> Commit 9c37de297f6590937f95a28bec1b7ac68a38618f upstream.
> 
> There is no benefit to DM special-casing NVMe. Remove all code used to
> establish DM_TYPE_NVME_BIO_BASED.
> 
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>

This patch passes our tests after I removed the unused variable.

Tested-by: Guenter Roeck <linux@roeck-us.net>

Thanks a lot for the backport!

Guenter

> ---
>   drivers/md/dm-table.c         | 32 ++----------------
>   drivers/md/dm.c               | 64 +++--------------------------------
>   include/linux/device-mapper.h |  1 -
>   3 files changed, 7 insertions(+), 90 deletions(-)
> 
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 06b382304d92..81bc36a43b32 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -872,8 +872,7 @@ EXPORT_SYMBOL(dm_consume_args);
>   static bool __table_type_bio_based(enum dm_queue_mode table_type)
>   {
>   	return (table_type == DM_TYPE_BIO_BASED ||
> -		table_type == DM_TYPE_DAX_BIO_BASED ||
> -		table_type == DM_TYPE_NVME_BIO_BASED);
> +		table_type == DM_TYPE_DAX_BIO_BASED);
>   }
>   
>   static bool __table_type_request_based(enum dm_queue_mode table_type)
> @@ -929,8 +928,6 @@ bool dm_table_supports_dax(struct dm_table *t,
>   	return true;
>   }
>   
> -static bool dm_table_does_not_support_partial_completion(struct dm_table *t);
> -
>   static int device_is_rq_stackable(struct dm_target *ti, struct dm_dev *dev,
>   				  sector_t start, sector_t len, void *data)
>   {
> @@ -960,7 +957,6 @@ static int dm_table_determine_type(struct dm_table *t)
>   			goto verify_bio_based;
>   		}
>   		BUG_ON(t->type == DM_TYPE_DAX_BIO_BASED);
> -		BUG_ON(t->type == DM_TYPE_NVME_BIO_BASED);
>   		goto verify_rq_based;
>   	}
>   
> @@ -999,15 +995,6 @@ static int dm_table_determine_type(struct dm_table *t)
>   		if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) ||
>   		    (list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
>   			t->type = DM_TYPE_DAX_BIO_BASED;
> -		} else {
> -			/* Check if upgrading to NVMe bio-based is valid or required */
> -			tgt = dm_table_get_immutable_target(t);
> -			if (tgt && !tgt->max_io_len && dm_table_does_not_support_partial_completion(t)) {
> -				t->type = DM_TYPE_NVME_BIO_BASED;
> -				goto verify_rq_based; /* must be stacked directly on NVMe (blk-mq) */
> -			} else if (list_empty(devices) && live_md_type == DM_TYPE_NVME_BIO_BASED) {
> -				t->type = DM_TYPE_NVME_BIO_BASED;
> -			}
>   		}
>   		return 0;
>   	}
> @@ -1024,8 +1011,7 @@ static int dm_table_determine_type(struct dm_table *t)
>   	 * (e.g. request completion process for partial completion.)
>   	 */
>   	if (t->num_targets > 1) {
> -		DMERR("%s DM doesn't support multiple targets",
> -		      t->type == DM_TYPE_NVME_BIO_BASED ? "nvme bio-based" : "request-based");
> +		DMERR("request-based DM doesn't support multiple targets");
>   		return -EINVAL;
>   	}
>   
> @@ -1714,20 +1700,6 @@ static int device_is_not_random(struct dm_target *ti, struct dm_dev *dev,
>   	return q && !blk_queue_add_random(q);
>   }
>   
> -static int device_is_partial_completion(struct dm_target *ti, struct dm_dev *dev,
> -					sector_t start, sector_t len, void *data)
> -{
> -	char b[BDEVNAME_SIZE];
> -
> -	/* For now, NVMe devices are the only devices of this class */
> -	return (strncmp(bdevname(dev->bdev, b), "nvme", 4) != 0);
> -}
> -
> -static bool dm_table_does_not_support_partial_completion(struct dm_table *t)
> -{
> -	return !dm_table_any_dev_attr(t, device_is_partial_completion, NULL);
> -}
> -
>   static int device_not_write_same_capable(struct dm_target *ti, struct dm_dev *dev,
>   					 sector_t start, sector_t len, void *data)
>   {
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 37b8bb4d80f0..3c45c389ded9 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1000,7 +1000,7 @@ static void clone_endio(struct bio *bio)
>   	struct mapped_device *md = tio->io->md;
>   	dm_endio_fn endio = tio->ti->type->end_io;
>   
> -	if (unlikely(error == BLK_STS_TARGET) && md->type != DM_TYPE_NVME_BIO_BASED) {
> +	if (unlikely(error == BLK_STS_TARGET)) {
>   		if (bio_op(bio) == REQ_OP_DISCARD &&
>   		    !bio->bi_disk->queue->limits.max_discard_sectors)
>   			disable_discard(md);
> @@ -1340,10 +1340,7 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
>   		/* the bio has been remapped so dispatch it */
>   		trace_block_bio_remap(clone->bi_disk->queue, clone,
>   				      bio_dev(io->orig_bio), sector);
> -		if (md->type == DM_TYPE_NVME_BIO_BASED)
> -			ret = direct_make_request(clone);
> -		else
> -			ret = generic_make_request(clone);
> +		ret = generic_make_request(clone);
>   		break;
>   	case DM_MAPIO_KILL:
>   		if (unlikely(swap_bios_limit(ti, clone))) {
> @@ -1732,51 +1729,6 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
>   	return ret;
>   }
>   
> -/*
> - * Optimized variant of __split_and_process_bio that leverages the
> - * fact that targets that use it do _not_ have a need to split bios.
> - */
> -static blk_qc_t __process_bio(struct mapped_device *md, struct dm_table *map,
> -			      struct bio *bio, struct dm_target *ti)
> -{
> -	struct clone_info ci;
> -	blk_qc_t ret = BLK_QC_T_NONE;
> -	int error = 0;
> -
> -	init_clone_info(&ci, md, map, bio);
> -
> -	if (bio->bi_opf & REQ_PREFLUSH) {
> -		struct bio flush_bio;
> -
> -		/*
> -		 * Use an on-stack bio for this, it's safe since we don't
> -		 * need to reference it after submit. It's just used as
> -		 * the basis for the clone(s).
> -		 */
> -		bio_init(&flush_bio, NULL, 0);
> -		flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;
> -		ci.bio = &flush_bio;
> -		ci.sector_count = 0;
> -		error = __send_empty_flush(&ci);
> -		bio_uninit(ci.bio);
> -		/* dec_pending submits any data associated with flush */
> -	} else {
> -		struct dm_target_io *tio;
> -
> -		ci.bio = bio;
> -		ci.sector_count = bio_sectors(bio);
> -		if (__process_abnormal_io(&ci, ti, &error))
> -			goto out;
> -
> -		tio = alloc_tio(&ci, ti, 0, GFP_NOIO);
> -		ret = __clone_and_map_simple_bio(&ci, tio, NULL);
> -	}
> -out:
> -	/* drop the extra reference count */
> -	dec_pending(ci.io, errno_to_blk_status(error));
> -	return ret;
> -}
> -
>   static blk_qc_t dm_process_bio(struct mapped_device *md,
>   			       struct dm_table *map, struct bio *bio)
>   {
> @@ -1807,8 +1759,6 @@ static blk_qc_t dm_process_bio(struct mapped_device *md,
>   		/* regular IO is split by __split_and_process_bio */
>   	}
>   
> -	if (dm_get_md_type(md) == DM_TYPE_NVME_BIO_BASED)
> -		return __process_bio(md, map, bio, ti);
>   	return __split_and_process_bio(md, map, bio);
>   }
>   
> @@ -2200,12 +2150,10 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
>   	if (request_based)
>   		dm_stop_queue(q);
>   
> -	if (request_based || md->type == DM_TYPE_NVME_BIO_BASED) {
> +	if (request_based) {
>   		/*
> -		 * Leverage the fact that request-based DM targets and
> -		 * NVMe bio based targets are immutable singletons
> -		 * - used to optimize both dm_request_fn and dm_mq_queue_rq;
> -		 *   and __process_bio.
> +		 * Leverage the fact that request-based DM targets are
> +		 * immutable singletons - used to optimize dm_mq_queue_rq.
>   		 */
>   		md->immutable_target = dm_table_get_immutable_target(t);
>   	}
> @@ -2334,7 +2282,6 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
>   		break;
>   	case DM_TYPE_BIO_BASED:
>   	case DM_TYPE_DAX_BIO_BASED:
> -	case DM_TYPE_NVME_BIO_BASED:
>   		dm_init_congested_fn(md);
>   		break;
>   	case DM_TYPE_NONE:
> @@ -3070,7 +3017,6 @@ struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, enum dm_qu
>   	switch (type) {
>   	case DM_TYPE_BIO_BASED:
>   	case DM_TYPE_DAX_BIO_BASED:
> -	case DM_TYPE_NVME_BIO_BASED:
>   		pool_size = max(dm_get_reserved_bio_based_ios(), min_pool_size);
>   		front_pad = roundup(per_io_data_size, __alignof__(struct dm_target_io)) + offsetof(struct dm_target_io, clone);
>   		io_front_pad = roundup(front_pad,  __alignof__(struct dm_io)) + offsetof(struct dm_io, tio);
> diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
> index a53d7d2c2d95..60631f3abddb 100644
> --- a/include/linux/device-mapper.h
> +++ b/include/linux/device-mapper.h
> @@ -28,7 +28,6 @@ enum dm_queue_mode {
>   	DM_TYPE_BIO_BASED	 = 1,
>   	DM_TYPE_REQUEST_BASED	 = 2,
>   	DM_TYPE_DAX_BIO_BASED	 = 3,
> -	DM_TYPE_NVME_BIO_BASED	 = 4,
>   };
>   
>   typedef enum { STATUSTYPE_INFO, STATUSTYPE_TABLE } status_type_t;

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



* Re: [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag
  2022-06-15 20:02                   ` [dm-devel] " Mike Snitzer
@ 2022-06-20 11:44                     ` Greg KH
  -1 siblings, 0 replies; 68+ messages in thread
From: Greg KH @ 2022-06-20 11:44 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Guenter Roeck, Mike Snitzer, keescook, sarthakkukreti, stable,
	Oleksandr Tymoshenko, dm-devel, regressions

On Wed, Jun 15, 2022 at 04:02:36PM -0400, Mike Snitzer wrote:
> On Wed, Jun 15 2022 at  1:50P -0400,
> Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > On 6/15/22 08:29, Mike Snitzer wrote:
> > > On Wed, Jun 15 2022 at 10:36P -0400,
> > > Guenter Roeck <linux@roeck-us.net> wrote:
> > > 
> > > > On Mon, Jun 13, 2022 at 11:13:21AM +0200, Greg KH wrote:
> > > > > On Fri, Jun 10, 2022 at 11:11:00AM -0400, Mike Snitzer wrote:
> > > > > > On Fri, Jun 10 2022 at  1:15P -0400,
> > > > > > Greg KH <gregkh@linuxfoundation.org> wrote:
> > > > > > 
> > > > > > > On Fri, Jun 10, 2022 at 04:22:00AM +0000, Oleksandr Tymoshenko wrote:
> > > > > > > > I believe this commit introduced a regression in dm verity on systems
> > > > > > > > where data device is an NVME one. Loading table fails with the
> > > > > > > > following diagnostics:
> > > > > > > > 
> > > > > > > > device-mapper: table: table load rejected: including non-request-stackable devices
> > > > > > > > 
> > > > > > > > The same kernel works with the same data drive on the SCSI interface.
> > > > > > > > NVME-backed dm verity works with just this commit reverted.
> > > > > > > > 
> > > > > > > > I believe the presence of the immutable partition is used as an indicator
> > > > > > > > of special case NVME configuration and if the data device's name starts
> > > > > > > > with "nvme" the code tries to switch the target type to
> > > > > > > > DM_TYPE_NVME_BIO_BASED (drivers/md/dm-table.c lines 1003-1010).
> > > > > > > > 
> > > > > > > > The special NVME optimization case was removed in
> > > > > > > > 5.10 by commit 9c37de297f6590937f95a28bec1b7ac68a38618f, so only 5.4 is
> > > > > > > > affected.
> > > > > > > > 
> > > > > > > 
> > > > > > > Why wouldn't 4.9, 4.14, and 4.19 also be affected here?  Should I also
> > > > > > > just queue up 9c37de297f65 ("dm: remove special-casing of bio-based
> > > > > > > immutable singleton target on NVMe") to those older kernels?  If so,
> > > > > > > have you tested this and verified that it worked?
> > > > > > 
> > > > > > Sorry for the unforeseen stable@ troubles here!
> > > > > > 
> > > > > > In general we'd be fine to apply commit 9c37de297f65 but to do it
> > > > > > properly would require also making sure commits that remove
> > > > > > "DM_TYPE_NVME_BIO_BASED", like 8d47e65948dd ("dm mpath: remove
> > > > > > unnecessary NVMe branching in favor of scsi_dh checks") are applied --
> > > > > > basically any lingering references to DM_TYPE_NVME_BIO_BASED need to
> > > > > > be removed.
> > > > > > 
> > > > > > The commit header for 8d47e65948dd documents what
> > > > > > DM_TYPE_NVME_BIO_BASED was used for.. it was dm-mpath specific and
> > > > > > "nvme" mode really never got used by any userspace that I'm aware of.
> > > > > > 
> > > > > > Sadly I currently don't have the time to do this backport for all N
> > > > > > stable kernels... :(
> > > > > > 
> > > > > > But if that backport gets out of control: A simpler, albeit stable@
> > > > > > unicorn, way to resolve this is to simply revert 9c37de297f65 and make
> > > > 
> > > > 9c37de297f65 can not be reverted in 5.4 and older because it isn't there,
> > > > and trying to apply it results in conflicts which at least I can not
> > > > resolve.
> > > > 
> > > > > > it so that DM-mpath and DM core just used bio-based if "nvme" is
> > > > > > requested by dm-mpath, so also in drivers/md/dm-mpath.c e.g.:
> > > > > > 
> > > > > > @@ -1091,8 +1088,6 @@ static int parse_features(struct dm_arg_set *as, struct multipath *m)
> > > > > > 
> > > > > >                          if (!strcasecmp(queue_mode_name, "bio"))
> > > > > >                                  m->queue_mode = DM_TYPE_BIO_BASED;
> > > > > > 			else if (!strcasecmp(queue_mode_name, "nvme"))
> > > > > > -                               m->queue_mode = DM_TYPE_NVME_BIO_BASED;
> > > > > > +                               m->queue_mode = DM_TYPE_BIO_BASED;
> > > > > >                          else if (!strcasecmp(queue_mode_name, "rq"))
> > > > > >                                  m->queue_mode = DM_TYPE_REQUEST_BASED;
> > > > > >                          else if (!strcasecmp(queue_mode_name, "mq"))
> > > > > > 
> > > > > > Mike
> > > > > > 
> > > > > 
> > > > > Ok, please submit a working patch for the kernels that need it so that
> > > > > we can review and apply it to solve this regression.
> > > > > 
> > > > 
> > > > So, effectively, v5.4.y and older are broken right now for use cases
> > > > with dm on NVME drives.
> > > > 
> > > > Given that the regression does affect older branches, and given that we
> > > > have to revert this patch to avoid regressions in ChromeOS, would it be
> > > > possible to revert it from v5.4.y and older until a fix is found ?
> > > 
> > > I obviously would prefer to not have this false-start.
> > > 
> > The false start has already happened since we had to revert the patch
> > from chromeos-5.4 and older branches.
> 
> OK, well this is pretty easy to fix in general.  If there are slight
> differences across older trees they are easily resolved.  Fact that
> stable@ couldn't cope with backporting 9c37de297f65 is.. what it is.
> 
> But this will fix the issue on 5.4.y:
> 
> From: Mike Snitzer <snitzer@kernel.org>
> Date: Wed, 15 Jun 2022 14:07:09 -0400
> Subject: [5.4.y PATCH] dm: remove special-casing of bio-based immutable singleton target on NVMe
> 
> Commit 9c37de297f6590937f95a28bec1b7ac68a38618f upstream.
> 
> There is no benefit to DM special-casing NVMe. Remove all code used to
> establish DM_TYPE_NVME_BIO_BASED.
> 
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  drivers/md/dm-table.c         | 32 ++----------------
>  drivers/md/dm.c               | 64 +++--------------------------------
>  include/linux/device-mapper.h |  1 -
>  3 files changed, 7 insertions(+), 90 deletions(-)

Can someone resend this in the proper format (and fixed up), with
Guenter's tested-by so that I can queue it up?

thanks,

greg k-h


* [5.4.y PATCH v2] dm: remove special-casing of bio-based immutable singleton target on NVMe
  2022-06-20 11:44                     ` [dm-devel] " Greg KH
@ 2022-06-21 16:35                       ` Mike Snitzer
  -1 siblings, 0 replies; 68+ messages in thread
From: Mike Snitzer @ 2022-06-21 16:35 UTC (permalink / raw)
  To: Greg KH
  Cc: Guenter Roeck, Mike Snitzer, keescook, sarthakkukreti, stable,
	Oleksandr Tymoshenko, dm-devel, regressions

Commit 9c37de297f6590937f95a28bec1b7ac68a38618f upstream.

There is no benefit to DM special-casing NVMe. Remove all code used to
establish DM_TYPE_NVME_BIO_BASED.

Also, remove 3 'struct mapped_device *md' variables in __map_bio() which
masked the same variable that is available within __map_bio()'s scope.

Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm-table.c         | 32 +--------------
 drivers/md/dm.c               | 73 ++++-------------------------------
 include/linux/device-mapper.h |  1 -
 3 files changed, 9 insertions(+), 97 deletions(-)
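
For readers following the regression report, the removed else-branch in
dm_table_determine_type() can be reduced to the minimal standalone sketch
below. It is an illustration only, not the kernel code: the enum, the
helper name_is_not_nvme() and the function old_bio_based_type() are
hypothetical stand-ins, and the device list is modelled as a plain array
of names. It shows why a table with an immutable target (which dm-verity
became once DM_TARGET_IMMUTABLE was set) and only "nvme"-named devices
was promoted to the NVMe bio-based type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for enum dm_queue_mode; values mirror the diff. */
enum queue_mode {
	TYPE_BIO_BASED      = 1,
	TYPE_REQUEST_BASED  = 2,
	TYPE_DAX_BIO_BASED  = 3,
	TYPE_NVME_BIO_BASED = 4,
};

/* Same test as the removed device_is_partial_completion(): true if the
 * device name does not start with "nvme". */
static bool name_is_not_nvme(const char *name)
{
	assert(name != NULL);
	return strncmp(name, "nvme", 4) != 0;
}

/*
 * Hypothetical reduction of the removed branch: promote to the NVMe
 * bio-based type only when there is an immutable target, it has no
 * max_io_len, and no device has a non-"nvme" name.
 */
static enum queue_mode old_bio_based_type(bool has_immutable_target,
					  unsigned int max_io_len,
					  const char *const *names, size_t n)
{
	size_t i;

	if (!has_immutable_target || max_io_len)
		return TYPE_BIO_BASED;

	for (i = 0; i < n; i++)
		if (name_is_not_nvme(names[i]))
			return TYPE_BIO_BASED;

	return TYPE_NVME_BIO_BASED;
}
```

In the real code the promotion was followed by "goto verify_rq_based",
so the table then had to pass the request-based checks as well. With
this patch applied the branch is gone and such a table simply stays
bio-based.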

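The "masked" variables mentioned in the commit message are inner-block
declarations that shadow a variable of the same name already in scope.
A minimal sketch of that pattern and its cleanup follows; the struct and
function names are hypothetical and only loosely modelled on
__map_bio(), so treat it as an illustration rather than the actual
change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct mapped_dev { int swap_bios; };
struct io { struct mapped_dev *md; };

/* Before: the inner declaration shadows the function-scope 'md'. */
static int latch_with_shadow(struct io *io)
{
	struct mapped_dev *md = io->md;

	assert(md != NULL);
	if (md->swap_bios > 0) {
		struct mapped_dev *md = io->md;	/* shadows the outer md */

		return md->swap_bios;
	}
	return 0;
}

/* After: the redundant declaration is dropped, the outer md is reused. */
static int latch_without_shadow(struct io *io)
{
	struct mapped_dev *md = io->md;

	assert(md != NULL);
	if (md->swap_bios > 0)
		return md->swap_bios;
	return 0;
}
```

Both functions return the same value; the second form just avoids the
redundant declaration, which compilers can flag with -Wshadow.
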
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 06b382304d92..81bc36a43b32 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -872,8 +872,7 @@ EXPORT_SYMBOL(dm_consume_args);
 static bool __table_type_bio_based(enum dm_queue_mode table_type)
 {
 	return (table_type == DM_TYPE_BIO_BASED ||
-		table_type == DM_TYPE_DAX_BIO_BASED ||
-		table_type == DM_TYPE_NVME_BIO_BASED);
+		table_type == DM_TYPE_DAX_BIO_BASED);
 }
 
 static bool __table_type_request_based(enum dm_queue_mode table_type)
@@ -929,8 +928,6 @@ bool dm_table_supports_dax(struct dm_table *t,
 	return true;
 }
 
-static bool dm_table_does_not_support_partial_completion(struct dm_table *t);
-
 static int device_is_rq_stackable(struct dm_target *ti, struct dm_dev *dev,
 				  sector_t start, sector_t len, void *data)
 {
@@ -960,7 +957,6 @@ static int dm_table_determine_type(struct dm_table *t)
 			goto verify_bio_based;
 		}
 		BUG_ON(t->type == DM_TYPE_DAX_BIO_BASED);
-		BUG_ON(t->type == DM_TYPE_NVME_BIO_BASED);
 		goto verify_rq_based;
 	}
 
@@ -999,15 +995,6 @@ static int dm_table_determine_type(struct dm_table *t)
 		if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) ||
 		    (list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
 			t->type = DM_TYPE_DAX_BIO_BASED;
-		} else {
-			/* Check if upgrading to NVMe bio-based is valid or required */
-			tgt = dm_table_get_immutable_target(t);
-			if (tgt && !tgt->max_io_len && dm_table_does_not_support_partial_completion(t)) {
-				t->type = DM_TYPE_NVME_BIO_BASED;
-				goto verify_rq_based; /* must be stacked directly on NVMe (blk-mq) */
-			} else if (list_empty(devices) && live_md_type == DM_TYPE_NVME_BIO_BASED) {
-				t->type = DM_TYPE_NVME_BIO_BASED;
-			}
 		}
 		return 0;
 	}
@@ -1024,8 +1011,7 @@ static int dm_table_determine_type(struct dm_table *t)
 	 * (e.g. request completion process for partial completion.)
 	 */
 	if (t->num_targets > 1) {
-		DMERR("%s DM doesn't support multiple targets",
-		      t->type == DM_TYPE_NVME_BIO_BASED ? "nvme bio-based" : "request-based");
+		DMERR("request-based DM doesn't support multiple targets");
 		return -EINVAL;
 	}
 
@@ -1714,20 +1700,6 @@ static int device_is_not_random(struct dm_target *ti, struct dm_dev *dev,
 	return q && !blk_queue_add_random(q);
 }
 
-static int device_is_partial_completion(struct dm_target *ti, struct dm_dev *dev,
-					sector_t start, sector_t len, void *data)
-{
-	char b[BDEVNAME_SIZE];
-
-	/* For now, NVMe devices are the only devices of this class */
-	return (strncmp(bdevname(dev->bdev, b), "nvme", 4) != 0);
-}
-
-static bool dm_table_does_not_support_partial_completion(struct dm_table *t)
-{
-	return !dm_table_any_dev_attr(t, device_is_partial_completion, NULL);
-}
-
 static int device_not_write_same_capable(struct dm_target *ti, struct dm_dev *dev,
 					 sector_t start, sector_t len, void *data)
 {
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 37b8bb4d80f0..77e28f77c59f 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1000,7 +1000,7 @@ static void clone_endio(struct bio *bio)
 	struct mapped_device *md = tio->io->md;
 	dm_endio_fn endio = tio->ti->type->end_io;
 
-	if (unlikely(error == BLK_STS_TARGET) && md->type != DM_TYPE_NVME_BIO_BASED) {
+	if (unlikely(error == BLK_STS_TARGET)) {
 		if (bio_op(bio) == REQ_OP_DISCARD &&
 		    !bio->bi_disk->queue->limits.max_discard_sectors)
 			disable_discard(md);
@@ -1325,7 +1325,6 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
 	sector = clone->bi_iter.bi_sector;
 
 	if (unlikely(swap_bios_limit(ti, clone))) {
-		struct mapped_device *md = io->md;
 		int latch = get_swap_bios();
 		if (unlikely(latch != md->swap_bios))
 			__set_swap_bios_limit(md, latch);
@@ -1340,24 +1339,17 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
 		/* the bio has been remapped so dispatch it */
 		trace_block_bio_remap(clone->bi_disk->queue, clone,
 				      bio_dev(io->orig_bio), sector);
-		if (md->type == DM_TYPE_NVME_BIO_BASED)
-			ret = direct_make_request(clone);
-		else
-			ret = generic_make_request(clone);
+		ret = generic_make_request(clone);
 		break;
 	case DM_MAPIO_KILL:
-		if (unlikely(swap_bios_limit(ti, clone))) {
-			struct mapped_device *md = io->md;
+		if (unlikely(swap_bios_limit(ti, clone)))
 			up(&md->swap_bios_semaphore);
-		}
 		free_tio(tio);
 		dec_pending(io, BLK_STS_IOERR);
 		break;
 	case DM_MAPIO_REQUEUE:
-		if (unlikely(swap_bios_limit(ti, clone))) {
-			struct mapped_device *md = io->md;
+		if (unlikely(swap_bios_limit(ti, clone)))
 			up(&md->swap_bios_semaphore);
-		}
 		free_tio(tio);
 		dec_pending(io, BLK_STS_DM_REQUEUE);
 		break;
@@ -1732,51 +1724,6 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
 	return ret;
 }
 
-/*
- * Optimized variant of __split_and_process_bio that leverages the
- * fact that targets that use it do _not_ have a need to split bios.
- */
-static blk_qc_t __process_bio(struct mapped_device *md, struct dm_table *map,
-			      struct bio *bio, struct dm_target *ti)
-{
-	struct clone_info ci;
-	blk_qc_t ret = BLK_QC_T_NONE;
-	int error = 0;
-
-	init_clone_info(&ci, md, map, bio);
-
-	if (bio->bi_opf & REQ_PREFLUSH) {
-		struct bio flush_bio;
-
-		/*
-		 * Use an on-stack bio for this, it's safe since we don't
-		 * need to reference it after submit. It's just used as
-		 * the basis for the clone(s).
-		 */
-		bio_init(&flush_bio, NULL, 0);
-		flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;
-		ci.bio = &flush_bio;
-		ci.sector_count = 0;
-		error = __send_empty_flush(&ci);
-		bio_uninit(ci.bio);
-		/* dec_pending submits any data associated with flush */
-	} else {
-		struct dm_target_io *tio;
-
-		ci.bio = bio;
-		ci.sector_count = bio_sectors(bio);
-		if (__process_abnormal_io(&ci, ti, &error))
-			goto out;
-
-		tio = alloc_tio(&ci, ti, 0, GFP_NOIO);
-		ret = __clone_and_map_simple_bio(&ci, tio, NULL);
-	}
-out:
-	/* drop the extra reference count */
-	dec_pending(ci.io, errno_to_blk_status(error));
-	return ret;
-}
-
 static blk_qc_t dm_process_bio(struct mapped_device *md,
 			       struct dm_table *map, struct bio *bio)
 {
@@ -1807,8 +1754,6 @@ static blk_qc_t dm_process_bio(struct mapped_device *md,
 		/* regular IO is split by __split_and_process_bio */
 	}
 
-	if (dm_get_md_type(md) == DM_TYPE_NVME_BIO_BASED)
-		return __process_bio(md, map, bio, ti);
 	return __split_and_process_bio(md, map, bio);
 }
 
@@ -2200,12 +2145,10 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
 	if (request_based)
 		dm_stop_queue(q);
 
-	if (request_based || md->type == DM_TYPE_NVME_BIO_BASED) {
+	if (request_based) {
 		/*
-		 * Leverage the fact that request-based DM targets and
-		 * NVMe bio based targets are immutable singletons
-		 * - used to optimize both dm_request_fn and dm_mq_queue_rq;
-		 *   and __process_bio.
+		 * Leverage the fact that request-based DM targets are
+		 * immutable singletons - used to optimize dm_mq_queue_rq.
 		 */
 		md->immutable_target = dm_table_get_immutable_target(t);
 	}
@@ -2334,7 +2277,6 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
 		break;
 	case DM_TYPE_BIO_BASED:
 	case DM_TYPE_DAX_BIO_BASED:
-	case DM_TYPE_NVME_BIO_BASED:
 		dm_init_congested_fn(md);
 		break;
 	case DM_TYPE_NONE:
@@ -3070,7 +3012,6 @@ struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, enum dm_qu
 	switch (type) {
 	case DM_TYPE_BIO_BASED:
 	case DM_TYPE_DAX_BIO_BASED:
-	case DM_TYPE_NVME_BIO_BASED:
 		pool_size = max(dm_get_reserved_bio_based_ios(), min_pool_size);
 		front_pad = roundup(per_io_data_size, __alignof__(struct dm_target_io)) + offsetof(struct dm_target_io, clone);
 		io_front_pad = roundup(front_pad,  __alignof__(struct dm_io)) + offsetof(struct dm_io, tio);
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index a53d7d2c2d95..60631f3abddb 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -28,7 +28,6 @@ enum dm_queue_mode {
 	DM_TYPE_BIO_BASED	 = 1,
 	DM_TYPE_REQUEST_BASED	 = 2,
 	DM_TYPE_DAX_BIO_BASED	 = 3,
-	DM_TYPE_NVME_BIO_BASED	 = 4,
 };
 
 typedef enum { STATUSTYPE_INFO, STATUSTYPE_TABLE } status_type_t;
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [dm-devel] [5.4.y PATCH v2] dm: remove special-casing of bio-based immutable singleton target on NVMe
@ 2022-06-21 16:35                       ` Mike Snitzer
  0 siblings, 0 replies; 68+ messages in thread
From: Mike Snitzer @ 2022-06-21 16:35 UTC (permalink / raw)
  To: Greg KH
  Cc: keescook, sarthakkukreti, Mike Snitzer, stable,
	Oleksandr Tymoshenko, dm-devel, Guenter Roeck, regressions

Commit 9c37de297f6590937f95a28bec1b7ac68a38618f upstream.

There is no benefit to DM special-casing NVMe. Remove all code used to
establish DM_TYPE_NVME_BIO_BASED.

Also, remove 3 'struct mapped_device *md' variables in __map_bio() which
masked the same variable that is available within __map_bio()'s scope.

Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/md/dm-table.c         | 32 +--------------
 drivers/md/dm.c               | 73 ++++-------------------------------
 include/linux/device-mapper.h |  1 -
 3 files changed, 9 insertions(+), 97 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 06b382304d92..81bc36a43b32 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -872,8 +872,7 @@ EXPORT_SYMBOL(dm_consume_args);
 static bool __table_type_bio_based(enum dm_queue_mode table_type)
 {
 	return (table_type == DM_TYPE_BIO_BASED ||
-		table_type == DM_TYPE_DAX_BIO_BASED ||
-		table_type == DM_TYPE_NVME_BIO_BASED);
+		table_type == DM_TYPE_DAX_BIO_BASED);
 }
 
 static bool __table_type_request_based(enum dm_queue_mode table_type)
@@ -929,8 +928,6 @@ bool dm_table_supports_dax(struct dm_table *t,
 	return true;
 }
 
-static bool dm_table_does_not_support_partial_completion(struct dm_table *t);
-
 static int device_is_rq_stackable(struct dm_target *ti, struct dm_dev *dev,
 				  sector_t start, sector_t len, void *data)
 {
@@ -960,7 +957,6 @@ static int dm_table_determine_type(struct dm_table *t)
 			goto verify_bio_based;
 		}
 		BUG_ON(t->type == DM_TYPE_DAX_BIO_BASED);
-		BUG_ON(t->type == DM_TYPE_NVME_BIO_BASED);
 		goto verify_rq_based;
 	}
 
@@ -999,15 +995,6 @@ static int dm_table_determine_type(struct dm_table *t)
 		if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) ||
 		    (list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
 			t->type = DM_TYPE_DAX_BIO_BASED;
-		} else {
-			/* Check if upgrading to NVMe bio-based is valid or required */
-			tgt = dm_table_get_immutable_target(t);
-			if (tgt && !tgt->max_io_len && dm_table_does_not_support_partial_completion(t)) {
-				t->type = DM_TYPE_NVME_BIO_BASED;
-				goto verify_rq_based; /* must be stacked directly on NVMe (blk-mq) */
-			} else if (list_empty(devices) && live_md_type == DM_TYPE_NVME_BIO_BASED) {
-				t->type = DM_TYPE_NVME_BIO_BASED;
-			}
 		}
 		return 0;
 	}
@@ -1024,8 +1011,7 @@ static int dm_table_determine_type(struct dm_table *t)
 	 * (e.g. request completion process for partial completion.)
 	 */
 	if (t->num_targets > 1) {
-		DMERR("%s DM doesn't support multiple targets",
-		      t->type == DM_TYPE_NVME_BIO_BASED ? "nvme bio-based" : "request-based");
+		DMERR("request-based DM doesn't support multiple targets");
 		return -EINVAL;
 	}
 
@@ -1714,20 +1700,6 @@ static int device_is_not_random(struct dm_target *ti, struct dm_dev *dev,
 	return q && !blk_queue_add_random(q);
 }
 
-static int device_is_partial_completion(struct dm_target *ti, struct dm_dev *dev,
-					sector_t start, sector_t len, void *data)
-{
-	char b[BDEVNAME_SIZE];
-
-	/* For now, NVMe devices are the only devices of this class */
-	return (strncmp(bdevname(dev->bdev, b), "nvme", 4) != 0);
-}
-
-static bool dm_table_does_not_support_partial_completion(struct dm_table *t)
-{
-	return !dm_table_any_dev_attr(t, device_is_partial_completion, NULL);
-}
-
 static int device_not_write_same_capable(struct dm_target *ti, struct dm_dev *dev,
 					 sector_t start, sector_t len, void *data)
 {
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 37b8bb4d80f0..77e28f77c59f 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1000,7 +1000,7 @@ static void clone_endio(struct bio *bio)
 	struct mapped_device *md = tio->io->md;
 	dm_endio_fn endio = tio->ti->type->end_io;
 
-	if (unlikely(error == BLK_STS_TARGET) && md->type != DM_TYPE_NVME_BIO_BASED) {
+	if (unlikely(error == BLK_STS_TARGET)) {
 		if (bio_op(bio) == REQ_OP_DISCARD &&
 		    !bio->bi_disk->queue->limits.max_discard_sectors)
 			disable_discard(md);
@@ -1325,7 +1325,6 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
 	sector = clone->bi_iter.bi_sector;
 
 	if (unlikely(swap_bios_limit(ti, clone))) {
-		struct mapped_device *md = io->md;
 		int latch = get_swap_bios();
 		if (unlikely(latch != md->swap_bios))
 			__set_swap_bios_limit(md, latch);
@@ -1340,24 +1339,17 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
 		/* the bio has been remapped so dispatch it */
 		trace_block_bio_remap(clone->bi_disk->queue, clone,
 				      bio_dev(io->orig_bio), sector);
-		if (md->type == DM_TYPE_NVME_BIO_BASED)
-			ret = direct_make_request(clone);
-		else
-			ret = generic_make_request(clone);
+		ret = generic_make_request(clone);
 		break;
 	case DM_MAPIO_KILL:
-		if (unlikely(swap_bios_limit(ti, clone))) {
-			struct mapped_device *md = io->md;
+		if (unlikely(swap_bios_limit(ti, clone)))
 			up(&md->swap_bios_semaphore);
-		}
 		free_tio(tio);
 		dec_pending(io, BLK_STS_IOERR);
 		break;
 	case DM_MAPIO_REQUEUE:
-		if (unlikely(swap_bios_limit(ti, clone))) {
-			struct mapped_device *md = io->md;
+		if (unlikely(swap_bios_limit(ti, clone)))
 			up(&md->swap_bios_semaphore);
-		}
 		free_tio(tio);
 		dec_pending(io, BLK_STS_DM_REQUEUE);
 		break;
@@ -1732,51 +1724,6 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
 	return ret;
 }
 
-/*
- * Optimized variant of __split_and_process_bio that leverages the
- * fact that targets that use it do _not_ have a need to split bios.
- */
-static blk_qc_t __process_bio(struct mapped_device *md, struct dm_table *map,
-			      struct bio *bio, struct dm_target *ti)
-{
-	struct clone_info ci;
-	blk_qc_t ret = BLK_QC_T_NONE;
-	int error = 0;
-
-	init_clone_info(&ci, md, map, bio);
-
-	if (bio->bi_opf & REQ_PREFLUSH) {
-		struct bio flush_bio;
-
-		/*
-		 * Use an on-stack bio for this, it's safe since we don't
-		 * need to reference it after submit. It's just used as
-		 * the basis for the clone(s).
-		 */
-		bio_init(&flush_bio, NULL, 0);
-		flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;
-		ci.bio = &flush_bio;
-		ci.sector_count = 0;
-		error = __send_empty_flush(&ci);
-		bio_uninit(ci.bio);
-		/* dec_pending submits any data associated with flush */
-	} else {
-		struct dm_target_io *tio;
-
-		ci.bio = bio;
-		ci.sector_count = bio_sectors(bio);
-		if (__process_abnormal_io(&ci, ti, &error))
-			goto out;
-
-		tio = alloc_tio(&ci, ti, 0, GFP_NOIO);
-		ret = __clone_and_map_simple_bio(&ci, tio, NULL);
-	}
-out:
-	/* drop the extra reference count */
-	dec_pending(ci.io, errno_to_blk_status(error));
-	return ret;
-}
-
 static blk_qc_t dm_process_bio(struct mapped_device *md,
 			       struct dm_table *map, struct bio *bio)
 {
@@ -1807,8 +1754,6 @@ static blk_qc_t dm_process_bio(struct mapped_device *md,
 		/* regular IO is split by __split_and_process_bio */
 	}
 
-	if (dm_get_md_type(md) == DM_TYPE_NVME_BIO_BASED)
-		return __process_bio(md, map, bio, ti);
 	return __split_and_process_bio(md, map, bio);
 }
 
@@ -2200,12 +2145,10 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
 	if (request_based)
 		dm_stop_queue(q);
 
-	if (request_based || md->type == DM_TYPE_NVME_BIO_BASED) {
+	if (request_based) {
 		/*
-		 * Leverage the fact that request-based DM targets and
-		 * NVMe bio based targets are immutable singletons
-		 * - used to optimize both dm_request_fn and dm_mq_queue_rq;
-		 *   and __process_bio.
+		 * Leverage the fact that request-based DM targets are
+		 * immutable singletons - used to optimize dm_mq_queue_rq.
 		 */
 		md->immutable_target = dm_table_get_immutable_target(t);
 	}
@@ -2334,7 +2277,6 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
 		break;
 	case DM_TYPE_BIO_BASED:
 	case DM_TYPE_DAX_BIO_BASED:
-	case DM_TYPE_NVME_BIO_BASED:
 		dm_init_congested_fn(md);
 		break;
 	case DM_TYPE_NONE:
@@ -3070,7 +3012,6 @@ struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, enum dm_qu
 	switch (type) {
 	case DM_TYPE_BIO_BASED:
 	case DM_TYPE_DAX_BIO_BASED:
-	case DM_TYPE_NVME_BIO_BASED:
 		pool_size = max(dm_get_reserved_bio_based_ios(), min_pool_size);
 		front_pad = roundup(per_io_data_size, __alignof__(struct dm_target_io)) + offsetof(struct dm_target_io, clone);
 		io_front_pad = roundup(front_pad,  __alignof__(struct dm_io)) + offsetof(struct dm_io, tio);
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index a53d7d2c2d95..60631f3abddb 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -28,7 +28,6 @@ enum dm_queue_mode {
 	DM_TYPE_BIO_BASED	 = 1,
 	DM_TYPE_REQUEST_BASED	 = 2,
 	DM_TYPE_DAX_BIO_BASED	 = 3,
-	DM_TYPE_NVME_BIO_BASED	 = 4,
 };
 
 typedef enum { STATUSTYPE_INFO, STATUSTYPE_TABLE } status_type_t;
-- 
2.30.0

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [5.4.y PATCH v2] dm: remove special-casing of bio-based immutable singleton target on NVMe
  2022-06-21 16:35                       ` [dm-devel] " Mike Snitzer
@ 2022-06-23 15:48                         ` Greg KH
  -1 siblings, 0 replies; 68+ messages in thread
From: Greg KH @ 2022-06-23 15:48 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Guenter Roeck, keescook, sarthakkukreti, stable,
	Oleksandr Tymoshenko, dm-devel, regressions

On Tue, Jun 21, 2022 at 12:35:04PM -0400, Mike Snitzer wrote:
> Commit 9c37de297f6590937f95a28bec1b7ac68a38618f upstream.
> 
> There is no benefit to DM special-casing NVMe. Remove all code used to
> establish DM_TYPE_NVME_BIO_BASED.
> 
> Also, remove 3 'struct mapped_device *md' variables in __map_bio() which
> masked the same variable that is available within __map_bio()'s scope.
> 
> Tested-by: Guenter Roeck <linux@roeck-us.net>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  drivers/md/dm-table.c         | 32 +--------------
>  drivers/md/dm.c               | 73 ++++-------------------------------
>  include/linux/device-mapper.h |  1 -
>  3 files changed, 9 insertions(+), 97 deletions(-)

Now queued up, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Patch "dm: remove special-casing of bio-based immutable singleton target on NVMe" has been added to the 5.4-stable tree
  2022-06-21 16:35                       ` [dm-devel] " Mike Snitzer
@ 2022-06-23 16:00                         ` gregkh
  -1 siblings, 0 replies; 68+ messages in thread
From: gregkh @ 2022-06-23 16:00 UTC (permalink / raw)
  To: dm-devel, gregkh, keescook, linux, ovt, regressions,
	sarthakkukreti, snitzer
  Cc: stable-commits


This is a note to let you know that I've just added the patch titled

    dm: remove special-casing of bio-based immutable singleton target on NVMe

to the 5.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     dm-remove-special-casing-of-bio-based-immutable-singleton-target-on-nvme.patch
and it can be found in the queue-5.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


From snitzer@kernel.org  Thu Jun 23 17:47:22 2022
From: Mike Snitzer <snitzer@kernel.org>
Date: Tue, 21 Jun 2022 12:35:04 -0400
Subject: dm: remove special-casing of bio-based immutable singleton target on NVMe
To: Greg KH <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>, Mike Snitzer <snitzer@kernel.org>, keescook@chromium.org, sarthakkukreti@google.com, stable@vger.kernel.org, Oleksandr Tymoshenko <ovt@google.com>, dm-devel@redhat.com, regressions@lists.linux.dev
Message-ID: <YrHzOGO5fOSFwqdJ@redhat.com>
Content-Disposition: inline

From: Mike Snitzer <snitzer@kernel.org>

Commit 9c37de297f6590937f95a28bec1b7ac68a38618f upstream.

There is no benefit to DM special-casing NVMe. Remove all code used to
establish DM_TYPE_NVME_BIO_BASED.

Also, remove 3 'struct mapped_device *md' variables in __map_bio() which
masked the same variable that is available within __map_bio()'s scope.

Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/md/dm-table.c         |   32 +-----------------
 drivers/md/dm.c               |   73 ++++--------------------------------------
 include/linux/device-mapper.h |    1 
 3 files changed, 9 insertions(+), 97 deletions(-)

--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -872,8 +872,7 @@ EXPORT_SYMBOL(dm_consume_args);
 static bool __table_type_bio_based(enum dm_queue_mode table_type)
 {
 	return (table_type == DM_TYPE_BIO_BASED ||
-		table_type == DM_TYPE_DAX_BIO_BASED ||
-		table_type == DM_TYPE_NVME_BIO_BASED);
+		table_type == DM_TYPE_DAX_BIO_BASED);
 }
 
 static bool __table_type_request_based(enum dm_queue_mode table_type)
@@ -929,8 +928,6 @@ bool dm_table_supports_dax(struct dm_tab
 	return true;
 }
 
-static bool dm_table_does_not_support_partial_completion(struct dm_table *t);
-
 static int device_is_rq_stackable(struct dm_target *ti, struct dm_dev *dev,
 				  sector_t start, sector_t len, void *data)
 {
@@ -960,7 +957,6 @@ static int dm_table_determine_type(struc
 			goto verify_bio_based;
 		}
 		BUG_ON(t->type == DM_TYPE_DAX_BIO_BASED);
-		BUG_ON(t->type == DM_TYPE_NVME_BIO_BASED);
 		goto verify_rq_based;
 	}
 
@@ -999,15 +995,6 @@ verify_bio_based:
 		if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) ||
 		    (list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
 			t->type = DM_TYPE_DAX_BIO_BASED;
-		} else {
-			/* Check if upgrading to NVMe bio-based is valid or required */
-			tgt = dm_table_get_immutable_target(t);
-			if (tgt && !tgt->max_io_len && dm_table_does_not_support_partial_completion(t)) {
-				t->type = DM_TYPE_NVME_BIO_BASED;
-				goto verify_rq_based; /* must be stacked directly on NVMe (blk-mq) */
-			} else if (list_empty(devices) && live_md_type == DM_TYPE_NVME_BIO_BASED) {
-				t->type = DM_TYPE_NVME_BIO_BASED;
-			}
 		}
 		return 0;
 	}
@@ -1024,8 +1011,7 @@ verify_rq_based:
 	 * (e.g. request completion process for partial completion.)
 	 */
 	if (t->num_targets > 1) {
-		DMERR("%s DM doesn't support multiple targets",
-		      t->type == DM_TYPE_NVME_BIO_BASED ? "nvme bio-based" : "request-based");
+		DMERR("request-based DM doesn't support multiple targets");
 		return -EINVAL;
 	}
 
@@ -1714,20 +1700,6 @@ static int device_is_not_random(struct d
 	return q && !blk_queue_add_random(q);
 }
 
-static int device_is_partial_completion(struct dm_target *ti, struct dm_dev *dev,
-					sector_t start, sector_t len, void *data)
-{
-	char b[BDEVNAME_SIZE];
-
-	/* For now, NVMe devices are the only devices of this class */
-	return (strncmp(bdevname(dev->bdev, b), "nvme", 4) != 0);
-}
-
-static bool dm_table_does_not_support_partial_completion(struct dm_table *t)
-{
-	return !dm_table_any_dev_attr(t, device_is_partial_completion, NULL);
-}
-
 static int device_not_write_same_capable(struct dm_target *ti, struct dm_dev *dev,
 					 sector_t start, sector_t len, void *data)
 {
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1000,7 +1000,7 @@ static void clone_endio(struct bio *bio)
 	struct mapped_device *md = tio->io->md;
 	dm_endio_fn endio = tio->ti->type->end_io;
 
-	if (unlikely(error == BLK_STS_TARGET) && md->type != DM_TYPE_NVME_BIO_BASED) {
+	if (unlikely(error == BLK_STS_TARGET)) {
 		if (bio_op(bio) == REQ_OP_DISCARD &&
 		    !bio->bi_disk->queue->limits.max_discard_sectors)
 			disable_discard(md);
@@ -1325,7 +1325,6 @@ static blk_qc_t __map_bio(struct dm_targ
 	sector = clone->bi_iter.bi_sector;
 
 	if (unlikely(swap_bios_limit(ti, clone))) {
-		struct mapped_device *md = io->md;
 		int latch = get_swap_bios();
 		if (unlikely(latch != md->swap_bios))
 			__set_swap_bios_limit(md, latch);
@@ -1340,24 +1339,17 @@ static blk_qc_t __map_bio(struct dm_targ
 		/* the bio has been remapped so dispatch it */
 		trace_block_bio_remap(clone->bi_disk->queue, clone,
 				      bio_dev(io->orig_bio), sector);
-		if (md->type == DM_TYPE_NVME_BIO_BASED)
-			ret = direct_make_request(clone);
-		else
-			ret = generic_make_request(clone);
+		ret = generic_make_request(clone);
 		break;
 	case DM_MAPIO_KILL:
-		if (unlikely(swap_bios_limit(ti, clone))) {
-			struct mapped_device *md = io->md;
+		if (unlikely(swap_bios_limit(ti, clone)))
 			up(&md->swap_bios_semaphore);
-		}
 		free_tio(tio);
 		dec_pending(io, BLK_STS_IOERR);
 		break;
 	case DM_MAPIO_REQUEUE:
-		if (unlikely(swap_bios_limit(ti, clone))) {
-			struct mapped_device *md = io->md;
+		if (unlikely(swap_bios_limit(ti, clone)))
 			up(&md->swap_bios_semaphore);
-		}
 		free_tio(tio);
 		dec_pending(io, BLK_STS_DM_REQUEUE);
 		break;
@@ -1732,51 +1724,6 @@ static blk_qc_t __split_and_process_bio(
 	return ret;
 }
 
-/*
- * Optimized variant of __split_and_process_bio that leverages the
- * fact that targets that use it do _not_ have a need to split bios.
- */
-static blk_qc_t __process_bio(struct mapped_device *md, struct dm_table *map,
-			      struct bio *bio, struct dm_target *ti)
-{
-	struct clone_info ci;
-	blk_qc_t ret = BLK_QC_T_NONE;
-	int error = 0;
-
-	init_clone_info(&ci, md, map, bio);
-
-	if (bio->bi_opf & REQ_PREFLUSH) {
-		struct bio flush_bio;
-
-		/*
-		 * Use an on-stack bio for this, it's safe since we don't
-		 * need to reference it after submit. It's just used as
-		 * the basis for the clone(s).
-		 */
-		bio_init(&flush_bio, NULL, 0);
-		flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;
-		ci.bio = &flush_bio;
-		ci.sector_count = 0;
-		error = __send_empty_flush(&ci);
-		bio_uninit(ci.bio);
-		/* dec_pending submits any data associated with flush */
-	} else {
-		struct dm_target_io *tio;
-
-		ci.bio = bio;
-		ci.sector_count = bio_sectors(bio);
-		if (__process_abnormal_io(&ci, ti, &error))
-			goto out;
-
-		tio = alloc_tio(&ci, ti, 0, GFP_NOIO);
-		ret = __clone_and_map_simple_bio(&ci, tio, NULL);
-	}
-out:
-	/* drop the extra reference count */
-	dec_pending(ci.io, errno_to_blk_status(error));
-	return ret;
-}
-
 static blk_qc_t dm_process_bio(struct mapped_device *md,
 			       struct dm_table *map, struct bio *bio)
 {
@@ -1807,8 +1754,6 @@ static blk_qc_t dm_process_bio(struct ma
 		/* regular IO is split by __split_and_process_bio */
 	}
 
-	if (dm_get_md_type(md) == DM_TYPE_NVME_BIO_BASED)
-		return __process_bio(md, map, bio, ti);
 	return __split_and_process_bio(md, map, bio);
 }
 
@@ -2200,12 +2145,10 @@ static struct dm_table *__bind(struct ma
 	if (request_based)
 		dm_stop_queue(q);
 
-	if (request_based || md->type == DM_TYPE_NVME_BIO_BASED) {
+	if (request_based) {
 		/*
-		 * Leverage the fact that request-based DM targets and
-		 * NVMe bio based targets are immutable singletons
-		 * - used to optimize both dm_request_fn and dm_mq_queue_rq;
-		 *   and __process_bio.
+		 * Leverage the fact that request-based DM targets are
+		 * immutable singletons - used to optimize dm_mq_queue_rq.
 		 */
 		md->immutable_target = dm_table_get_immutable_target(t);
 	}
@@ -2334,7 +2277,6 @@ int dm_setup_md_queue(struct mapped_devi
 		break;
 	case DM_TYPE_BIO_BASED:
 	case DM_TYPE_DAX_BIO_BASED:
-	case DM_TYPE_NVME_BIO_BASED:
 		dm_init_congested_fn(md);
 		break;
 	case DM_TYPE_NONE:
@@ -3070,7 +3012,6 @@ struct dm_md_mempools *dm_alloc_md_mempo
 	switch (type) {
 	case DM_TYPE_BIO_BASED:
 	case DM_TYPE_DAX_BIO_BASED:
-	case DM_TYPE_NVME_BIO_BASED:
 		pool_size = max(dm_get_reserved_bio_based_ios(), min_pool_size);
 		front_pad = roundup(per_io_data_size, __alignof__(struct dm_target_io)) + offsetof(struct dm_target_io, clone);
 		io_front_pad = roundup(front_pad,  __alignof__(struct dm_io)) + offsetof(struct dm_io, tio);
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -28,7 +28,6 @@ enum dm_queue_mode {
 	DM_TYPE_BIO_BASED	 = 1,
 	DM_TYPE_REQUEST_BASED	 = 2,
 	DM_TYPE_DAX_BIO_BASED	 = 3,
-	DM_TYPE_NVME_BIO_BASED	 = 4,
 };
 
 typedef enum { STATUSTYPE_INFO, STATUSTYPE_TABLE } status_type_t;


Patches currently in stable-queue which might be from snitzer@kernel.org are

queue-5.4/dm-remove-special-casing-of-bio-based-immutable-singleton-target-on-nvme.patch

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [dm-devel] Patch "dm: remove special-casing of bio-based immutable singleton target on NVMe" has been added to the 5.4-stable tree
@ 2022-06-23 16:00                         ` gregkh
  0 siblings, 0 replies; 68+ messages in thread
From: gregkh @ 2022-06-23 16:00 UTC (permalink / raw)
  To: dm-devel, gregkh, keescook, linux, ovt, regressions,
	sarthakkukreti, snitzer
  Cc: stable-commits


This is a note to let you know that I've just added the patch titled

    dm: remove special-casing of bio-based immutable singleton target on NVMe

to the 5.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     dm-remove-special-casing-of-bio-based-immutable-singleton-target-on-nvme.patch
and it can be found in the queue-5.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


From snitzer@kernel.org  Thu Jun 23 17:47:22 2022
From: Mike Snitzer <snitzer@kernel.org>
Date: Tue, 21 Jun 2022 12:35:04 -0400
Subject: dm: remove special-casing of bio-based immutable singleton target on NVMe
To: Greg KH <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>, Mike Snitzer <snitzer@kernel.org>, keescook@chromium.org, sarthakkukreti@google.com, stable@vger.kernel.org, Oleksandr Tymoshenko <ovt@google.com>, dm-devel@redhat.com, regressions@lists.linux.dev
Message-ID: <YrHzOGO5fOSFwqdJ@redhat.com>
Content-Disposition: inline

From: Mike Snitzer <snitzer@kernel.org>

Commit 9c37de297f6590937f95a28bec1b7ac68a38618f upstream.

There is no benefit to DM special-casing NVMe. Remove all code used to
establish DM_TYPE_NVME_BIO_BASED.

Also, remove 3 'struct mapped_device *md' variables in __map_bio() which
masked the same variable that is available within __map_bio()'s scope.

Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/md/dm-table.c         |   32 +-----------------
 drivers/md/dm.c               |   73 ++++--------------------------------------
 include/linux/device-mapper.h |    1 
 3 files changed, 9 insertions(+), 97 deletions(-)

--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -872,8 +872,7 @@ EXPORT_SYMBOL(dm_consume_args);
 static bool __table_type_bio_based(enum dm_queue_mode table_type)
 {
 	return (table_type == DM_TYPE_BIO_BASED ||
-		table_type == DM_TYPE_DAX_BIO_BASED ||
-		table_type == DM_TYPE_NVME_BIO_BASED);
+		table_type == DM_TYPE_DAX_BIO_BASED);
 }
 
 static bool __table_type_request_based(enum dm_queue_mode table_type)
@@ -929,8 +928,6 @@ bool dm_table_supports_dax(struct dm_tab
 	return true;
 }
 
-static bool dm_table_does_not_support_partial_completion(struct dm_table *t);
-
 static int device_is_rq_stackable(struct dm_target *ti, struct dm_dev *dev,
 				  sector_t start, sector_t len, void *data)
 {
@@ -960,7 +957,6 @@ static int dm_table_determine_type(struc
 			goto verify_bio_based;
 		}
 		BUG_ON(t->type == DM_TYPE_DAX_BIO_BASED);
-		BUG_ON(t->type == DM_TYPE_NVME_BIO_BASED);
 		goto verify_rq_based;
 	}
 
@@ -999,15 +995,6 @@ verify_bio_based:
 		if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) ||
 		    (list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
 			t->type = DM_TYPE_DAX_BIO_BASED;
-		} else {
-			/* Check if upgrading to NVMe bio-based is valid or required */
-			tgt = dm_table_get_immutable_target(t);
-			if (tgt && !tgt->max_io_len && dm_table_does_not_support_partial_completion(t)) {
-				t->type = DM_TYPE_NVME_BIO_BASED;
-				goto verify_rq_based; /* must be stacked directly on NVMe (blk-mq) */
-			} else if (list_empty(devices) && live_md_type == DM_TYPE_NVME_BIO_BASED) {
-				t->type = DM_TYPE_NVME_BIO_BASED;
-			}
 		}
 		return 0;
 	}
@@ -1024,8 +1011,7 @@ verify_rq_based:
 	 * (e.g. request completion process for partial completion.)
 	 */
 	if (t->num_targets > 1) {
-		DMERR("%s DM doesn't support multiple targets",
-		      t->type == DM_TYPE_NVME_BIO_BASED ? "nvme bio-based" : "request-based");
+		DMERR("request-based DM doesn't support multiple targets");
 		return -EINVAL;
 	}
 
@@ -1714,20 +1700,6 @@ static int device_is_not_random(struct d
 	return q && !blk_queue_add_random(q);
 }
 
-static int device_is_partial_completion(struct dm_target *ti, struct dm_dev *dev,
-					sector_t start, sector_t len, void *data)
-{
-	char b[BDEVNAME_SIZE];
-
-	/* For now, NVMe devices are the only devices of this class */
-	return (strncmp(bdevname(dev->bdev, b), "nvme", 4) != 0);
-}
-
-static bool dm_table_does_not_support_partial_completion(struct dm_table *t)
-{
-	return !dm_table_any_dev_attr(t, device_is_partial_completion, NULL);
-}
-
 static int device_not_write_same_capable(struct dm_target *ti, struct dm_dev *dev,
 					 sector_t start, sector_t len, void *data)
 {
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1000,7 +1000,7 @@ static void clone_endio(struct bio *bio)
 	struct mapped_device *md = tio->io->md;
 	dm_endio_fn endio = tio->ti->type->end_io;
 
-	if (unlikely(error == BLK_STS_TARGET) && md->type != DM_TYPE_NVME_BIO_BASED) {
+	if (unlikely(error == BLK_STS_TARGET)) {
 		if (bio_op(bio) == REQ_OP_DISCARD &&
 		    !bio->bi_disk->queue->limits.max_discard_sectors)
 			disable_discard(md);
@@ -1325,7 +1325,6 @@ static blk_qc_t __map_bio(struct dm_targ
 	sector = clone->bi_iter.bi_sector;
 
 	if (unlikely(swap_bios_limit(ti, clone))) {
-		struct mapped_device *md = io->md;
 		int latch = get_swap_bios();
 		if (unlikely(latch != md->swap_bios))
 			__set_swap_bios_limit(md, latch);
@@ -1340,24 +1339,17 @@ static blk_qc_t __map_bio(struct dm_targ
 		/* the bio has been remapped so dispatch it */
 		trace_block_bio_remap(clone->bi_disk->queue, clone,
 				      bio_dev(io->orig_bio), sector);
-		if (md->type == DM_TYPE_NVME_BIO_BASED)
-			ret = direct_make_request(clone);
-		else
-			ret = generic_make_request(clone);
+		ret = generic_make_request(clone);
 		break;
 	case DM_MAPIO_KILL:
-		if (unlikely(swap_bios_limit(ti, clone))) {
-			struct mapped_device *md = io->md;
+		if (unlikely(swap_bios_limit(ti, clone)))
 			up(&md->swap_bios_semaphore);
-		}
 		free_tio(tio);
 		dec_pending(io, BLK_STS_IOERR);
 		break;
 	case DM_MAPIO_REQUEUE:
-		if (unlikely(swap_bios_limit(ti, clone))) {
-			struct mapped_device *md = io->md;
+		if (unlikely(swap_bios_limit(ti, clone)))
 			up(&md->swap_bios_semaphore);
-		}
 		free_tio(tio);
 		dec_pending(io, BLK_STS_DM_REQUEUE);
 		break;
@@ -1732,51 +1724,6 @@ static blk_qc_t __split_and_process_bio(
 	return ret;
 }
 
-/*
- * Optimized variant of __split_and_process_bio that leverages the
- * fact that targets that use it do _not_ have a need to split bios.
- */
-static blk_qc_t __process_bio(struct mapped_device *md, struct dm_table *map,
-			      struct bio *bio, struct dm_target *ti)
-{
-	struct clone_info ci;
-	blk_qc_t ret = BLK_QC_T_NONE;
-	int error = 0;
-
-	init_clone_info(&ci, md, map, bio);
-
-	if (bio->bi_opf & REQ_PREFLUSH) {
-		struct bio flush_bio;
-
-		/*
-		 * Use an on-stack bio for this, it's safe since we don't
-		 * need to reference it after submit. It's just used as
-		 * the basis for the clone(s).
-		 */
-		bio_init(&flush_bio, NULL, 0);
-		flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;
-		ci.bio = &flush_bio;
-		ci.sector_count = 0;
-		error = __send_empty_flush(&ci);
-		bio_uninit(ci.bio);
-		/* dec_pending submits any data associated with flush */
-	} else {
-		struct dm_target_io *tio;
-
-		ci.bio = bio;
-		ci.sector_count = bio_sectors(bio);
-		if (__process_abnormal_io(&ci, ti, &error))
-			goto out;
-
-		tio = alloc_tio(&ci, ti, 0, GFP_NOIO);
-		ret = __clone_and_map_simple_bio(&ci, tio, NULL);
-	}
-out:
-	/* drop the extra reference count */
-	dec_pending(ci.io, errno_to_blk_status(error));
-	return ret;
-}
-
 static blk_qc_t dm_process_bio(struct mapped_device *md,
 			       struct dm_table *map, struct bio *bio)
 {
@@ -1807,8 +1754,6 @@ static blk_qc_t dm_process_bio(struct ma
 		/* regular IO is split by __split_and_process_bio */
 	}
 
-	if (dm_get_md_type(md) == DM_TYPE_NVME_BIO_BASED)
-		return __process_bio(md, map, bio, ti);
 	return __split_and_process_bio(md, map, bio);
 }
 
@@ -2200,12 +2145,10 @@ static struct dm_table *__bind(struct ma
 	if (request_based)
 		dm_stop_queue(q);
 
-	if (request_based || md->type == DM_TYPE_NVME_BIO_BASED) {
+	if (request_based) {
 		/*
-		 * Leverage the fact that request-based DM targets and
-		 * NVMe bio based targets are immutable singletons
-		 * - used to optimize both dm_request_fn and dm_mq_queue_rq;
-		 *   and __process_bio.
+		 * Leverage the fact that request-based DM targets are
+		 * immutable singletons - used to optimize dm_mq_queue_rq.
 		 */
 		md->immutable_target = dm_table_get_immutable_target(t);
 	}
@@ -2334,7 +2277,6 @@ int dm_setup_md_queue(struct mapped_devi
 		break;
 	case DM_TYPE_BIO_BASED:
 	case DM_TYPE_DAX_BIO_BASED:
-	case DM_TYPE_NVME_BIO_BASED:
 		dm_init_congested_fn(md);
 		break;
 	case DM_TYPE_NONE:
@@ -3070,7 +3012,6 @@ struct dm_md_mempools *dm_alloc_md_mempo
 	switch (type) {
 	case DM_TYPE_BIO_BASED:
 	case DM_TYPE_DAX_BIO_BASED:
-	case DM_TYPE_NVME_BIO_BASED:
 		pool_size = max(dm_get_reserved_bio_based_ios(), min_pool_size);
 		front_pad = roundup(per_io_data_size, __alignof__(struct dm_target_io)) + offsetof(struct dm_target_io, clone);
 		io_front_pad = roundup(front_pad,  __alignof__(struct dm_io)) + offsetof(struct dm_io, tio);
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -28,7 +28,6 @@ enum dm_queue_mode {
 	DM_TYPE_BIO_BASED	 = 1,
 	DM_TYPE_REQUEST_BASED	 = 2,
 	DM_TYPE_DAX_BIO_BASED	 = 3,
-	DM_TYPE_NVME_BIO_BASED	 = 4,
 };
 
 typedef enum { STATUSTYPE_INFO, STATUSTYPE_TABLE } status_type_t;


Patches currently in stable-queue which might be from snitzer@kernel.org are

queue-5.4/dm-remove-special-casing-of-bio-based-immutable-singleton-target-on-nvme.patch

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



end of thread, other threads: [~2022-06-23 16:01 UTC]

Thread overview: 68+ messages
2022-06-03 17:42 [PATCH 5.4 00/34] 5.4.197-rc1 review Greg Kroah-Hartman
2022-06-03 17:42 ` [PATCH 5.4 01/34] lockdown: also lock down previous kgdb use Greg Kroah-Hartman
2022-06-03 17:42 ` [PATCH 5.4 02/34] x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests Greg Kroah-Hartman
2022-06-03 17:42 ` [PATCH 5.4 03/34] staging: rtl8723bs: prevent ->Ssid overflow in rtw_wx_set_scan() Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 04/34] Input: goodix - fix spurious key release events Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 05/34] tcp: change source port randomizarion at connect() time Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 06/34] secure_seq: use the 64 bits of the siphash for port offset calculation Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 07/34] media: vim2m: Register video device after setting up internals Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 08/34] media: vim2m: initialize the media device earlier Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 09/34] ACPI: sysfs: Make sparse happy about address space in use Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 10/34] ACPI: sysfs: Fix BERT error region memory mapping Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 11/34] pinctrl: sunxi: fix f1c100s uart2 function Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 12/34] net: af_key: check encryption module availability consistency Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 13/34] net: ftgmac100: Disable hardware checksum on AST2600 Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 14/34] i2c: ismt: Provide a DMA buffer for Interrupt Cause Logging Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 15/34] drivers: i2c: thunderx: Allow driver to work with ACPI defined TWSI controllers Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 16/34] assoc_array: Fix BUG_ON during garbage collect Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 17/34] cfg80211: set custom regdomain after wiphy registration Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 18/34] drm/i915: Fix -Wstringop-overflow warning in call to intel_read_wm_latency() Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 19/34] exec: Force single empty string when argv is empty Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 20/34] netfilter: conntrack: re-fetch conntrack after insertion Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 21/34] crypto: ecrdsa - Fix incorrect use of vli_cmp Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 22/34] zsmalloc: fix races between asynchronous zspage free and page migration Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 23/34] dm integrity: fix error code in dm_integrity_ctr() Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 24/34] dm crypt: make printing of the key constant-time Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 25/34] dm stats: add cond_resched when looping over entries Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 26/34] dm verity: set DM_TARGET_IMMUTABLE feature flag Greg Kroah-Hartman
2022-06-10  4:22   ` Oleksandr Tymoshenko
2022-06-10  5:15     ` Greg KH
2022-06-10  8:10       ` Oleksandr Tymoshenko
2022-06-10 15:11       ` Mike Snitzer
2022-06-10 15:11         ` [dm-devel] " Mike Snitzer
2022-06-13  9:13         ` Greg KH
2022-06-13  9:13           ` [dm-devel] " Greg KH
2022-06-15 14:36           ` Guenter Roeck
2022-06-15 14:36             ` [dm-devel] " Guenter Roeck
2022-06-15 15:29             ` Mike Snitzer
2022-06-15 15:29               ` [dm-devel] " Mike Snitzer
2022-06-15 17:50               ` Guenter Roeck
2022-06-15 17:50                 ` [dm-devel] " Guenter Roeck
2022-06-15 20:02                 ` Mike Snitzer
2022-06-15 20:02                   ` [dm-devel] " Mike Snitzer
2022-06-15 20:40                   ` Guenter Roeck
2022-06-15 20:40                     ` [dm-devel] " Guenter Roeck
2022-06-15 23:59                   ` Guenter Roeck
2022-06-15 23:59                     ` [dm-devel] " Guenter Roeck
2022-06-16 23:22                   ` Guenter Roeck
2022-06-16 23:22                     ` [dm-devel] " Guenter Roeck
2022-06-20 11:44                   ` Greg KH
2022-06-20 11:44                     ` [dm-devel] " Greg KH
2022-06-21 16:35                     ` [5.4.y PATCH v2] dm: remove special-casing of bio-based immutable singleton target on NVMe Mike Snitzer
2022-06-21 16:35                       ` [dm-devel] " Mike Snitzer
2022-06-23 15:48                       ` Greg KH
2022-06-23 15:48                         ` [dm-devel] " Greg KH
2022-06-23 16:00                       ` Patch "dm: remove special-casing of bio-based immutable singleton target on NVMe" has been added to the 5.4-stable tree gregkh
2022-06-23 16:00                         ` [dm-devel] " gregkh
2022-06-03 17:43 ` [PATCH 5.4 27/34] raid5: introduce MD_BROKEN Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 28/34] HID: multitouch: Add support for Google Whiskers Touchpad Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 29/34] tpm: Fix buffer access in tpm2_get_tpm_pt() Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 30/34] tpm: ibmvtpm: Correct the return value in tpm_ibmvtpm_probe() Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 31/34] docs: submitting-patches: Fix crossref to The canonical patch format Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 32/34] NFS: Memory allocation failures are not server fatal errors Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 33/34] NFSD: Fix possible sleep during nfsd4_release_lockowner() Greg Kroah-Hartman
2022-06-03 17:43 ` [PATCH 5.4 34/34] bpf: Enlarge offset check value to INT_MAX in bpf_skb_{load,store}_bytes Greg Kroah-Hartman
2022-06-04 12:21 ` [PATCH 5.4 00/34] 5.4.197-rc1 review Sudip Mukherjee
2022-06-04 17:31 ` Naresh Kamboju
2022-06-04 18:54 ` Guenter Roeck
2022-06-06  1:08 ` Samuel Zou
