From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Benjamin Block <bblock@linux.ibm.com>,
	Steffen Maier <maier@linux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>
Subject: [PATCH 5.10 008/100] scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices
Date: Mon, 31 Jan 2022 11:55:29 +0100
Message-ID: <20220131105220.724050290@linuxfoundation.org>
In-Reply-To: <20220131105220.424085452@linuxfoundation.org>

From: Steffen Maier <maier@linux.ibm.com>

commit 8c9db6679be4348b8aae108e11d4be2f83976e30 upstream.

Suppose we have an environment with a number of non-NPIV FCP devices
(virtual HBAs / FCP devices / zfcp "adapter"s) sharing the same physical
FCP channel (HBA port) and its I_T nexus, plus a number of storage target
ports zoned to that shared channel. Now one target port logs out of the
fabric, causing an RSCN. Zfcp reacts with an ADISC ELS and subsequent port
recovery depending on the ADISC result. This happens on all such FCP
devices (in different Linux images) concurrently, as they all receive a copy
of this RSCN. In the following, we look at one of those FCP devices.

Requests other than FSF_QTCB_FCP_CMND can be slow until they get a
response.

Depending on which requests are affected by slow responses, there are
different recovery outcomes. Here we want to fix failed recoveries on port
or adapter level by avoiding recovery requests that can be slow.

We need the cached N_Port_ID for the remote port "link" test with ADISC.
Just before sending the ADISC, we now intentionally forget the old cached
N_Port_ID. The idea is that on receiving an RSCN for a port, we have to
assume that any cached information about this port is stale.  This forces a
fresh new GID_PN [FC-GS] nameserver lookup on any subsequent recovery for
the same port. Since we typically can still communicate with the nameserver
efficiently, we now reach steady state quicker: Either the nameserver still
does not know about the port so we stop recovery, or the nameserver already
knows the port potentially with a new N_Port_ID and we can successfully and
quickly perform open port recovery.  For the one case where ADISC returns
successfully, we re-initialize port->d_id because that case does not
involve any port recovery.

This also solves a problem if the storage WWPN quickly logs into the fabric
again but with a different N_Port_ID, such as on virtual WWPN takeover
during target NPIV failover.
[https://www.redbooks.ibm.com/abstracts/redp5477.html] In that case the
RSCN from the storage FDISC was ignored by zfcp and we could not
successfully recover the failover. On some later failback on the storage,
we could have been lucky if the virtual WWPN got the same old N_Port_ID
from the SAN switch as we still had cached.  Then the related RSCN
triggered a successful port reopen recovery.  However, there is no
guarantee to get the same N_Port_ID on NPIV FDISC.

Even though NPIV-enabled FCP devices are not affected by this problem, this
code change optimizes recovery time for gone remote ports as a side effect.
The timely drop of cached N_Port_IDs prevents unnecessary slow open port
attempts.

While the problem might have been in code before v2.6.32 commit
799b76d09aee ("[SCSI] zfcp: Decouple gid_pn requests from erp"), this fix
depends on the gid_pn_work introduced with that commit, so we mark it as
the culprit to satisfy fix dependencies.

Note: Point-to-point remote port is already handled separately and gets its
N_Port_ID from the cached peer_d_id. So resetting port->d_id in general
does not affect PtP.

Link: https://lore.kernel.org/r/20220118165803.3667947-1-maier@linux.ibm.com
Fixes: 799b76d09aee ("[SCSI] zfcp: Decouple gid_pn requests from erp")
Cc: <stable@vger.kernel.org> #2.6.32+
Suggested-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/s390/scsi/zfcp_fc.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

--- a/drivers/s390/scsi/zfcp_fc.c
+++ b/drivers/s390/scsi/zfcp_fc.c
@@ -521,6 +521,8 @@ static void zfcp_fc_adisc_handler(void *
 		goto out;
 	}
 
+	/* re-init to undo drop from zfcp_fc_adisc() */
+	port->d_id = ntoh24(adisc_resp->adisc_port_id);
 	/* port is good, unblock rport without going through erp */
 	zfcp_scsi_schedule_rport_register(port);
  out:
@@ -534,6 +536,7 @@ static int zfcp_fc_adisc(struct zfcp_por
 	struct zfcp_fc_req *fc_req;
 	struct zfcp_adapter *adapter = port->adapter;
 	struct Scsi_Host *shost = adapter->scsi_host;
+	u32 d_id;
 	int ret;
 
 	fc_req = kmem_cache_zalloc(zfcp_fc_req_cache, GFP_ATOMIC);
@@ -558,7 +561,15 @@ static int zfcp_fc_adisc(struct zfcp_por
 	fc_req->u.adisc.req.adisc_cmd = ELS_ADISC;
 	hton24(fc_req->u.adisc.req.adisc_port_id, fc_host_port_id(shost));
 
-	ret = zfcp_fsf_send_els(adapter, port->d_id, &fc_req->ct_els,
+	d_id = port->d_id; /* remember as destination for send els below */
+	/*
+	 * Force fresh GID_PN lookup on next port recovery.
+	 * Must happen after request setup and before sending request,
+	 * to prevent race with port->d_id re-init in zfcp_fc_adisc_handler().
+	 */
+	port->d_id = 0;
+
+	ret = zfcp_fsf_send_els(adapter, d_id, &fc_req->ct_els,
 				ZFCP_FC_CTELS_TMO);
 	if (ret)
 		kmem_cache_free(zfcp_fc_req_cache, fc_req);
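
The ordering the patch relies on (remember the destination, drop the cached
N_Port_ID, send, and restore the cache only on a good ADISC response) can be
illustrated with a small userspace model. This is only a sketch under stated
assumptions: struct model_port and every model_* function are hypothetical
stand-ins invented for illustration, not zfcp code, and the model ignores
locking, ERP and the real FSF request path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the port; only the cached N_Port_ID is modelled. */
struct model_port {
	uint32_t d_id;	/* cached N_Port_ID; 0 means "unknown, needs lookup" */
};

/* Models the send path: returns the destination used for the ADISC. */
static uint32_t model_send_adisc(struct model_port *port)
{
	uint32_t d_id = port->d_id;	/* remember destination for this request */

	port->d_id = 0;			/* drop the cache before the request goes out */
	return d_id;
}

/* Models the response path: only a successful ADISC restores the cache. */
static void model_adisc_done(struct model_port *port, int ok, uint32_t resp_d_id)
{
	if (ok)
		port->d_id = resp_d_id;
}

/* Models a later recovery decision: an empty cache forces a fresh lookup. */
static int model_needs_gid_pn(const struct model_port *port)
{
	return port->d_id == 0;
}

/* Walks through both outcomes; returns 0 if the model behaves as described. */
static int model_demo(void)
{
	struct model_port port = { .d_id = 0x010203 };

	/* Port still present: ADISC succeeds and the cache is restored. */
	assert(model_send_adisc(&port) == 0x010203);
	assert(model_needs_gid_pn(&port));	/* cache is empty while in flight */
	model_adisc_done(&port, 1, 0x010203);
	assert(!model_needs_gid_pn(&port));

	/* Port gone: ADISC fails and the cache stays dropped. */
	model_send_adisc(&port);
	model_adisc_done(&port, 0, 0);
	assert(model_needs_gid_pn(&port));

	return 0;
}
```

Calling model_demo() from a test main() exercises both cases. The point of
the model is that the request still goes to the old N_Port_ID, while any
recovery that runs after a failed ADISC sees an empty cache and therefore has
to ask the nameserver again.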


