All of lore.kernel.org
 help / color / mirror / Atom feed
From: Steffen Maier <maier@linux.ibm.com>
To: Hannes Reinecke <hare@suse.de>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Benjamin Block <bblock@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>,
	James Bottomley <james.bottomley@hansenpartnership.com>,
	linux-scsi@vger.kernel.org
Subject: Re: [PATCH 03/24] zfcp: open-code fc_block_scsi_eh() for host reset
Date: Wed, 4 May 2022 12:46:03 +0200	[thread overview]
Message-ID: <719b0a4f-a6ce-8725-9afe-76a524eadba9@linux.ibm.com> (raw)
In-Reply-To: <20220503200704.88003-4-hare@suse.de>

On 5/3/22 22:06, Hannes Reinecke wrote:
> When issuing a host reset we should be waiting for all
> ports to become unblocked; just waiting for one might
> be resulting in host reset to return too early.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Cc: Steffen Maier <maier@linux.ibm.com>
> Cc: Benjamin Block <bblock@linux.ibm.com>
> ---
>   drivers/s390/scsi/zfcp_scsi.c | 21 +++++++++++++++------
>   1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
> index 526ac240d9fe..2e615e1dcde6 100644
> --- a/drivers/s390/scsi/zfcp_scsi.c
> +++ b/drivers/s390/scsi/zfcp_scsi.c
> @@ -373,9 +373,11 @@ static int zfcp_scsi_eh_target_reset_handler(struct scsi_cmnd *scpnt)
>   
>   static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt)
>   {
> -	struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(scpnt->device);
> -	struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
> -	int ret = SUCCESS, fc_ret;
> +	struct Scsi_Host *host = scpnt->device->host;
> +	struct zfcp_adapter *adapter = (struct zfcp_adapter *)host->hostdata[0];
> +	int ret = SUCCESS;
> +	unsigned long flags;
> +	struct zfcp_port *port;
>   
>   	if (!(adapter->connection_features & FSF_FEATURE_NPIV_MODE)) {
>   		zfcp_erp_port_forced_reopen_all(adapter, 0, "schrh_p");
> @@ -383,9 +385,16 @@ static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt)
>   	}
>   	zfcp_erp_adapter_reopen(adapter, 0, "schrh_1");
>   	zfcp_erp_wait(adapter);

Now internal zfcp recovery for the adapter completed, but there are async 
pending follow-up things, such as rport(s) unblock, see below.

> -	fc_ret = fc_block_scsi_eh(scpnt);
> -	if (fc_ret)
> -		ret = fc_ret;
> +
> +	spin_lock_irqsave(&adapter->port_list_lock, flags);

zfcp_adapter.port_list_lock is of type rwlock_t:

+	read_lock_irqsave(&adapter->port_list_lock, flags);

> +	list_for_each_entry(port, &adapter->port_list, list) {
> +		struct fc_rport *rport = port->rport;

While an rport is blocked, port->rport == NULL [zfcp_scsi_rport_block() and
zfcp_scsi_rport_register()], so this could be a NULL pointer deref.

Typically all rports would still be blocked after above adapter recovery, until 
they become unblocked (via zfcp's async rport_work) 
[zfcp_erp_try_rport_unblock() near the end of lun and port recovery that happen 
as follow-up parts of adapter recovery and before zfcp_erp_wait() returns; it 
eventually calls zfcp_scsi_schedule_rport_register() queueing work item 
port->rport_work on which we could do flush_work() [one prior art in 
zfcp_init_device_configure()]].

Am starting to wonder if we would really need to sync with the async rports 
unblocks (for this adapter) after zfcp_erp_wait() above and before any deref 
port->rport. Either within this loop or with a separate loop before this one.
( Another option would be to somehow iterate rports differently via common code 
lists so we directly get rport references. )

> +
> +		ret = fc_block_rport(rport);

Lock order looks good, we hold port_list_lock here and fc_block_rport() takes 
Scsi_Host host_lock, so it matches prior art:
static void zfcp_erp_action_dismiss_adapter(struct zfcp_adapter *adapter)
		read_lock(&adapter->port_list_lock);
		    zfcp_erp_action_dismiss_port(port);
			spin_lock(port->adapter->scsi_host->host_lock);

> +		if (ret)
> +			break;

So once we find the first rport with
(rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)
we would break the loop and return from zfcp_scsi_eh_host_reset_handler() with 
FAST_IO_FAIL. Is that correct? What if there are still other blocked rports if 
we were to continue with the loop? Or am I missing something regarding the goal 
of "waiting for all rports to become unblocked"?

Once we completed the loop, the question arises which consolidated return code 
would be the correct one, if we got different ret for different rports in the 
loop. Does my previous brain dump make sense?:
I was pondering in my own patch version what to return of just a subset
of ports ran into fast_io_fail. And for now I thought just fast_io_fail
would be good to let I/O bubble up for path failover, even if this would
include of rport which meanwhile unblocked properly and would not need
bubbling up pending requests because they could service them with a
simple retry.

> +	}
> +	spin_unlock_irqrestore(&adapter->port_list_lock, flags);

+	read_unlock_irqrestore(&adapter->port_list_lock, flags);

>   
>   	zfcp_dbf_scsi_eh("schrh_r", adapter, ~0, ret);
>   	return ret;


-- 
Mit freundlichen Gruessen / Kind regards
Steffen Maier

Linux on IBM Z and LinuxONE

https://www.ibm.com/privacy/us/en/
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Gregor Pillen
Geschaeftsfuehrung: David Faller
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

  reply	other threads:[~2022-05-04 10:46 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-03 20:06 [PATCHv2 00/24] scsi: EH rework prep patches, part 1 Hannes Reinecke
2022-05-03 20:06 ` [PATCH 01/24] csiostor: use fc_block_rport() Hannes Reinecke
2022-05-03 20:06 ` [PATCH 02/24] fc_fcp: " Hannes Reinecke
2022-05-03 20:06 ` [PATCH 03/24] zfcp: open-code fc_block_scsi_eh() for host reset Hannes Reinecke
2022-05-04 10:46   ` Steffen Maier [this message]
2022-05-04 11:40     ` Steffen Maier
2022-05-12  6:12       ` Steffen Maier
2022-05-04 11:01   ` Benjamin Block
2022-05-04 11:09     ` Benjamin Block
2022-05-04 11:33     ` Benjamin Block
2022-05-03 20:06 ` [PATCH 04/24] mptfc: simplify mpt_fc_block_error_handler() Hannes Reinecke
2022-05-03 20:06 ` [PATCH 05/24] mptfusion: correct definitions for mptscsih_dev_reset() Hannes Reinecke
2022-05-03 20:06 ` [PATCH 06/24] mptfc: open-code mptfc_block_error_handler() for bus reset Hannes Reinecke
2022-05-03 20:06 ` [PATCH 07/24] qedf: use fc rport as argument for qedf_initiate_tmf() Hannes Reinecke
2022-05-03 20:06 ` [PATCH 08/24] bnx2fc: Do not rely on a scsi command for lun or target reset Hannes Reinecke
2022-05-03 20:06 ` [PATCH 09/24] ibmvfc: open-code reset loop for " Hannes Reinecke
2022-05-03 20:06 ` [PATCH 10/24] ibmvfc: use fc_block_rport() Hannes Reinecke
2022-05-03 20:06 ` [PATCH 11/24] fnic: use dedicated device reset command Hannes Reinecke
2022-05-03 20:06 ` [PATCH 12/24] fnic: use fc_block_rport() correctly Hannes Reinecke
2022-05-03 20:06 ` [PATCH 13/24] aic7xxx: make BUILD_SCSIID() a function Hannes Reinecke
2022-05-03 20:06 ` [PATCH 14/24] aic79xx: " Hannes Reinecke
2022-05-03 20:06 ` [PATCH 15/24] aic7xxx: do not reference scsi command when resetting device Hannes Reinecke
2022-05-04  1:10   ` Finn Thain
2022-05-03 20:06 ` [PATCH 16/24] aic79xx: " Hannes Reinecke
2022-05-03 20:06 ` [PATCH 17/24] snic: reserve tag for TMF Hannes Reinecke
2022-05-03 20:06 ` [PATCH 18/24] snic: use dedicated device reset command Hannes Reinecke
2022-05-03 20:06 ` [PATCH 19/24] snic: Use scsi_host_busy_iter() to traverse commands Hannes Reinecke
2022-05-03 20:07 ` [PATCH 20/24] sym53c8xx_2: split off bus reset from host reset Hannes Reinecke
2022-05-03 20:07 ` [PATCH 21/24] ips: Do not try to abort command " Hannes Reinecke
2022-05-03 20:07 ` [PATCH 22/24] qla1280: separate out host reset function from qla1280_error_action() Hannes Reinecke
2022-05-03 20:07 ` [PATCH 23/24] megaraid: pass in NULL scb for host reset Hannes Reinecke
2022-05-03 20:07 ` [PATCH 24/24] mpi3mr: split off bus_reset function from host_reset Hannes Reinecke
  -- strict thread matches above, loose matches on Subject: below --
2022-05-02 21:37 [PATCH 00/24] scsi: EH rework prep patches, part 1 Hannes Reinecke
2022-05-02 21:37 ` [PATCH 03/24] zfcp: open-code fc_block_scsi_eh() for host reset Hannes Reinecke
2022-05-03 14:06   ` Christoph Hellwig
2022-05-03 17:21   ` Steffen Maier
2022-05-03 18:42     ` Hannes Reinecke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=719b0a4f-a6ce-8725-9afe-76a524eadba9@linux.ibm.com \
    --to=maier@linux.ibm.com \
    --cc=bblock@linux.ibm.com \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=james.bottomley@hansenpartnership.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.