Re: [PATCH 1/2] scsi: fc: Update statistics for host and rport on FPIN reception.

From: Shyam Sundar <ssundar@marvell.com>
To: James Smart <james.smart@broadcom.com>,
	Nilesh Javali <njavali@marvell.com>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	GR-QLogic-Storage-Upstream
	<GR-QLogic-Storage-Upstream@marvell.com>
Subject: Re: [PATCH 1/2] scsi: fc: Update statistics for host and rport on FPIN reception.
Date: Mon, 28 Sep 2020 20:07:37 +0000	[thread overview]
Message-ID: <1230F28E-B730-4022-B4CA-6A47C49F15FD@marvell.com> (raw)
In-Reply-To: <31d735ac-90d6-a601-0d8e-c15739684d23@broadcom.com>

    > +
    > +/*
    > + * fc_fpin_li_stats_update - routine to update Link Integrity
    > + * event statistics.
    > + * @shost:		host the FPIN was received on
    > + * @tlv:		pointer to link integrity descriptor
    > + *
    > + */
    > +static void
    > +fc_fpin_li_stats_update(struct Scsi_Host *shost, struct fc_tlv_desc *tlv)
    > +{
    > +	u8 i;
    > +	struct fc_rport *rport = NULL;
    > +	struct fc_rport *det_rport = NULL, *attach_rport = NULL;
    > +	struct fc_host_attrs *fc_host = shost_to_fc_host(shost);
    > +	struct fc_fn_li_desc *li_desc = (struct fc_fn_li_desc *)tlv;
    > +	u64 wwpn;
    > +
    > +	rport = fc_find_rport_by_wwpn(shost,
    > +				      be64_to_cpu(li_desc->detecting_wwpn));
    > +	if (rport) {
    > +		det_rport = rport;
    > +		fc_li_stats_update(li_desc, &det_rport->stats);

    this looks odd - why are the stats counting against both the detecting 
    and attached ports - I would think it only counts against the "attached" 
    port.

    As it's the same counters - you loose the distinction of what it 
    detected vs what it is generating.  My guess is most of the detecting 
    ports would have been a switch port and it wouldn't have been found by 
    the rport_by_wwpn, so this block wasn't getting executed.

Shyam: James, the idea here was, for FPINs detected/initiated by the fabric will have the port of interest (this HBA/target that the HBA is connected to) as the "attached" port. However, as FPIN could also be generated by the Nx_Port, if it originated the FPIN and the fabric had broadcasted it, then the peer port would show as the "detecting" port in that case. Also, I am assuming that while broadcasting, the fabric does not forward an FPIN generated by HBA X back to itself. If it does, then we might indeed do a double accounting. I'll check back with the Fabric folks on that.

For a given FPIN, we only expect of these to be true.
I am open to removing the accounting against the "detecting" port for now, given that currently, there are no known implementations where the N_Port initiates the FPIN ELS.
Let me know what you think.

    > +	}
    > +
    > +	rport = fc_find_rport_by_wwpn(shost,
    > +				      be64_to_cpu(li_desc->attached_wwpn));
    > +	if (rport) {
    > +		attach_rport = rport;
    > +		fc_li_stats_update(li_desc, &attach_rport->stats);
    > +	}
    > +
    > +	if (be32_to_cpu(li_desc->pname_count) > 0) {
    > +		for (i = 0;
    > +		    i < be32_to_cpu(li_desc->pname_count);
    > +		    i++) {
    > +			wwpn = be64_to_cpu(li_desc->pname_list[i]);
    > +			rport = fc_find_rport_by_wwpn(shost, wwpn);
    > +			if (rport && rport != det_rport &&
    > +			    rport != attach_rport) {
    > +				fc_li_stats_update(li_desc, &rport->stats);

    I guess this is ok - but it makes it hard for administrators.  I believe 
    this is the list of the other nports (aka npiv) on the "attached port" 
    that is generating the error.  In that respect, it is correct to 
    increment their counters - but I hope that an administrator knows that 
    may resolve to a single physical port with only 1/N the error count.  
     From our use case in linux, as an initiator, to match an rport it must 
    be a target port using npiv and from our point of view we don't know 
    that they are all sharing the same physical port.

Shyam: I agree. But with the information in hand, I am not sure how we could do this better at this point. 

    > +			}
    > +		}
    > +	}
    > +}
    > +
    > +/*
    > + * fc_fpin_congn_stats_update - routine to update Congestion
    > + * event statistics.
    > + * @shost:		host the FPIN was received on
    > + * @tlv:		pointer to congestion descriptor
    > + *
    > + */
    > +static void
    > +fc_fpin_congn_stats_update(struct Scsi_Host *shost,
    > +			   struct fc_tlv_desc *tlv)
    > +{
    > +	struct fc_host_attrs *fc_host = shost_to_fc_host(shost);
    > +	struct fc_fn_congn_desc *congn = (struct fc_fn_congn_desc *)tlv;
    > +
    > +	fc_cn_stats_update(be16_to_cpu(congn->event_type), &fc_host->stats);
    > +}
    > +
    >   /**
    >    * fc_host_rcv_fpin - routine to process a received FPIN.
    >    * @shost:		host the FPIN was received on
    > @@ -639,8 +905,41 @@ EXPORT_SYMBOL(fc_host_post_vendor_event);
    >   void
    >   fc_host_fpin_rcv(struct Scsi_Host *shost, u32 fpin_len, char *fpin_buf)
    >   {
    > +	struct fc_els_fpin *fpin = (struct fc_els_fpin *)fpin_buf;
    > +	struct fc_tlv_desc *tlv;
    > +	u32 desc_cnt = 0, bytes_remain;
    > +	u32 dtag;
    > +
    > +	/* Update Statistics */
    > +	tlv = (struct fc_tlv_desc *)&fpin->fpin_desc[0];
    > +	bytes_remain = fpin_len - offsetof(struct fc_els_fpin, fpin_desc);
    > +	bytes_remain = min_t(u32, bytes_remain, be32_to_cpu(fpin->desc_len));
    > +
    > +	while (bytes_remain >= FC_TLV_DESC_HDR_SZ &&
    > +	       bytes_remain >= FC_TLV_DESC_SZ_FROM_LENGTH(tlv)) {
    > +		dtag = be32_to_cpu(tlv->desc_tag);
    > +		switch (dtag) {
    > +		case ELS_DTAG_LNK_INTEGRITY:
    > +			fc_fpin_li_stats_update(shost, tlv);
    > +			break;
    > +		case ELS_DTAG_DELIVERY:
    > +			fc_fpin_deli_stats_update(shost, tlv);
    > +			break;
    > +		case ELS_DTAG_PEER_CONGEST:
    > +			fc_fpin_peer_congn_stats_update(shost, tlv);
    > +			break;
    > +		case ELS_DTAG_CONGESTION:
    > +			fc_fpin_congn_stats_update(shost, tlv);
    > +		}
    > +
    > +		desc_cnt++;
    > +		bytes_remain -= FC_TLV_DESC_SZ_FROM_LENGTH(tlv);
    > +		tlv = fc_tlv_next_desc(tlv);
    > +	}
    > +
    >   	fc_host_post_fc_event(shost, fc_get_event_number(),
    > -				FCH_EVT_LINK_FPIN, fpin_len, fpin_buf, 0);
    > +			      FCH_EVT_LINK_FPIN, fpin_len, fpin_buf, 0);
    > +
    >   }
    >   EXPORT_SYMBOL(fc_host_fpin_rcv);

    Question: I know we've been asked to log the fpins to the kernel log.  
    Holding on to the counts and so is good, but it still loses some of the 
    relationship of the detected port (what detected what attached port).  
    What's your thinking on it. Should it be something in these common 
    routines and enabled/disabled by a sysfs toggle ?

Shyam: So far, I have been looking at it from the point of gathering and maintain the error stats, closest to the source of their origin.
So irrespective of if an error was "detected" by the Nx_Port itself, or by the F_Port attached to it, we are pointing the administrator towards the Nx_Port (by accounting for the error and tying it to that port).

Having said that, I do not think I completely grasp the essence of your question here, and your proposal of turning it on/off. Could you please elaborate.

All the other comments make sense to me. I'll roll them in and send out another patchset shortly.

Regards
Shyam