Re: [PATCH 2/3] qla2xxx: SAN congestion management(SCM) implementation.

From: Shyam Sundar <ssundar@marvell.com>
To: "james.smart@broadcom.com" <james.smart@broadcom.com>
Cc: Nilesh Javali <njavali@marvell.com>,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	GR-QLogic-Storage-Upstream
	<GR-QLogic-Storage-Upstream@marvell.com>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	Arun Easi <aeasi@marvell.com>
Subject: Re: [PATCH 2/3] qla2xxx: SAN congestion management(SCM) implementation.
Date: Thu, 11 Jun 2020 17:42:19 +0000	[thread overview]
Message-ID: <351333B3-F666-420F-A9D3-DB86D2617156@marvell.com> (raw)
In-Reply-To: <BYAPR18MB2805AEA357302FCFA20D2B57B48F0@BYAPR18MB2805.namprd18.prod.outlook.com>

Seems like this (and a previous email) never made it to the reflector, resending.

The suggestions make sense to me, and I have made most of the recommended changes.
I was looking for some guidance on placements of the sim stats structures.

> On May 29, 2020, at 11:53 AM, Shyam Sundar <ssundar@marvell.com> wrote:
> 
> James,
>      I was thinking of adding a structures for tracking the target FPIN stats.
> 
> struct fc_rport_statistics {
> uint32_t link_failure_count;
> uint32_t loss_of_sync_count;
> ....
> }
> 
> under fc_rport:
> 
> struct fc_rport {
> 
> /* Private (Transport-managed) Attributes */
> struct fc_rport_statistics;
> 
>      For host FPIN stats (essentially the alarm & warning), was not sure if I should add them to the fc_host_statistics or
> define a new structure under the Private Attributes section within the fc_host_attrs/fc_vport.
> 
>      In theory, given that the host stats could be updated both via signals and FPIN, one could argue that it would make sense
> to maintain it with the current host statistics, but keeping it confined to transport will ensure we have a uniform way of handling
> congestion and peer congestion events.  
> 
>     Would appreciate your thoughts there.
> 
> Thanks
> Shyam
>    
> ---------- Forwarded message ---------
> From: James Smart <james.smart@broadcom.com>
> Date: Fri, May 15, 2020 at 3:51 PM
> Subject: Re: [PATCH 2/3] qla2xxx: SAN congestion management(SCM) implementation.
> To: Nilesh Javali <njavali@marvell.com>, <martin.petersen@oracle.com>
> Cc: <linux-scsi@vger.kernel.org>, <GR-QLogic-Storage-Upstream@marvell.com>
> 
> 
> 
> 
> On 5/14/2020 3:10 AM, Nilesh Javali wrote:
> > From: Shyam Sundar <ssundar@marvell.com>
> >
> > * Firmware Initialization with SCM enabled based on NVRAM setting and
> >    firmware support (About Firmware).
> > * Enable PUREX and add support for fabric performance impact
> >    notification(FPIN) handling.
> > * Support for the following FPIN descriptors:
> >       1. Link Integrity Notification.
> >       2. Delivery Notification.
> >       3. Peer Congestion Notification.
> >       4. Congestion Notification.
> > * Mark a device as slow when a Peer Congestion Notification is received.
> > * Allocate a default purex item for each vha, to handle memory
> >    allocation failures in ISR.
> 
> In general, there's a lot of generic things here that are done in 
> driver-specific manners.
> 
> All the FPIN statistics should be added to the scsi fc transport objects 
> and transport routines created to parse the fpin payloads and set 
> statistics.  Also, statistics can be reported via sysfs on the transport 
> object rather than creating vendor-specific bsg requests to obtain them.
> 
> In line with this - FPIN definitions should use the existing the 
> existing common headers in include/uapi/fc/fc_els.h.  The file doesn't 
> have the congestion fpin definitions, so rather than putting in a driver 
> header - put the structure definitions in the common header.
> 
> 
> >
> > Signed-off-by: Shyam Sundar <ssundar@marvell.com>
> > Signed-off-by: Arun Easi <aeasi@marvell.com>
> > Signed-off-by: Nilesh Javali <njavali@marvell.com>
> > ---
> >   drivers/scsi/qla2xxx/qla_bsg.h  | 115 +++++++
> >   drivers/scsi/qla2xxx/qla_dbg.c  |  13 +-
> >   drivers/scsi/qla2xxx/qla_def.h  |  89 ++++++
> >   drivers/scsi/qla2xxx/qla_fw.h   |   7 +-
> >   drivers/scsi/qla2xxx/qla_gbl.h  |   1 +
> >   drivers/scsi/qla2xxx/qla_init.c |   9 +-
> >   drivers/scsi/qla2xxx/qla_isr.c  | 532 ++++++++++++++++++++++++++++++--
> >   drivers/scsi/qla2xxx/qla_mbx.c  |  45 ++-
> >   drivers/scsi/qla2xxx/qla_os.c   |  18 ++
> >   9 files changed, 795 insertions(+), 34 deletions(-)
> >
> > diff --git a/drivers/scsi/qla2xxx/qla_bsg.h b/drivers/scsi/qla2xxx/qla_bsg.h
> > index 7594fad7b5b5..0b308859047c 100644
> > --- a/drivers/scsi/qla2xxx/qla_bsg.h
> > +++ b/drivers/scsi/qla2xxx/qla_bsg.h
> > @@ -290,4 +290,119 @@ struct qla_active_regions {
> >       uint8_t reserved[32];
> >   } __packed;
> >   
> > +#define SCM_LINK_EVENT_UNKNOWN                       0x0
> > +#define SCM_LINK_EVENT_LINK_FAILURE                  0x1
> > +#define SCM_LINK_EVENT_LOSS_OF_SYNC                  0x2
> > +#define SCM_LINK_EVENT_LOSS_OF_SIGNAL                0x3
> > +#define SCM_LINK_EVENT_PRIMITIVE_SEQ_PROTOCOL_ERROR  0x4
> > +#define SCM_LINK_EVENT_INVALID_TX_WORD               0x5
> > +#define SCM_LINK_EVENT_INVALID_CRC                   0x6
> > +#define SCM_LINK_EVENT_DEVICE_SPECIFIC               0xF
> > +#define SCM_LINK_EVENT_V1_SIZE                       20
> 
> These should be the existing defines in the common fc_els.h header. Also 
> make sure SCM_LINK_EVENT_V1_SIZE is actually used.
> 
> > +struct qla_scm_link_event {
> > +     uint64_t        timestamp;
> > +     uint16_t        event_type;
> > +     uint16_t        event_modifier;
> > +     uint32_t        event_threshold;
> > +     uint32_t        event_count;
> > +     uint8_t         reserved[12];
> > +} __packed;
> > +
> > +#define SCM_DELIVERY_REASON_UNKNOWN          0x0
> > +#define SCM_DELIVERY_REASON_TIMEOUT          0x1
> > +#define SCM_DELIVERY_REASON_UNABLE_TO_ROUTE  0x2
> > +#define SCM_DELIVERY_REASON_DEVICE_SPECIFIC  0xF
> > +struct qla_scm_delivery_event {
> > +     uint64_t        timestamp;
> > +     uint32_t        delivery_reason;
> > +     uint8_t         deliver_frame_hdr[24];
> > +     uint8_t         reserved[28];
> > +
> > +} __packed;
> > +
> > +#define SCM_CONGESTION_EVENT_CLEAR           0x0
> > +#define SCM_CONGESTION_EVENT_LOST_CREDIT     0x1
> > +#define SCM_CONGESTION_EVENT_CREDIT_STALL    0x2
> > +#define SCM_CONGESTION_EVENT_OVERSUBSCRIPTION        0x3
> > +#define SCM_CONGESTION_EVENT_DEVICE_SPECIFIC 0xF
> > +struct qla_scm_peer_congestion_event {
> > +     uint64_t        timestamp;
> > +     uint16_t        event_type;
> > +     uint16_t        event_modifier;
> > +     uint32_t        event_period;
> > +     uint8_t         reserved[16];
> > +} __packed;
> > +
> > +#define SCM_CONGESTION_SEVERITY_WARNING      0xF1
> > +#define SCM_CONGESTION_SEVERITY_ERROR        0xF7
> > +struct qla_scm_congestion_event {
> > +     uint64_t        timestamp;
> > +     uint16_t        event_type;
> > +     uint16_t        event_modifier;
> > +     uint32_t        event_period;
> > +     uint8_t         severity;
> > +     uint8_t         reserved[15];
> > +} __packed;
> 
> I expect many of these defines also should map to std-defined defines, 
> thus should be added to fc_els.h
> 
> 
> > +
> > +#define SCM_FLAG_RDF_REJECT          0x00
> > +#define SCM_FLAG_RDF_COMPLETED               0x01
> > +#define SCM_FLAG_BROCADE_CONNECTED   0x02
> > +#define SCM_FLAG_CISCO_CONNECTED     0x04
> > +
> > +#define SCM_STATE_CONGESTION 0x1
> > +#define SCM_STATE_DELIVERY           0x2
> > +#define SCM_STATE_LINK_INTEGRITY     0x4
> > +#define SCM_STATE_PEER_CONGESTION    0x8
> > +
> > +#define QLA_CON_PRIMITIVE_RECEIVED   0x1
> > +#define QLA_CONGESTION_ARB_WARNING   0x1
> > +#define QLA_CONGESTION_ARB_ALARM     0X2
> > +struct qla_scm_port {
> > +     uint32_t                        current_events;
> > +
> > +     struct qla_scm_link_event       link_integrity;
> > +     struct qla_scm_delivery_event   delivery;
> > +     struct qla_scm_congestion_event congestion;
> > +     uint64_t                        scm_congestion_alarm;
> > +     uint64_t                        scm_congestion_warning;
> > +     uint8_t                         scm_fabric_connection_flags;
> > +     uint8_t                         reserved[43];
> > +} __packed;
> > +
> > +struct qla_scm_target {
> > +     uint8_t         wwpn[8];
> > +     uint32_t        current_events;
> > +
> > +     struct qla_scm_link_event               link_integrity;
> > +     struct qla_scm_delivery_event           delivery;
> > +     struct qla_scm_peer_congestion_event    peer_congestion;
> > +
> > +     uint32_t        link_failure_count;
> > +     uint32_t        loss_of_sync_count;
> > +     uint32_t        loss_of_signals_count;
> > +     uint32_t        primitive_seq_protocol_err_count;
> > +     uint32_t        invalid_transmission_word_count;
> > +     uint32_t        invalid_crc_count;
> > +
> > +     uint32_t        delivery_failure_unknown;
> > +     uint32_t        delivery_timeout;
> > +     uint32_t        delivery_unable_to_route;
> > +     uint32_t        delivery_failure_device_specific;
> > +
> > +     uint32_t        peer_congestion_clear;
> > +     uint32_t        peer_congestion_lost_credit;
> > +     uint32_t        peer_congestion_credit_stall;
> > +     uint32_t        peer_congestion_oversubscription;
> > +     uint32_t        peer_congestion_device_specific;
> > +     uint32_t        link_unknown_event;
> > +     uint32_t        link_device_specific_event;
> > +     uint8_t         reserved[48];
> > +} __packed;
> > +
> 
> Q: what purpose are these shorter "meta" event structures serving ? Why 
> hold onto (what I assume is) the last event.  Wouldn't something 
> monitoring netlink and use of the existing fc_host_fpin_rcv() interface 
> be enough ? it should see all events.
> 
> 
> >   
> > +#define SCM_EDC_ACC_RECEIVED         BIT_6
> > +#define SCM_RDF_ACC_RECEIVED         BIT_7
> > +#define SCM_NOTIFICATION_TYPE_LINK_INTEGRITY 0x00020001
> > +#define SCM_NOTIFICATION_TYPE_DELIVERY               0x00020002
> > +#define SCM_NOTIFICATION_TYPE_PEER_CONGESTION        0x00020003
> > +#define SCM_NOTIFICATION_TYPE_CONGESTION     0x00020004
> > +#define FPIN_DESCRIPTOR_HEADER_SIZE  4
> > +#define FPIN_ELS_DESCRIPTOR_LIST_OFFSET      8
> > +struct fpin_descriptor {
> > +     __be32 descriptor_tag;
> > +     __be32 descriptor_length;
> > +     union {
> > +             uint8_t common_detecting_port_name[WWN_SIZE];
> > +             struct {
> > +                     uint8_t detecting_port_name[WWN_SIZE];
> > +                     uint8_t attached_port_name[WWN_SIZE];
> > +                     __be16 event_type;
> > +                     __be16 event_modifier;
> > +                     __be32 event_threshold;
> > +                     __be32 event_count;
> > +                     __be32 port_name_count;
> > +                     uint8_t port_name_list[1][WWN_SIZE];
> > +             } link_integrity;
> > +             struct {
> > +                     uint8_t detecting_port_name[WWN_SIZE];
> > +                     uint8_t attached_port_name[WWN_SIZE];
> > +                     __be32 delivery_reason_code;
> > +             } delivery;
> > +             struct {
> > +                     uint8_t detecting_port_name[WWN_SIZE];
> > +                     uint8_t attached_port_name[WWN_SIZE];
> > +                     __be16 event_type;
> > +                     __be16 event_modifier;
> > +                     __be32 event_period;
> > +                     __be32 port_name_count;
> > +                     uint8_t port_name_list[1][WWN_SIZE];
> > +             } peer_congestion;
> > +             struct {
> > +                     __be16 event_type;
> > +                     __be16 event_modifier;
> > +                     __be32 event_period;
> > +                     uint8_t severity;
> > +                     uint8_t reserved[3];
> > +             } congestion;
> > +     };
> > +};
> 
> The fpin descriptor is already in the common fc_els.h header. Use it.  
> And feel free to extend the common header definitions for the 
> congestion/delivery events.
> 
> 
> >   
> > +void
> > +qla_link_integrity_tgt_stats_update(struct fpin_descriptor *fpin_desc,
> > +                                 fc_port_t *fcport)
> > +{
> > +     ql_log(ql_log_info, fcport->vha, 0x502d,
> > +            "Link Integrity Event Type %d for Port %8phN\n",
> > +            be16_to_cpu(fpin_desc->link_integrity.event_type),
> > +            fcport->port_name);
> > +
> > +     fcport->scm_stats.link_integrity.event_type =
> > +         be16_to_cpu(fpin_desc->link_integrity.event_type);
> > +     fcport->scm_stats.link_integrity.event_modifier =
> > +         be16_to_cpu(fpin_desc->link_integrity.event_modifier);
> > +     fcport->scm_stats.link_integrity.event_threshold =
> > +         be32_to_cpu(fpin_desc->link_integrity.event_threshold);
> > +     fcport->scm_stats.link_integrity.event_count =
> > +         be32_to_cpu(fpin_desc->link_integrity.event_count);
> > +     fcport->scm_stats.link_integrity.timestamp = ktime_get_real_seconds();
> > +
> > +     fcport->scm_stats.current_events |= SCM_STATE_LINK_INTEGRITY;
> > +     switch (be16_to_cpu(fpin_desc->link_integrity.event_type)) {
> > +     case SCM_LINK_EVENT_UNKNOWN:
> > +             fcport->scm_stats.link_unknown_event +=
> > +                 be32_to_cpu(fpin_desc->link_integrity.event_count);
> > +             break;
> > +     case SCM_LINK_EVENT_LINK_FAILURE:
> > +             fcport->scm_stats.link_failure_count +=
> > +                 be32_to_cpu(fpin_desc->link_integrity.event_count);
> > +             break;
> > +     case SCM_LINK_EVENT_LOSS_OF_SYNC:
> > +             fcport->scm_stats.loss_of_sync_count +=
> > +                 be32_to_cpu(fpin_desc->link_integrity.event_count);
> > +             break;
> > +     case SCM_LINK_EVENT_LOSS_OF_SIGNAL:
> > +             fcport->scm_stats.loss_of_signals_count +=
> > +                 be32_to_cpu(fpin_desc->link_integrity.event_count);
> > +             break;
> > +     case SCM_LINK_EVENT_PRIMITIVE_SEQ_PROTOCOL_ERROR:
> > +             fcport->scm_stats.primitive_seq_protocol_err_count +=
> > +                 be32_to_cpu(fpin_desc->link_integrity.event_count);
> > +             break;
> > +     case SCM_LINK_EVENT_INVALID_TX_WORD:
> > +             fcport->scm_stats.invalid_transmission_word_count +=
> > +                 be32_to_cpu(fpin_desc->link_integrity.event_count);
> > +             break;
> > +     case SCM_LINK_EVENT_INVALID_CRC:
> > +             fcport->scm_stats.invalid_crc_count +=
> > +                 be32_to_cpu(fpin_desc->link_integrity.event_count);
> > +             break;
> > +     case SCM_LINK_EVENT_DEVICE_SPECIFIC:
> > +             fcport->scm_stats.link_device_specific_event +=
> > +                 be32_to_cpu(fpin_desc->link_integrity.event_count);
> > +             break;
> > +     }
> > +}
> > +
> 
> A prime example of a routine that should be put into the fc transport 
> and a list of statistics that should be visible via sysfs on the 
> transport host (aka port) object.
> 
> 
> > +void
> > +qla_scm_process_link_integrity_d(struct scsi_qla_host *vha,
> > +                              struct fpin_descriptor *fpin_desc)
> > +{
> > ...
> > +}
> > +
> > +void
> > +qla_delivery_tgt_stats_update(struct fpin_descriptor *fpin_desc,
> > +                           fc_port_t *fcport)
> > +{
> > ...
> > +}
> > +
> > +/*
> > + * Process Delivery Notification Descriptor
> > + */
> > +void
> > +qla_scm_process_delivery_notification_d(struct scsi_qla_host *vha,
> > +                                     struct fpin_descriptor *fpin_desc)
> > +{
> > ...
> > +}
> > +
> > ...
> > +
> > +void
> > +qla_peer_congestion_tgt_stats_update(struct fpin_descriptor *fpin_desc,
> > +                                  fc_port_t *fcport)
> > +{
> > ...
> > +}
> > +
> > +/*
> > + * Process Peer-Congestion Notification Descriptor
> > + */
> > +void
> > +qla_scm_process_peer_congestion_notification_d(struct scsi_qla_host *vha,
> > +                                     struct fpin_descriptor *fpin_desc)
> > +{
> > ...
> > +}
> > +
> > +/*
> > + * qla_scm_process_congestion_notification_d() - Process
> > + * Process Congestion Notification Descriptor
> > + * @rsp: response queue
> > + * @pkt: Entry pointer
> > + */
> > +void
> > +qla_scm_process_congestion_notification_d(struct scsi_qla_host *vha,
> > +                                       struct fpin_descriptor *fpin_desc)
> > ...
> > +}
> 
> Same comment as prior - should be in scsi fc transport routines and 
> stats set on appropriate transport object
> 
> -- james