From: Erwan Velu <erwanaliasr1@gmail.com>
To: unlisted-recipients:; (no To-header on input)
Cc: Erwan Velu <e.velu@criteo.com>,
	Don Brace <don.brace@microsemi.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	"open list:MICROSEMI SMART ARRAY SMARTPQI DRIVER (smartpqi)" 
	<esc.storagedev@microsemi.com>,
	"open list:MICROSEMI SMART ARRAY SMARTPQI DRIVER (smartpqi)" 
	<linux-scsi@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] scsi: smartpqi: Reporting unhandled SCSI errors
Date: Wed, 10 Apr 2019 13:02:51 +0200
Message-ID: <CAL2JzuyHb_gfu5Suf3yaMF1883JN1667yhEwpdmoiqYrUTO2YA@mail.gmail.com>
In-Reply-To: <20190321094928.4198-1-e.velu@criteo.com>

Hi there!
Any reactions to this one? I didn't get a single comment.
Cheers,
Erwan

On Thu, Mar 21, 2019 at 10:49, Erwan Velu <erwanaliasr1@gmail.com> wrote:
>
> When a HARDWARE_ERROR is triggered for asc=0x3e, the current code only considers the case where ascq=0x1.
>
> According to the T10 ASC/ASCQ list (http://www.t10.org/lists/asc-num.htm#ASC_3E), other values may occur, such as a timeout (ascq=0x2).
>
> This patch prints an error message when an unhandled ascq value is received.
> This could help diagnose possible misbehavior of the controller and/or a missing implementation in the Linux kernel.
>
> The error handling itself is unchanged; the patch only adds a message when an ascq other than 0x1 is received.
>
> Signed-off-by: Erwan Velu <e.velu@criteo.com>
> ---
>  drivers/scsi/smartpqi/smartpqi_init.c | 23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c
> index 75ec43aa8df3..baf16c138800 100644
> --- a/drivers/scsi/smartpqi/smartpqi_init.c
> +++ b/drivers/scsi/smartpqi/smartpqi_init.c
> @@ -2762,16 +2762,25 @@ static void pqi_process_raid_io_error(struct pqi_io_request *io_request)
>                         scsi_normalize_sense(error_info->data,
>                                 sense_data_length, &sshdr) &&
>                                 sshdr.sense_key == HARDWARE_ERROR &&
> -                               sshdr.asc == 0x3e &&
> -                               sshdr.ascq == 0x1) {
> +                               sshdr.asc == 0x3e) {
>                         struct pqi_ctrl_info *ctrl_info = shost_to_hba(scmd->device->host);
>                         struct pqi_scsi_dev *device = scmd->device->hostdata;
>
> -                       if (printk_ratelimit())
> -                               scmd_printk(KERN_ERR, scmd, "received 'logical unit failure' from controller for scsi %d:%d:%d:%d\n",
> -                                       ctrl_info->scsi_host->host_no, device->bus, device->target, device->lun);
> -                       pqi_take_device_offline(scmd->device, "RAID");
> -                       host_byte = DID_NO_CONNECT;
> +                       switch (sshdr.ascq) {
> +                       case 0x1: /*LOGICAL UNIT FAILURE */
> +                               if (printk_ratelimit())
> +                                       scmd_printk(KERN_ERR, scmd, "received 'logical unit failure' from controller for scsi %d:%d:%d:%d\n",
> +                                               ctrl_info->scsi_host->host_no, device->bus, device->target, device->lun);
> +                               pqi_take_device_offline(scmd->device, "RAID");
> +                               host_byte = DID_NO_CONNECT;
> +                               break;
> +
> +                       default: /* See http://www.t10.org/lists/asc-num.htm#ASC_3E */
> +                               if (printk_ratelimit())
> +                                       scmd_printk(KERN_ERR, scmd, "received unhandled error %d from controller for scsi %d:%d:%d:%d\n",
> +                                               sshdr.ascq, ctrl_info->scsi_host->host_no, device->bus, device->target, device->lun);
> +                               break;
> +                       }
>                 }
>
>                 if (sense_data_length > SCSI_SENSE_BUFFERSIZE)
> --
> 2.20.1
>
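
For readers without the driver source at hand, the dispatch the patch introduces can be sketched outside the kernel. This is only an illustration, not the driver code: `struct sense_hdr`, `enum sense_action`, `classify_sense()` and the `SENSE_KEY_HARDWARE_ERROR` macro are hypothetical stand-ins (the kernel uses `struct scsi_sense_hdr` and `HARDWARE_ERROR`), and the real function also logs and takes the device offline rather than returning a value.

```c
#include <stdint.h>

/* Hypothetical macro: sense key 0x4 is HARDWARE ERROR in the SCSI spec. */
#define SENSE_KEY_HARDWARE_ERROR 0x04

/* Hypothetical stand-in for the kernel's struct scsi_sense_hdr,
 * reduced to the three fields the patch inspects. */
struct sense_hdr {
	uint8_t sense_key;
	uint8_t asc;
	uint8_t ascq;
};

/* Hypothetical result type describing what the caller should do. */
enum sense_action {
	SENSE_IGNORED,      /* not HARDWARE_ERROR with asc=0x3e: untouched by the patch */
	SENSE_TAKE_OFFLINE, /* ascq=0x1: log, take the device offline, DID_NO_CONNECT */
	SENSE_LOG_ONLY,     /* any other ascq: only report the unhandled value */
};

static enum sense_action classify_sense(const struct sense_hdr *sshdr)
{
	/* Same outer condition as the patched if(): the ascq test is
	 * moved out of it and into the switch below. */
	if (sshdr->sense_key != SENSE_KEY_HARDWARE_ERROR || sshdr->asc != 0x3e)
		return SENSE_IGNORED;

	switch (sshdr->ascq) {
	case 0x01: /* LOGICAL UNIT FAILURE */
		return SENSE_TAKE_OFFLINE;
	default: /* e.g. 0x02, a timeout; see the T10 ASC/ASCQ list */
		return SENSE_LOG_ONLY;
	}
}
```

With this sketch, a sense header of (HARDWARE_ERROR, 0x3e, 0x1) maps to `SENSE_TAKE_OFFLINE` exactly as before the patch, while (HARDWARE_ERROR, 0x3e, 0x2) now maps to `SENSE_LOG_ONLY` instead of being silently skipped.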

Thread overview: 8+ messages
2019-03-21  9:49 [PATCH] scsi: smartpqi: Reporting unhandled SCSI errors Erwan Velu
2019-03-21  9:49 ` Erwan Velu
2019-04-10 11:02 ` Erwan Velu [this message]
2019-04-10 11:02   ` Erwan Velu
2019-05-01 14:22   ` Don.Brace
2019-05-01 14:22     ` Don.Brace
2019-05-14  0:30     ` Martin K. Petersen
2019-05-14  0:30       ` Martin K. Petersen
