linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v6 6/8] scsi: pm8001: Use sas_ata_device_link_abort() to handle NCQ errors
  2022-10-17  9:20 ` [PATCH v6 6/8] scsi: pm8001: Use sas_ata_device_link_abort() to handle NCQ errors John Garry
@ 2022-10-17  9:01   ` Jinpu Wang
  0 siblings, 0 replies; 17+ messages in thread
From: Jinpu Wang @ 2022-10-17  9:01 UTC (permalink / raw)
  To: John Garry
  Cc: jejb, martin.petersen, jinpu.wang, damien.lemoal, linux-scsi,
	linux-kernel, linuxarm, yangxingui, niklas.cassel

On Mon, Oct 17, 2022 at 10:50 AM John Garry <john.garry@huawei.com> wrote:
>
> In commit c6b9ef5779c3 ("[SCSI] pm80xx: NCQ error handling changes") the
> driver had support added to handle NCQ errors but much of what is done
> in this handling is duplicated from the libata EH.
>
> In that named commit we handle in 2x main steps:
> a. Issue read log ext10 to examine and clear the errors
> b. Issue SATA_ABORT all command
>
> Indeed, in libata EH, we do similar to above:
> a. ata_do_eh() -> ata_eh_autopsy() -> ata_eh_link_autopsy() ->
>    ata_eh_analyze_ncq_error() -> ata_eh_read_log_10h()
> b. ata_do_eh() -> ata_eh_recover() which will issue a device soft reset
>    or hard reset
>
> Since there is so much duplication, use sas_ata_device_link_abort() which
> will abort all pending IOs and kick of ATA EH which will do the steps,
> above.
>
> However we will not follow the advisory to send the SATA_ABORT all command
> after the autopsy in read log ext10. Indeed, in libsas EH, we already send
> a per-task SATA_ABORT command, and this is prior to the ATA EH kicking in
> and issuing the read log ext10 in the recovery process. I judge that this
> is ok as the SATA_ABORT command does not actually send any protocol on the
> link to abort IO on the other side, so would not change any state on the
> disk (for the read log ext10 command).
>
> Signed-off-by: John Garry <john.garry@huawei.com>
> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Tested-by: Niklas Cassel <niklas.cassel@wdc.com> # pm80xx
Reuse ata helper is nice.
Acked-by: Jack Wang <jinpu.wang@ionos.com>
> ---
>  drivers/scsi/pm8001/pm8001_hwi.c | 171 +++----------------------------
>  drivers/scsi/pm8001/pm8001_sas.c |   6 --
>  drivers/scsi/pm8001/pm8001_sas.h |   5 -
>  drivers/scsi/pm8001/pm80xx_hwi.c | 163 ++---------------------------
>  4 files changed, 19 insertions(+), 326 deletions(-)
>
> diff --git a/drivers/scsi/pm8001/pm8001_hwi.c b/drivers/scsi/pm8001/pm8001_hwi.c
> index c0adc3a9d196..ec1a9ab61814 100644
> --- a/drivers/scsi/pm8001/pm8001_hwi.c
> +++ b/drivers/scsi/pm8001/pm8001_hwi.c
> @@ -1724,7 +1724,14 @@ void pm8001_work_fn(struct work_struct *work)
>                                 pm8001_free_dev(pm8001_dev);
>                         }
>                 }
> -       }       break;
> +       }
> +       break;
> +       case IO_XFER_ERROR_ABORTED_NCQ_MODE:
> +       {
> +               dev = pm8001_dev->sas_device;
> +               sas_ata_device_link_abort(dev, false);
> +       }
> +       break;
>         }
>         kfree(pw);
>  }
> @@ -1748,110 +1755,6 @@ int pm8001_handle_event(struct pm8001_hba_info *pm8001_ha, void *data,
>         return ret;
>  }
>
> -static void pm8001_send_abort_all(struct pm8001_hba_info *pm8001_ha,
> -               struct pm8001_device *pm8001_ha_dev)
> -{
> -       struct pm8001_ccb_info *ccb;
> -       struct sas_task *task;
> -       struct task_abort_req task_abort;
> -       u32 opc = OPC_INB_SATA_ABORT;
> -       int ret;
> -
> -       pm8001_ha_dev->id |= NCQ_ABORT_ALL_FLAG;
> -       pm8001_ha_dev->id &= ~NCQ_READ_LOG_FLAG;
> -
> -       task = sas_alloc_slow_task(GFP_ATOMIC);
> -       if (!task) {
> -               pm8001_dbg(pm8001_ha, FAIL, "cannot allocate task\n");
> -               return;
> -       }
> -
> -       task->task_done = pm8001_task_done;
> -
> -       ccb = pm8001_ccb_alloc(pm8001_ha, pm8001_ha_dev, task);
> -       if (!ccb) {
> -               sas_free_task(task);
> -               return;
> -       }
> -
> -       memset(&task_abort, 0, sizeof(task_abort));
> -       task_abort.abort_all = cpu_to_le32(1);
> -       task_abort.device_id = cpu_to_le32(pm8001_ha_dev->device_id);
> -       task_abort.tag = cpu_to_le32(ccb->ccb_tag);
> -
> -       ret = pm8001_mpi_build_cmd(pm8001_ha, 0, opc, &task_abort,
> -                                  sizeof(task_abort), 0);
> -       if (ret) {
> -               sas_free_task(task);
> -               pm8001_ccb_free(pm8001_ha, ccb);
> -       }
> -}
> -
> -static void pm8001_send_read_log(struct pm8001_hba_info *pm8001_ha,
> -               struct pm8001_device *pm8001_ha_dev)
> -{
> -       struct sata_start_req sata_cmd;
> -       int res;
> -       struct pm8001_ccb_info *ccb;
> -       struct sas_task *task = NULL;
> -       struct host_to_dev_fis fis;
> -       struct domain_device *dev;
> -       u32 opc = OPC_INB_SATA_HOST_OPSTART;
> -
> -       task = sas_alloc_slow_task(GFP_ATOMIC);
> -       if (!task) {
> -               pm8001_dbg(pm8001_ha, FAIL, "cannot allocate task !!!\n");
> -               return;
> -       }
> -       task->task_done = pm8001_task_done;
> -
> -       /*
> -        * Allocate domain device by ourselves as libsas is not going to
> -        * provide any.
> -        */
> -       dev = kzalloc(sizeof(struct domain_device), GFP_ATOMIC);
> -       if (!dev) {
> -               sas_free_task(task);
> -               pm8001_dbg(pm8001_ha, FAIL,
> -                          "Domain device cannot be allocated\n");
> -               return;
> -       }
> -       task->dev = dev;
> -       task->dev->lldd_dev = pm8001_ha_dev;
> -
> -       ccb = pm8001_ccb_alloc(pm8001_ha, pm8001_ha_dev, task);
> -       if (!ccb) {
> -               sas_free_task(task);
> -               kfree(dev);
> -               return;
> -       }
> -
> -       pm8001_ha_dev->id |= NCQ_READ_LOG_FLAG;
> -       pm8001_ha_dev->id |= NCQ_2ND_RLE_FLAG;
> -
> -       /* construct read log FIS */
> -       memset(&fis, 0, sizeof(struct host_to_dev_fis));
> -       fis.fis_type = 0x27;
> -       fis.flags = 0x80;
> -       fis.command = ATA_CMD_READ_LOG_EXT;
> -       fis.lbal = 0x10;
> -       fis.sector_count = 0x1;
> -
> -       memset(&sata_cmd, 0, sizeof(sata_cmd));
> -       sata_cmd.tag = cpu_to_le32(ccb->ccb_tag);
> -       sata_cmd.device_id = cpu_to_le32(pm8001_ha_dev->device_id);
> -       sata_cmd.ncqtag_atap_dir_m = cpu_to_le32((0x1 << 7) | (0x5 << 9));
> -       memcpy(&sata_cmd.sata_fis, &fis, sizeof(struct host_to_dev_fis));
> -
> -       res = pm8001_mpi_build_cmd(pm8001_ha, 0, opc, &sata_cmd,
> -                                  sizeof(sata_cmd), 0);
> -       if (res) {
> -               sas_free_task(task);
> -               pm8001_ccb_free(pm8001_ha, ccb);
> -               kfree(dev);
> -       }
> -}
> -
>  /**
>   * mpi_ssp_completion- process the event that FW response to the SSP request.
>   * @pm8001_ha: our hba card information
> @@ -2301,8 +2204,7 @@ mpi_sata_completion(struct pm8001_hba_info *pm8001_ha, void *piomb)
>                 return;
>         }
>
> -       if ((pm8001_dev && !(pm8001_dev->id & NCQ_READ_LOG_FLAG))
> -               && unlikely(!t || !t->lldd_task || !t->dev)) {
> +       if (pm8001_dev && unlikely(!t || !t->lldd_task || !t->dev)) {
>                 pm8001_dbg(pm8001_ha, FAIL, "task or dev null\n");
>                 return;
>         }
> @@ -2360,15 +2262,6 @@ mpi_sata_completion(struct pm8001_hba_info *pm8001_ha, void *piomb)
>                 if (param == 0) {
>                         ts->resp = SAS_TASK_COMPLETE;
>                         ts->stat = SAS_SAM_STAT_GOOD;
> -                       /* check if response is for SEND READ LOG */
> -                       if (pm8001_dev &&
> -                           (pm8001_dev->id & NCQ_READ_LOG_FLAG)) {
> -                               pm8001_send_abort_all(pm8001_ha, pm8001_dev);
> -                               /* Free the tag */
> -                               pm8001_tag_free(pm8001_ha, tag);
> -                               sas_free_task(t);
> -                               return;
> -                       }
>                 } else {
>                         u8 len;
>                         ts->resp = SAS_TASK_COMPLETE;
> @@ -2666,9 +2559,10 @@ static void mpi_sata_event(struct pm8001_hba_info *pm8001_ha, void *piomb)
>         if (event == IO_XFER_ERROR_ABORTED_NCQ_MODE) {
>                 /* find device using device id */
>                 pm8001_dev = pm8001_find_dev(pm8001_ha, dev_id);
> -               /* send read log extension */
>                 if (pm8001_dev)
> -                       pm8001_send_read_log(pm8001_ha, pm8001_dev);
> +                       pm8001_handle_event(pm8001_ha,
> +                               pm8001_dev,
> +                               IO_XFER_ERROR_ABORTED_NCQ_MODE);
>                 return;
>         }
>
> @@ -3649,12 +3543,7 @@ int pm8001_mpi_task_abort_resp(struct pm8001_hba_info *pm8001_ha, void *piomb)
>         pm8001_ccb_task_free(pm8001_ha, ccb);
>         mb();
>
> -       if (pm8001_dev->id & NCQ_ABORT_ALL_FLAG) {
> -               sas_free_task(t);
> -               pm8001_dev->id &= ~NCQ_ABORT_ALL_FLAG;
> -       } else {
> -               t->task_done(t);
> -       }
> +       t->task_done(t);
>
>         return 0;
>  }
> @@ -4206,7 +4095,6 @@ static int pm8001_chip_sata_req(struct pm8001_hba_info *pm8001_ha,
>         u64 phys_addr;
>         u32 ATAP = 0x0;
>         u32 dir;
> -       unsigned long flags;
>         u32  opc = OPC_INB_SATA_HOST_OPSTART;
>
>         memset(&sata_cmd, 0, sizeof(sata_cmd));
> @@ -4261,39 +4149,6 @@ static int pm8001_chip_sata_req(struct pm8001_hba_info *pm8001_ha,
>                 sata_cmd.esgl = 0;
>         }
>
> -       /* Check for read log for failed drive and return */
> -       if (sata_cmd.sata_fis.command == 0x2f) {
> -               if (((pm8001_ha_dev->id & NCQ_READ_LOG_FLAG) ||
> -                       (pm8001_ha_dev->id & NCQ_ABORT_ALL_FLAG) ||
> -                       (pm8001_ha_dev->id & NCQ_2ND_RLE_FLAG))) {
> -                       struct task_status_struct *ts;
> -
> -                       pm8001_ha_dev->id &= 0xDFFFFFFF;
> -                       ts = &task->task_status;
> -
> -                       spin_lock_irqsave(&task->task_state_lock, flags);
> -                       ts->resp = SAS_TASK_COMPLETE;
> -                       ts->stat = SAS_SAM_STAT_GOOD;
> -                       task->task_state_flags &= ~SAS_TASK_STATE_PENDING;
> -                       task->task_state_flags |= SAS_TASK_STATE_DONE;
> -                       if (unlikely((task->task_state_flags &
> -                                       SAS_TASK_STATE_ABORTED))) {
> -                               spin_unlock_irqrestore(&task->task_state_lock,
> -                                                       flags);
> -                               pm8001_dbg(pm8001_ha, FAIL,
> -                                          "task 0x%p resp 0x%x  stat 0x%x but aborted by upper layer\n",
> -                                          task, ts->resp,
> -                                          ts->stat);
> -                               pm8001_ccb_task_free(pm8001_ha, ccb);
> -                       } else {
> -                               spin_unlock_irqrestore(&task->task_state_lock,
> -                                                       flags);
> -                               pm8001_ccb_task_free_done(pm8001_ha, ccb);
> -                               return 0;
> -                       }
> -               }
> -       }
> -
>         return pm8001_mpi_build_cmd(pm8001_ha, 0, opc, &sata_cmd,
>                                     sizeof(sata_cmd), 0);
>  }
> diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
> index d5ec29f69be3..2d84ae95a1f9 100644
> --- a/drivers/scsi/pm8001/pm8001_sas.c
> +++ b/drivers/scsi/pm8001/pm8001_sas.c
> @@ -687,12 +687,6 @@ int pm8001_dev_found(struct domain_device *dev)
>         return pm8001_dev_found_notify(dev);
>  }
>
> -void pm8001_task_done(struct sas_task *task)
> -{
> -       del_timer(&task->slow_task->timer);
> -       complete(&task->slow_task->completion);
> -}
> -
>  #define PM8001_TASK_TIMEOUT 20
>
>  /**
> diff --git a/drivers/scsi/pm8001/pm8001_sas.h b/drivers/scsi/pm8001/pm8001_sas.h
> index b08f52673889..16a753d5e8a7 100644
> --- a/drivers/scsi/pm8001/pm8001_sas.h
> +++ b/drivers/scsi/pm8001/pm8001_sas.h
> @@ -579,10 +579,6 @@ struct pm8001_fw_image_header {
>  #define FLASH_UPDATE_DNLD_NOT_SUPPORTED                0x10
>  #define FLASH_UPDATE_DISABLED                  0x11
>
> -#define        NCQ_READ_LOG_FLAG                       0x80000000
> -#define        NCQ_ABORT_ALL_FLAG                      0x40000000
> -#define        NCQ_2ND_RLE_FLAG                        0x20000000
> -
>  /* Device states */
>  #define DS_OPERATIONAL                         0x01
>  #define DS_PORT_IN_RESET                       0x02
> @@ -709,7 +705,6 @@ int pm8001_mpi_fw_flash_update_resp(struct pm8001_hba_info *pm8001_ha,
>  int pm8001_mpi_general_event(struct pm8001_hba_info *pm8001_ha, void *piomb);
>  int pm8001_mpi_task_abort_resp(struct pm8001_hba_info *pm8001_ha, void *piomb);
>  struct sas_task *pm8001_alloc_task(void);
> -void pm8001_task_done(struct sas_task *task);
>  void pm8001_free_task(struct sas_task *task);
>  void pm8001_tag_free(struct pm8001_hba_info *pm8001_ha, u32 tag);
>  struct pm8001_device *pm8001_find_dev(struct pm8001_hba_info *pm8001_ha,
> diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c
> index dd0e06983cd3..4484c498bcb6 100644
> --- a/drivers/scsi/pm8001/pm80xx_hwi.c
> +++ b/drivers/scsi/pm8001/pm80xx_hwi.c
> @@ -1778,113 +1778,6 @@ pm80xx_chip_interrupt_disable(struct pm8001_hba_info *pm8001_ha, u8 vec)
>         pm80xx_chip_intx_interrupt_disable(pm8001_ha);
>  }
>
> -static void pm80xx_send_abort_all(struct pm8001_hba_info *pm8001_ha,
> -               struct pm8001_device *pm8001_ha_dev)
> -{
> -       struct pm8001_ccb_info *ccb;
> -       struct sas_task *task;
> -       struct task_abort_req task_abort;
> -       u32 opc = OPC_INB_SATA_ABORT;
> -       int ret;
> -
> -       pm8001_ha_dev->id |= NCQ_ABORT_ALL_FLAG;
> -       pm8001_ha_dev->id &= ~NCQ_READ_LOG_FLAG;
> -
> -       task = sas_alloc_slow_task(GFP_ATOMIC);
> -       if (!task) {
> -               pm8001_dbg(pm8001_ha, FAIL, "cannot allocate task\n");
> -               return;
> -       }
> -       task->task_done = pm8001_task_done;
> -
> -       ccb = pm8001_ccb_alloc(pm8001_ha, pm8001_ha_dev, task);
> -       if (!ccb) {
> -               sas_free_task(task);
> -               return;
> -       }
> -
> -       memset(&task_abort, 0, sizeof(task_abort));
> -       task_abort.abort_all = cpu_to_le32(1);
> -       task_abort.device_id = cpu_to_le32(pm8001_ha_dev->device_id);
> -       task_abort.tag = cpu_to_le32(ccb->ccb_tag);
> -
> -       ret = pm8001_mpi_build_cmd(pm8001_ha, 0, opc, &task_abort,
> -                                  sizeof(task_abort), 0);
> -       pm8001_dbg(pm8001_ha, FAIL, "Executing abort task end\n");
> -       if (ret) {
> -               sas_free_task(task);
> -               pm8001_ccb_free(pm8001_ha, ccb);
> -       }
> -}
> -
> -static void pm80xx_send_read_log(struct pm8001_hba_info *pm8001_ha,
> -               struct pm8001_device *pm8001_ha_dev)
> -{
> -       struct sata_start_req sata_cmd;
> -       int res;
> -       struct pm8001_ccb_info *ccb;
> -       struct sas_task *task = NULL;
> -       struct host_to_dev_fis fis;
> -       struct domain_device *dev;
> -       u32 opc = OPC_INB_SATA_HOST_OPSTART;
> -
> -       task = sas_alloc_slow_task(GFP_ATOMIC);
> -       if (!task) {
> -               pm8001_dbg(pm8001_ha, FAIL, "cannot allocate task !!!\n");
> -               return;
> -       }
> -       task->task_done = pm8001_task_done;
> -
> -       /*
> -        * Allocate domain device by ourselves as libsas is not going to
> -        * provide any.
> -        */
> -       dev = kzalloc(sizeof(struct domain_device), GFP_ATOMIC);
> -       if (!dev) {
> -               sas_free_task(task);
> -               pm8001_dbg(pm8001_ha, FAIL,
> -                          "Domain device cannot be allocated\n");
> -               return;
> -       }
> -
> -       task->dev = dev;
> -       task->dev->lldd_dev = pm8001_ha_dev;
> -
> -       ccb = pm8001_ccb_alloc(pm8001_ha, pm8001_ha_dev, task);
> -       if (!ccb) {
> -               sas_free_task(task);
> -               kfree(dev);
> -               return;
> -       }
> -
> -       pm8001_ha_dev->id |= NCQ_READ_LOG_FLAG;
> -       pm8001_ha_dev->id |= NCQ_2ND_RLE_FLAG;
> -
> -       memset(&sata_cmd, 0, sizeof(sata_cmd));
> -
> -       /* construct read log FIS */
> -       memset(&fis, 0, sizeof(struct host_to_dev_fis));
> -       fis.fis_type = 0x27;
> -       fis.flags = 0x80;
> -       fis.command = ATA_CMD_READ_LOG_EXT;
> -       fis.lbal = 0x10;
> -       fis.sector_count = 0x1;
> -
> -       sata_cmd.tag = cpu_to_le32(ccb->ccb_tag);
> -       sata_cmd.device_id = cpu_to_le32(pm8001_ha_dev->device_id);
> -       sata_cmd.ncqtag_atap_dir_m_dad = cpu_to_le32(((0x1 << 7) | (0x5 << 9)));
> -       memcpy(&sata_cmd.sata_fis, &fis, sizeof(struct host_to_dev_fis));
> -
> -       res = pm8001_mpi_build_cmd(pm8001_ha, 0, opc, &sata_cmd,
> -                                  sizeof(sata_cmd), 0);
> -       pm8001_dbg(pm8001_ha, FAIL, "Executing read log end\n");
> -       if (res) {
> -               sas_free_task(task);
> -               pm8001_ccb_free(pm8001_ha, ccb);
> -               kfree(dev);
> -       }
> -}
> -
>  /**
>   * mpi_ssp_completion - process the event that FW response to the SSP request.
>   * @pm8001_ha: our hba card information
> @@ -2402,11 +2295,9 @@ mpi_sata_completion(struct pm8001_hba_info *pm8001_ha,
>                 return;
>         }
>
> -       if ((pm8001_dev && !(pm8001_dev->id & NCQ_READ_LOG_FLAG))
> -               && unlikely(!t || !t->lldd_task || !t->dev)) {
> -               pm8001_dbg(pm8001_ha, FAIL, "task or dev null\n");
> +
> +       if (pm8001_dev && unlikely(!t->lldd_task || !t->dev))
>                 return;
> -       }
>
>         ts = &t->task_status;
>
> @@ -2463,15 +2354,6 @@ mpi_sata_completion(struct pm8001_hba_info *pm8001_ha,
>                 if (param == 0) {
>                         ts->resp = SAS_TASK_COMPLETE;
>                         ts->stat = SAS_SAM_STAT_GOOD;
> -                       /* check if response is for SEND READ LOG */
> -                       if (pm8001_dev &&
> -                           (pm8001_dev->id & NCQ_READ_LOG_FLAG)) {
> -                               pm80xx_send_abort_all(pm8001_ha, pm8001_dev);
> -                               /* Free the tag */
> -                               pm8001_tag_free(pm8001_ha, tag);
> -                               sas_free_task(t);
> -                               return;
> -                       }
>                 } else {
>                         u8 len;
>                         ts->resp = SAS_TASK_COMPLETE;
> @@ -2806,9 +2688,11 @@ static void mpi_sata_event(struct pm8001_hba_info *pm8001_ha,
>         if (event == IO_XFER_ERROR_ABORTED_NCQ_MODE) {
>                 /* find device using device id */
>                 pm8001_dev = pm8001_find_dev(pm8001_ha, dev_id);
> -               /* send read log extension */
> +               /* send read log extension by aborting the link - libata does what we want */
>                 if (pm8001_dev)
> -                       pm80xx_send_read_log(pm8001_ha, pm8001_dev);
> +                       pm8001_handle_event(pm8001_ha,
> +                               pm8001_dev,
> +                               IO_XFER_ERROR_ABORTED_NCQ_MODE);
>                 return;
>         }
>
> @@ -4556,7 +4440,6 @@ static int pm80xx_chip_sata_req(struct pm8001_hba_info *pm8001_ha,
>         u32 end_addr_high, end_addr_low;
>         u32 ATAP = 0x0;
>         u32 dir;
> -       unsigned long flags;
>         u32 opc = OPC_INB_SATA_HOST_OPSTART;
>         memset(&sata_cmd, 0, sizeof(sata_cmd));
>
> @@ -4735,40 +4618,6 @@ static int pm80xx_chip_sata_req(struct pm8001_hba_info *pm8001_ha,
>                                      (task->ata_task.atapi_packet[15] << 24)));
>         }
>
> -       /* Check for read log for failed drive and return */
> -       if (sata_cmd.sata_fis.command == 0x2f) {
> -               if (pm8001_ha_dev && ((pm8001_ha_dev->id & NCQ_READ_LOG_FLAG) ||
> -                       (pm8001_ha_dev->id & NCQ_ABORT_ALL_FLAG) ||
> -                       (pm8001_ha_dev->id & NCQ_2ND_RLE_FLAG))) {
> -                       struct task_status_struct *ts;
> -
> -                       pm8001_ha_dev->id &= 0xDFFFFFFF;
> -                       ts = &task->task_status;
> -
> -                       spin_lock_irqsave(&task->task_state_lock, flags);
> -                       ts->resp = SAS_TASK_COMPLETE;
> -                       ts->stat = SAS_SAM_STAT_GOOD;
> -                       task->task_state_flags &= ~SAS_TASK_STATE_PENDING;
> -                       task->task_state_flags |= SAS_TASK_STATE_DONE;
> -                       if (unlikely((task->task_state_flags &
> -                                       SAS_TASK_STATE_ABORTED))) {
> -                               spin_unlock_irqrestore(&task->task_state_lock,
> -                                                       flags);
> -                               pm8001_dbg(pm8001_ha, FAIL,
> -                                          "task 0x%p resp 0x%x  stat 0x%x but aborted by upper layer\n",
> -                                          task, ts->resp,
> -                                          ts->stat);
> -                               pm8001_ccb_task_free(pm8001_ha, ccb);
> -                               return 0;
> -                       } else {
> -                               spin_unlock_irqrestore(&task->task_state_lock,
> -                                                       flags);
> -                               pm8001_ccb_task_free_done(pm8001_ha, ccb);
> -                               atomic_dec(&pm8001_ha_dev->running_req);
> -                               return 0;
> -                       }
> -               }
> -       }
>         trace_pm80xx_request_issue(pm8001_ha->id,
>                                 ccb->device ? ccb->device->attached_phy : PM8001_MAX_PHYS,
>                                 ccb->ccb_tag, opc,
> --
> 2.35.3
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v6 0/8] libsas and drivers: NCQ error handling
@ 2022-10-17  9:20 John Garry
  2022-10-17  9:20 ` [PATCH v6 1/8] scsi: libsas: Add sas_ata_device_link_abort() John Garry
                   ` (9 more replies)
  0 siblings, 10 replies; 17+ messages in thread
From: John Garry @ 2022-10-17  9:20 UTC (permalink / raw)
  To: jejb, martin.petersen, jinpu.wang, damien.lemoal
  Cc: linux-scsi, linux-kernel, linuxarm, yangxingui, niklas.cassel,
	John Garry

As reported in [0], the pm8001 driver NCQ error handling more or less
duplicates what libata does in link error handling, as follows:
- abort all commands
- do autopsy with read log ext 10 command
- reset the target to recover, if necessary

Indeed for the hisi_sas driver we want to add similar handling for NCQ
errors.

This series add a new libsas API - sas_ata_device_link_abort() - to handle
host NCQ errors, and fixes up pm8001 and hisi_sas drivers to use it.

A difference in the pm8001 driver NCQ error handling is that we send
SATA_ABORT per-task prior to read log ext10, but I feel that this should
not make a difference to the error handling.

Finally with these changes we can make the libsas task alloc/free APIs
private, which they should always have been.

Based on v6.1-rc1

[0] https://lore.kernel.org/linux-scsi/8fb3b093-55f0-1fab-81f4-e8519810a978@huawei.com/

Changes since v5:
- Change to set ATA_DRDY in sata dev fis for sas_ata_device_link_abort()
  and sas_ata_task_done()
- Add Niklas' tags (thanks!)
- Rebase

Changes since v4:
- Add Jason's tags (thanks)
- Rebase

Changes since v3:
- Add Damien's tags (thanks)
- Modify hisi_sas processing as follows:
  - use sas_task_abort() for rejected IO
  - Modify abort task processing to issue softreset in certain circumstances
- rebase

Changes since v2:
- Stop sending SATA_ABORT all for pm8001 handling
- Make "reset" optional in sas_ata_device_link_abort()
- Drop Jack's ACK

John Garry (6):
  scsi: libsas: Add sas_ata_device_link_abort()
  scsi: hisi_sas: Move slot variable definition in hisi_sas_abort_task()
  scsi: pm8001: Modify task abort handling for SATA task
  scsi: pm8001: Use sas_ata_device_link_abort() to handle NCQ errors
  scsi: libsas: Make sas_{alloc, alloc_slow, free}_task() private
  scsi: libsas: Update SATA dev FIS in sas_ata_task_done()

Xingui Yang (2):
  scsi: hisi_sas: Add SATA_DISK_ERR bit handling for v3 hw
  scsi: hisi_sas: Modify v3 HW SATA disk error state completion
    processing

 drivers/scsi/hisi_sas/hisi_sas.h       |   1 +
 drivers/scsi/hisi_sas/hisi_sas_main.c  |  26 +++-
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |  53 ++++++-
 drivers/scsi/libsas/sas_ata.c          |  19 ++-
 drivers/scsi/libsas/sas_init.c         |   3 -
 drivers/scsi/libsas/sas_internal.h     |   4 +
 drivers/scsi/pm8001/pm8001_hwi.c       | 186 ++++---------------------
 drivers/scsi/pm8001/pm8001_sas.c       |  14 +-
 drivers/scsi/pm8001/pm8001_sas.h       |   5 -
 drivers/scsi/pm8001/pm80xx_hwi.c       | 177 +++--------------------
 include/scsi/libsas.h                  |   4 -
 include/scsi/sas_ata.h                 |   6 +
 12 files changed, 148 insertions(+), 350 deletions(-)

-- 
2.35.3


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v6 1/8] scsi: libsas: Add sas_ata_device_link_abort()
  2022-10-17  9:20 [PATCH v6 0/8] libsas and drivers: NCQ error handling John Garry
@ 2022-10-17  9:20 ` John Garry
  2022-10-17 10:52   ` Niklas Cassel
  2022-10-17  9:20 ` [PATCH v6 2/8] scsi: hisi_sas: Move slot variable definition in hisi_sas_abort_task() John Garry
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 17+ messages in thread
From: John Garry @ 2022-10-17  9:20 UTC (permalink / raw)
  To: jejb, martin.petersen, jinpu.wang, damien.lemoal
  Cc: linux-scsi, linux-kernel, linuxarm, yangxingui, niklas.cassel,
	John Garry

Similar to how AHCI handles NCQ errors in ahci_error_intr() ->
ata_port_abort() -> ata_do_link_abort(), add an NCQ error handler for LLDDs
to call to initiate a link abort.

This will mark all outstanding QCs as failed and kick-off EH.

Note:
A "force reset" argument is added for drivers which require the ATA error
handling to always reset the device.

A driver may require this feature for when SATA device per-SCSI cmnd
resources are only released during reset for ATA EH. As such, we need an
option to force reset to be done, regardless of what any EH autopsy
decides.

The SATA device FIS fields are set to indicate a device error from
ata_eh_analyze_tf().

Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Suggested-by: Tested-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Tested-by: Niklas Cassel <niklas.cassel@wdc.com> # pm80xx
---
 drivers/scsi/libsas/sas_ata.c | 15 +++++++++++++++
 include/scsi/sas_ata.h        |  6 ++++++
 2 files changed, 21 insertions(+)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index d35c9296f738..61f64d54e67d 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -861,6 +861,21 @@ void sas_ata_wait_eh(struct domain_device *dev)
 	ata_port_wait_eh(ap);
 }
 
+void sas_ata_device_link_abort(struct domain_device *device, bool force_reset)
+{
+	struct ata_port *ap = device->sata_dev.ap;
+	struct ata_link *link = &ap->link;
+
+	device->sata_dev.fis[2] = ATA_ERR | ATA_DRDY; /* tf status */
+	device->sata_dev.fis[3] = ATA_ABORTED; /* tf error */
+
+	link->eh_info.err_mask |= AC_ERR_DEV;
+	if (force_reset)
+		link->eh_info.action |= ATA_EH_RESET;
+	ata_link_abort(link);
+}
+EXPORT_SYMBOL_GPL(sas_ata_device_link_abort);
+
 int sas_execute_ata_cmd(struct domain_device *device, u8 *fis, int force_phy_id)
 {
 	struct sas_tmf_task tmf_task = {};
diff --git a/include/scsi/sas_ata.h b/include/scsi/sas_ata.h
index a1df4f9d57a3..e47f0aec0722 100644
--- a/include/scsi/sas_ata.h
+++ b/include/scsi/sas_ata.h
@@ -32,6 +32,7 @@ void sas_probe_sata(struct asd_sas_port *port);
 void sas_suspend_sata(struct asd_sas_port *port);
 void sas_resume_sata(struct asd_sas_port *port);
 void sas_ata_end_eh(struct ata_port *ap);
+void sas_ata_device_link_abort(struct domain_device *dev, bool force_reset);
 int sas_execute_ata_cmd(struct domain_device *device, u8 *fis,
 			int force_phy_id);
 int sas_ata_wait_after_reset(struct domain_device *dev, unsigned long deadline);
@@ -87,6 +88,11 @@ static inline void sas_ata_end_eh(struct ata_port *ap)
 {
 }
 
+static inline void sas_ata_device_link_abort(struct domain_device *dev,
+					     bool force_reset)
+{
+}
+
 static inline int sas_execute_ata_cmd(struct domain_device *device, u8 *fis,
 				      int force_phy_id)
 {
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 2/8] scsi: hisi_sas: Move slot variable definition in hisi_sas_abort_task()
  2022-10-17  9:20 [PATCH v6 0/8] libsas and drivers: NCQ error handling John Garry
  2022-10-17  9:20 ` [PATCH v6 1/8] scsi: libsas: Add sas_ata_device_link_abort() John Garry
@ 2022-10-17  9:20 ` John Garry
  2022-10-17  9:20 ` [PATCH v6 3/8] scsi: hisi_sas: Add SATA_DISK_ERR bit handling for v3 hw John Garry
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: John Garry @ 2022-10-17  9:20 UTC (permalink / raw)
  To: jejb, martin.petersen, jinpu.wang, damien.lemoal
  Cc: linux-scsi, linux-kernel, linuxarm, yangxingui, niklas.cassel,
	John Garry

Each branch currently defines a slot variable independently, and it is
neater to move it to the function head.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas_main.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 699b07abb6b0..8303aa5eaf25 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -1547,6 +1547,7 @@ static int hisi_sas_abort_task(struct sas_task *task)
 	struct hisi_sas_internal_abort_data internal_abort_data = { false };
 	struct domain_device *device = task->dev;
 	struct hisi_sas_device *sas_dev = device->lldd_dev;
+	struct hisi_sas_slot *slot = task->lldd_task;
 	struct hisi_hba *hisi_hba;
 	struct device *dev;
 	int rc = TMF_RESP_FUNC_FAILED;
@@ -1560,7 +1561,6 @@ static int hisi_sas_abort_task(struct sas_task *task)
 
 	spin_lock_irqsave(&task->task_state_lock, flags);
 	if (task->task_state_flags & SAS_TASK_STATE_DONE) {
-		struct hisi_sas_slot *slot = task->lldd_task;
 		struct hisi_sas_cq *cq;
 
 		if (slot) {
@@ -1578,8 +1578,7 @@ static int hisi_sas_abort_task(struct sas_task *task)
 	task->task_state_flags |= SAS_TASK_STATE_ABORTED;
 	spin_unlock_irqrestore(&task->task_state_lock, flags);
 
-	if (task->lldd_task && task->task_proto & SAS_PROTOCOL_SSP) {
-		struct hisi_sas_slot *slot = task->lldd_task;
+	if (slot && task->task_proto & SAS_PROTOCOL_SSP) {
 		u16 tag = slot->idx;
 		int rc2;
 
@@ -1613,9 +1612,8 @@ static int hisi_sas_abort_task(struct sas_task *task)
 			hisi_sas_dereg_device(hisi_hba, device);
 			rc = hisi_sas_softreset_ata_disk(device);
 		}
-	} else if (task->lldd_task && task->task_proto & SAS_PROTOCOL_SMP) {
+	} else if (slot && task->task_proto & SAS_PROTOCOL_SMP) {
 		/* SMP */
-		struct hisi_sas_slot *slot = task->lldd_task;
 		u32 tag = slot->idx;
 		struct hisi_sas_cq *cq = &hisi_hba->cq[slot->dlvry_queue];
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 3/8] scsi: hisi_sas: Add SATA_DISK_ERR bit handling for v3 hw
  2022-10-17  9:20 [PATCH v6 0/8] libsas and drivers: NCQ error handling John Garry
  2022-10-17  9:20 ` [PATCH v6 1/8] scsi: libsas: Add sas_ata_device_link_abort() John Garry
  2022-10-17  9:20 ` [PATCH v6 2/8] scsi: hisi_sas: Move slot variable definition in hisi_sas_abort_task() John Garry
@ 2022-10-17  9:20 ` John Garry
  2022-10-17  9:20 ` [PATCH v6 4/8] scsi: hisi_sas: Modify v3 HW SATA disk error state completion processing John Garry
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: John Garry @ 2022-10-17  9:20 UTC (permalink / raw)
  To: jejb, martin.petersen, jinpu.wang, damien.lemoal
  Cc: linux-scsi, linux-kernel, linuxarm, yangxingui, niklas.cassel,
	John Garry

From: Xingui Yang <yangxingui@huawei.com>

When CQ header dw3 SATA_DISK_ERR is set it means this SATA disk is in error
state and the current IPTT is invalid. An invalid IPTT does not correspond
to any slot.

In this scenario, new I/Os that delivered to disk will be rejected by the
controller and all I/Os remaining in the disk should be aborted, which we
add here with the sas_ata_device_link_abort() call.

In hisi_sas_abort_task() we don't want to issue a softreset as it may
cause info to be lost in the target disk for the ATA EH autopsy. In this
case, just release resources - the disk won't return other I/Os normally
after NCQ Error, so this is safe.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas.h       |  1 +
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 18 +++++++++-
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 48 ++++++++++++++++++++++++--
 3 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index 9aebf4a26b13..6f8a52a1b808 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -104,6 +104,7 @@ enum {
 enum dev_status {
 	HISI_SAS_DEV_INIT,
 	HISI_SAS_DEV_NORMAL,
+	HISI_SAS_DEV_NCQ_ERR,
 };
 
 enum {
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 8303aa5eaf25..4c37ae9eb6b6 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -1604,13 +1604,26 @@ static int hisi_sas_abort_task(struct sas_task *task)
 	} else if (task->task_proto & SAS_PROTOCOL_SATA ||
 		task->task_proto & SAS_PROTOCOL_STP) {
 		if (task->dev->dev_type == SAS_SATA_DEV) {
+			struct ata_queued_cmd *qc = task->uldd_task;
+
 			rc = hisi_sas_internal_task_abort_dev(sas_dev, false);
 			if (rc < 0) {
 				dev_err(dev, "abort task: internal abort failed\n");
 				goto out;
 			}
 			hisi_sas_dereg_device(hisi_hba, device);
-			rc = hisi_sas_softreset_ata_disk(device);
+
+			/*
+			 * If an ATA internal command times out in ATA EH, it
+			 * need to execute soft reset, so check the scsicmd
+			 */
+			if ((sas_dev->dev_status == HISI_SAS_DEV_NCQ_ERR) &&
+			    qc && qc->scsicmd) {
+				hisi_sas_do_release_task(hisi_hba, task, slot);
+				rc = TMF_RESP_FUNC_COMPLETE;
+			} else {
+				rc = hisi_sas_softreset_ata_disk(device);
+			}
 		}
 	} else if (slot && task->task_proto & SAS_PROTOCOL_SMP) {
 		/* SMP */
@@ -1727,6 +1740,9 @@ static int hisi_sas_I_T_nexus_reset(struct domain_device *device)
 	struct device *dev = hisi_hba->dev;
 	int rc;
 
+	if (sas_dev->dev_status == HISI_SAS_DEV_NCQ_ERR)
+		sas_dev->dev_status = HISI_SAS_DEV_NORMAL;
+
 	rc = hisi_sas_internal_task_abort_dev(sas_dev, false);
 	if (rc < 0) {
 		dev_err(dev, "I_T nexus reset: internal abort (%d)\n", rc);
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index d56b4bfd2767..0ae8a60aaf93 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -404,6 +404,11 @@
 #define CMPLT_HDR_CMPLT_MSK		(0x3 << CMPLT_HDR_CMPLT_OFF)
 #define CMPLT_HDR_ERROR_PHASE_OFF   2
 #define CMPLT_HDR_ERROR_PHASE_MSK   (0xff << CMPLT_HDR_ERROR_PHASE_OFF)
+/* bit[9:2] Error Phase */
+#define ERR_PHASE_RESPONSE_FRAME_REV_STAGE_OFF	\
+					8
+#define ERR_PHASE_RESPONSE_FRAME_REV_STAGE_MSK	\
+	(0x1 << ERR_PHASE_RESPONSE_FRAME_REV_STAGE_OFF)
 #define CMPLT_HDR_RSPNS_XFRD_OFF	10
 #define CMPLT_HDR_RSPNS_XFRD_MSK	(0x1 << CMPLT_HDR_RSPNS_XFRD_OFF)
 #define CMPLT_HDR_RSPNS_GOOD_OFF	11
@@ -423,8 +428,15 @@
 #define CMPLT_HDR_DEV_ID_OFF		16
 #define CMPLT_HDR_DEV_ID_MSK		(0xffff << CMPLT_HDR_DEV_ID_OFF)
 /* dw3 */
+#define CMPLT_HDR_SATA_DISK_ERR_OFF	16
+#define CMPLT_HDR_SATA_DISK_ERR_MSK	(0x1 << CMPLT_HDR_SATA_DISK_ERR_OFF)
 #define CMPLT_HDR_IO_IN_TARGET_OFF	17
 #define CMPLT_HDR_IO_IN_TARGET_MSK	(0x1 << CMPLT_HDR_IO_IN_TARGET_OFF)
+/* bit[23:18] ERR_FIS_ATA_STATUS */
+#define FIS_ATA_STATUS_ERR_OFF		18
+#define FIS_ATA_STATUS_ERR_MSK		(0x1 << FIS_ATA_STATUS_ERR_OFF)
+#define FIS_TYPE_SDB_OFF		31
+#define FIS_TYPE_SDB_MSK		(0x1 << FIS_TYPE_SDB_OFF)
 
 /* ITCT header */
 /* qw0 */
@@ -2148,6 +2160,18 @@ static irqreturn_t fatal_axi_int_v3_hw(int irq_no, void *p)
 	return IRQ_HANDLED;
 }
 
+static bool is_ncq_err_v3_hw(struct hisi_sas_complete_v3_hdr *complete_hdr)
+{
+	u32 dw0, dw3;
+
+	dw0 = le32_to_cpu(complete_hdr->dw0);
+	dw3 = le32_to_cpu(complete_hdr->dw3);
+
+	return (dw0 & ERR_PHASE_RESPONSE_FRAME_REV_STAGE_MSK) &&
+	       (dw3 & FIS_TYPE_SDB_MSK) &&
+	       (dw3 & FIS_ATA_STATUS_ERR_MSK);
+}
+
 static bool
 slot_err_v3_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
 	       struct hisi_sas_slot *slot)
@@ -2381,14 +2405,34 @@ static irqreturn_t  cq_thread_v3_hw(int irq_no, void *p)
 	while (rd_point != wr_point) {
 		struct hisi_sas_complete_v3_hdr *complete_hdr;
 		struct device *dev = hisi_hba->dev;
-		u32 dw1;
+		u32 dw0, dw1, dw3;
 		int iptt;
 
 		complete_hdr = &complete_queue[rd_point];
+		dw0 = le32_to_cpu(complete_hdr->dw0);
 		dw1 = le32_to_cpu(complete_hdr->dw1);
+		dw3 = le32_to_cpu(complete_hdr->dw3);
 
 		iptt = dw1 & CMPLT_HDR_IPTT_MSK;
-		if (likely(iptt < HISI_SAS_COMMAND_ENTRIES_V3_HW)) {
+		if (unlikely((dw0 & CMPLT_HDR_CMPLT_MSK) == 0x3) &&
+			     (dw3 & CMPLT_HDR_SATA_DISK_ERR_MSK)) {
+			int device_id = (dw1 & CMPLT_HDR_DEV_ID_MSK) >>
+					CMPLT_HDR_DEV_ID_OFF;
+			struct hisi_sas_itct *itct =
+				&hisi_hba->itct[device_id];
+			struct hisi_sas_device *sas_dev =
+				&hisi_hba->devices[device_id];
+			struct domain_device *device = sas_dev->sas_device;
+
+			dev_err(dev, "erroneous completion disk err dev id=%d sas_addr=0x%llx CQ hdr: 0x%x 0x%x 0x%x 0x%x\n",
+				device_id, itct->sas_addr, dw0, dw1,
+				complete_hdr->act, dw3);
+
+			if (is_ncq_err_v3_hw(complete_hdr))
+				sas_dev->dev_status = HISI_SAS_DEV_NCQ_ERR;
+
+			sas_ata_device_link_abort(device, true);
+		} else if (likely(iptt < HISI_SAS_COMMAND_ENTRIES_V3_HW)) {
 			slot = &hisi_hba->slot_info[iptt];
 			slot->cmplt_queue_slot = rd_point;
 			slot->cmplt_queue = queue;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 4/8] scsi: hisi_sas: Modify v3 HW SATA disk error state completion processing
  2022-10-17  9:20 [PATCH v6 0/8] libsas and drivers: NCQ error handling John Garry
                   ` (2 preceding siblings ...)
  2022-10-17  9:20 ` [PATCH v6 3/8] scsi: hisi_sas: Add SATA_DISK_ERR bit handling for v3 hw John Garry
@ 2022-10-17  9:20 ` John Garry
  2022-10-17 10:45   ` Niklas Cassel
  2022-10-17  9:20 ` [PATCH v6 5/8] scsi: pm8001: Modify task abort handling for SATA task John Garry
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 17+ messages in thread
From: John Garry @ 2022-10-17  9:20 UTC (permalink / raw)
  To: jejb, martin.petersen, jinpu.wang, damien.lemoal
  Cc: linux-scsi, linux-kernel, linuxarm, yangxingui, niklas.cassel,
	John Garry

From: Xingui Yang <yangxingui@huawei.com>

When an NCQ error occurs, the  controller will abnormally complete the I/Os
that are newly delivered to disk, and bit8 in CQ dw3 will be set which
indicates that the SATA disk is in error state. The current processing flow
is to set ts->stat to SAS_OPEN_REJECT and then sas_ata_task_done() will
set FIS stat to ATA_ERR. After analyzing the IO by ata_eh_analyze_tf(),
err_mask will set to AC_ERR_HSM. If media error occurs for four times
within 10 minutes and the chip rejects new I/Os for four times, NCQ will
be disabled due to excessive errors, which is undesirable.

Therefore, use sas_task_abort() to handle abnormally completed I/Os when
SATA disk is in error state, as these abnormally completed I/Os are already
processed by sas_ata_device_link_abort() and qc->flag are set to
ATA_QCFLAG_FAILED. If sas_task_abort() is used, qc->err_mask will not be
modified in EH. Unlike the current process flow, it will not increase
the count of ECAT_TOUT_HSM and not turn off NCQ. Like other I/Os on the
disk that do not have an error but do not return after the NCQ error, they
are retried after the EH.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 0ae8a60aaf93..0c3fcb807806 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -428,6 +428,8 @@
 #define CMPLT_HDR_DEV_ID_OFF		16
 #define CMPLT_HDR_DEV_ID_MSK		(0xffff << CMPLT_HDR_DEV_ID_OFF)
 /* dw3 */
+#define SATA_DISK_IN_ERROR_STATUS_OFF	8
+#define SATA_DISK_IN_ERROR_STATUS_MSK	(0x1 << SATA_DISK_IN_ERROR_STATUS_OFF)
 #define CMPLT_HDR_SATA_DISK_ERR_OFF	16
 #define CMPLT_HDR_SATA_DISK_ERR_MSK	(0x1 << CMPLT_HDR_SATA_DISK_ERR_OFF)
 #define CMPLT_HDR_IO_IN_TARGET_OFF	17
@@ -2219,7 +2221,8 @@ slot_err_v3_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
 		} else if (dma_rx_err_type & RX_DATA_LEN_UNDERFLOW_MSK) {
 			ts->residual = trans_tx_fail_type;
 			ts->stat = SAS_DATA_UNDERRUN;
-		} else if (dw3 & CMPLT_HDR_IO_IN_TARGET_MSK) {
+		} else if ((dw3 & CMPLT_HDR_IO_IN_TARGET_MSK) ||
+			   (dw3 & SATA_DISK_IN_ERROR_STATUS_MSK)) {
 			ts->stat = SAS_PHY_DOWN;
 			slot->abort = 1;
 		} else {
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 5/8] scsi: pm8001: Modify task abort handling for SATA task
  2022-10-17  9:20 [PATCH v6 0/8] libsas and drivers: NCQ error handling John Garry
                   ` (3 preceding siblings ...)
  2022-10-17  9:20 ` [PATCH v6 4/8] scsi: hisi_sas: Modify v3 HW SATA disk error state completion processing John Garry
@ 2022-10-17  9:20 ` John Garry
  2022-10-17  9:20 ` [PATCH v6 6/8] scsi: pm8001: Use sas_ata_device_link_abort() to handle NCQ errors John Garry
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: John Garry @ 2022-10-17  9:20 UTC (permalink / raw)
  To: jejb, martin.petersen, jinpu.wang, damien.lemoal
  Cc: linux-scsi, linux-kernel, linuxarm, yangxingui, niklas.cassel,
	John Garry

When we try to abort a SATA task, the CCB of the task which we are trying
to avoid may still complete. In this case, we should not touch the task
associated with that CCB as we can race with libsas freeing the last later
in sas_eh_handle_sas_errors() -> sas_eh_finish_cmd() for when
TASK_IS_ABORTED is returned from sas_scsi_find_task()

Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Niklas Cassel <niklas.cassel@wdc.com> # pm80xx
---
 drivers/scsi/pm8001/pm8001_hwi.c | 15 +++++++++++++--
 drivers/scsi/pm8001/pm8001_sas.c |  8 ++++++++
 drivers/scsi/pm8001/pm80xx_hwi.c | 14 ++++++++++----
 3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/pm8001/pm8001_hwi.c b/drivers/scsi/pm8001/pm8001_hwi.c
index 628b08ba6770..c0adc3a9d196 100644
--- a/drivers/scsi/pm8001/pm8001_hwi.c
+++ b/drivers/scsi/pm8001/pm8001_hwi.c
@@ -2295,7 +2295,9 @@ mpi_sata_completion(struct pm8001_hba_info *pm8001_ha, void *piomb)
 		if (t->dev && (t->dev->lldd_dev))
 			pm8001_dev = t->dev->lldd_dev;
 	} else {
-		pm8001_dbg(pm8001_ha, FAIL, "task null\n");
+		pm8001_dbg(pm8001_ha, FAIL, "task null, freeing CCB tag %d\n",
+			   ccb->ccb_tag);
+		pm8001_ccb_free(pm8001_ha, ccb);
 		return;
 	}
 
@@ -2675,8 +2677,17 @@ static void mpi_sata_event(struct pm8001_hba_info *pm8001_ha, void *piomb)
 	pm8001_dev = ccb->device;
 	if (event)
 		pm8001_dbg(pm8001_ha, FAIL, "sata IO status 0x%x\n", event);
-	if (unlikely(!t || !t->lldd_task || !t->dev))
+
+	if (unlikely(!t)) {
+		pm8001_dbg(pm8001_ha, FAIL, "task null, freeing CCB tag %d\n",
+			   ccb->ccb_tag);
+		pm8001_ccb_free(pm8001_ha, ccb);
 		return;
+	}
+
+	if (unlikely(!t->lldd_task || !t->dev))
+		return;
+
 	ts = &t->task_status;
 	pm8001_dbg(pm8001_ha, DEVIO,
 		   "port_id:0x%x, device_id:0x%x, tag:0x%x, event:0x%x\n",
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index 8e3f2f9ddaac..d5ec29f69be3 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -983,6 +983,7 @@ int pm8001_query_task(struct sas_task *task)
 /*  mandatory SAM-3, still need free task/ccb info, abort the specified task */
 int pm8001_abort_task(struct sas_task *task)
 {
+	struct pm8001_ccb_info *ccb = task->lldd_task;
 	unsigned long flags;
 	u32 tag;
 	struct domain_device *dev ;
@@ -1113,6 +1114,13 @@ int pm8001_abort_task(struct sas_task *task)
 				pm8001_dev, DS_OPERATIONAL);
 			wait_for_completion(&completion);
 		} else {
+			/*
+			 * Ensure that if we see a completion for the ccb
+			 * associated with the task which we are trying to
+			 * abort then we should not touch the sas_task as it
+			 * may race with libsas freeing it when return here.
+			 */
+			ccb->task = NULL;
 			ret = sas_execute_internal_abort_single(dev, tag, 0, NULL);
 		}
 		rc = TMF_RESP_FUNC_COMPLETE;
diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c
index f8b8624458f7..dd0e06983cd3 100644
--- a/drivers/scsi/pm8001/pm80xx_hwi.c
+++ b/drivers/scsi/pm8001/pm80xx_hwi.c
@@ -2396,7 +2396,9 @@ mpi_sata_completion(struct pm8001_hba_info *pm8001_ha,
 		if (t->dev && (t->dev->lldd_dev))
 			pm8001_dev = t->dev->lldd_dev;
 	} else {
-		pm8001_dbg(pm8001_ha, FAIL, "task null\n");
+		pm8001_dbg(pm8001_ha, FAIL, "task null, freeing CCB tag %d\n",
+			   ccb->ccb_tag);
+		pm8001_ccb_free(pm8001_ha, ccb);
 		return;
 	}
 
@@ -2813,12 +2815,16 @@ static void mpi_sata_event(struct pm8001_hba_info *pm8001_ha,
 	ccb = &pm8001_ha->ccb_info[tag];
 	t = ccb->task;
 	pm8001_dev = ccb->device;
-
-	if (unlikely(!t || !t->lldd_task || !t->dev)) {
-		pm8001_dbg(pm8001_ha, FAIL, "task or dev null\n");
+	if (unlikely(!t)) {
+		pm8001_dbg(pm8001_ha, FAIL, "task null, freeing CCB tag %d\n",
+			   ccb->ccb_tag);
+		pm8001_ccb_free(pm8001_ha, ccb);
 		return;
 	}
 
+	if (unlikely(!t->lldd_task || !t->dev))
+		return;
+
 	ts = &t->task_status;
 	pm8001_dbg(pm8001_ha, IOERR, "port_id:0x%x, tag:0x%x, event:0x%x\n",
 		   port_id, tag, event);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 6/8] scsi: pm8001: Use sas_ata_device_link_abort() to handle NCQ errors
  2022-10-17  9:20 [PATCH v6 0/8] libsas and drivers: NCQ error handling John Garry
                   ` (4 preceding siblings ...)
  2022-10-17  9:20 ` [PATCH v6 5/8] scsi: pm8001: Modify task abort handling for SATA task John Garry
@ 2022-10-17  9:20 ` John Garry
  2022-10-17  9:01   ` Jinpu Wang
  2022-10-17  9:20 ` [PATCH v6 7/8] scsi: libsas: Make sas_{alloc, alloc_slow, free}_task() private John Garry
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 17+ messages in thread
From: John Garry @ 2022-10-17  9:20 UTC (permalink / raw)
  To: jejb, martin.petersen, jinpu.wang, damien.lemoal
  Cc: linux-scsi, linux-kernel, linuxarm, yangxingui, niklas.cassel,
	John Garry

In commit c6b9ef5779c3 ("[SCSI] pm80xx: NCQ error handling changes") the
driver had support added to handle NCQ errors but much of what is done
in this handling is duplicated from the libata EH.

In that named commit we handle in 2x main steps:
a. Issue read log ext10 to examine and clear the errors
b. Issue SATA_ABORT all command

Indeed, in libata EH, we do similar to above:
a. ata_do_eh() -> ata_eh_autopsy() -> ata_eh_link_autopsy() ->
   ata_eh_analyze_ncq_error() -> ata_eh_read_log_10h()
b. ata_do_eh() -> ata_eh_recover() which will issue a device soft reset
   or hard reset

Since there is so much duplication, use sas_ata_device_link_abort() which
will abort all pending IOs and kick of ATA EH which will do the steps,
above.

However we will not follow the advisory to send the SATA_ABORT all command
after the autopsy in read log ext10. Indeed, in libsas EH, we already send
a per-task SATA_ABORT command, and this is prior to the ATA EH kicking in
and issuing the read log ext10 in the recovery process. I judge that this
is ok as the SATA_ABORT command does not actually send any protocol on the
link to abort IO on the other side, so would not change any state on the
disk (for the read log ext10 command).

Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Niklas Cassel <niklas.cassel@wdc.com> # pm80xx
---
 drivers/scsi/pm8001/pm8001_hwi.c | 171 +++----------------------------
 drivers/scsi/pm8001/pm8001_sas.c |   6 --
 drivers/scsi/pm8001/pm8001_sas.h |   5 -
 drivers/scsi/pm8001/pm80xx_hwi.c | 163 ++---------------------------
 4 files changed, 19 insertions(+), 326 deletions(-)

diff --git a/drivers/scsi/pm8001/pm8001_hwi.c b/drivers/scsi/pm8001/pm8001_hwi.c
index c0adc3a9d196..ec1a9ab61814 100644
--- a/drivers/scsi/pm8001/pm8001_hwi.c
+++ b/drivers/scsi/pm8001/pm8001_hwi.c
@@ -1724,7 +1724,14 @@ void pm8001_work_fn(struct work_struct *work)
 				pm8001_free_dev(pm8001_dev);
 			}
 		}
-	}	break;
+	}
+	break;
+	case IO_XFER_ERROR_ABORTED_NCQ_MODE:
+	{
+		dev = pm8001_dev->sas_device;
+		sas_ata_device_link_abort(dev, false);
+	}
+	break;
 	}
 	kfree(pw);
 }
@@ -1748,110 +1755,6 @@ int pm8001_handle_event(struct pm8001_hba_info *pm8001_ha, void *data,
 	return ret;
 }
 
-static void pm8001_send_abort_all(struct pm8001_hba_info *pm8001_ha,
-		struct pm8001_device *pm8001_ha_dev)
-{
-	struct pm8001_ccb_info *ccb;
-	struct sas_task *task;
-	struct task_abort_req task_abort;
-	u32 opc = OPC_INB_SATA_ABORT;
-	int ret;
-
-	pm8001_ha_dev->id |= NCQ_ABORT_ALL_FLAG;
-	pm8001_ha_dev->id &= ~NCQ_READ_LOG_FLAG;
-
-	task = sas_alloc_slow_task(GFP_ATOMIC);
-	if (!task) {
-		pm8001_dbg(pm8001_ha, FAIL, "cannot allocate task\n");
-		return;
-	}
-
-	task->task_done = pm8001_task_done;
-
-	ccb = pm8001_ccb_alloc(pm8001_ha, pm8001_ha_dev, task);
-	if (!ccb) {
-		sas_free_task(task);
-		return;
-	}
-
-	memset(&task_abort, 0, sizeof(task_abort));
-	task_abort.abort_all = cpu_to_le32(1);
-	task_abort.device_id = cpu_to_le32(pm8001_ha_dev->device_id);
-	task_abort.tag = cpu_to_le32(ccb->ccb_tag);
-
-	ret = pm8001_mpi_build_cmd(pm8001_ha, 0, opc, &task_abort,
-				   sizeof(task_abort), 0);
-	if (ret) {
-		sas_free_task(task);
-		pm8001_ccb_free(pm8001_ha, ccb);
-	}
-}
-
-static void pm8001_send_read_log(struct pm8001_hba_info *pm8001_ha,
-		struct pm8001_device *pm8001_ha_dev)
-{
-	struct sata_start_req sata_cmd;
-	int res;
-	struct pm8001_ccb_info *ccb;
-	struct sas_task *task = NULL;
-	struct host_to_dev_fis fis;
-	struct domain_device *dev;
-	u32 opc = OPC_INB_SATA_HOST_OPSTART;
-
-	task = sas_alloc_slow_task(GFP_ATOMIC);
-	if (!task) {
-		pm8001_dbg(pm8001_ha, FAIL, "cannot allocate task !!!\n");
-		return;
-	}
-	task->task_done = pm8001_task_done;
-
-	/*
-	 * Allocate domain device by ourselves as libsas is not going to
-	 * provide any.
-	 */
-	dev = kzalloc(sizeof(struct domain_device), GFP_ATOMIC);
-	if (!dev) {
-		sas_free_task(task);
-		pm8001_dbg(pm8001_ha, FAIL,
-			   "Domain device cannot be allocated\n");
-		return;
-	}
-	task->dev = dev;
-	task->dev->lldd_dev = pm8001_ha_dev;
-
-	ccb = pm8001_ccb_alloc(pm8001_ha, pm8001_ha_dev, task);
-	if (!ccb) {
-		sas_free_task(task);
-		kfree(dev);
-		return;
-	}
-
-	pm8001_ha_dev->id |= NCQ_READ_LOG_FLAG;
-	pm8001_ha_dev->id |= NCQ_2ND_RLE_FLAG;
-
-	/* construct read log FIS */
-	memset(&fis, 0, sizeof(struct host_to_dev_fis));
-	fis.fis_type = 0x27;
-	fis.flags = 0x80;
-	fis.command = ATA_CMD_READ_LOG_EXT;
-	fis.lbal = 0x10;
-	fis.sector_count = 0x1;
-
-	memset(&sata_cmd, 0, sizeof(sata_cmd));
-	sata_cmd.tag = cpu_to_le32(ccb->ccb_tag);
-	sata_cmd.device_id = cpu_to_le32(pm8001_ha_dev->device_id);
-	sata_cmd.ncqtag_atap_dir_m = cpu_to_le32((0x1 << 7) | (0x5 << 9));
-	memcpy(&sata_cmd.sata_fis, &fis, sizeof(struct host_to_dev_fis));
-
-	res = pm8001_mpi_build_cmd(pm8001_ha, 0, opc, &sata_cmd,
-				   sizeof(sata_cmd), 0);
-	if (res) {
-		sas_free_task(task);
-		pm8001_ccb_free(pm8001_ha, ccb);
-		kfree(dev);
-	}
-}
-
 /**
  * mpi_ssp_completion- process the event that FW response to the SSP request.
  * @pm8001_ha: our hba card information
@@ -2301,8 +2204,7 @@ mpi_sata_completion(struct pm8001_hba_info *pm8001_ha, void *piomb)
 		return;
 	}
 
-	if ((pm8001_dev && !(pm8001_dev->id & NCQ_READ_LOG_FLAG))
-		&& unlikely(!t || !t->lldd_task || !t->dev)) {
+	if (pm8001_dev && unlikely(!t || !t->lldd_task || !t->dev)) {
 		pm8001_dbg(pm8001_ha, FAIL, "task or dev null\n");
 		return;
 	}
@@ -2360,15 +2262,6 @@ mpi_sata_completion(struct pm8001_hba_info *pm8001_ha, void *piomb)
 		if (param == 0) {
 			ts->resp = SAS_TASK_COMPLETE;
 			ts->stat = SAS_SAM_STAT_GOOD;
-			/* check if response is for SEND READ LOG */
-			if (pm8001_dev &&
-			    (pm8001_dev->id & NCQ_READ_LOG_FLAG)) {
-				pm8001_send_abort_all(pm8001_ha, pm8001_dev);
-				/* Free the tag */
-				pm8001_tag_free(pm8001_ha, tag);
-				sas_free_task(t);
-				return;
-			}
 		} else {
 			u8 len;
 			ts->resp = SAS_TASK_COMPLETE;
@@ -2666,9 +2559,10 @@ static void mpi_sata_event(struct pm8001_hba_info *pm8001_ha, void *piomb)
 	if (event == IO_XFER_ERROR_ABORTED_NCQ_MODE) {
 		/* find device using device id */
 		pm8001_dev = pm8001_find_dev(pm8001_ha, dev_id);
-		/* send read log extension */
 		if (pm8001_dev)
-			pm8001_send_read_log(pm8001_ha, pm8001_dev);
+			pm8001_handle_event(pm8001_ha,
+				pm8001_dev,
+				IO_XFER_ERROR_ABORTED_NCQ_MODE);
 		return;
 	}
 
@@ -3649,12 +3543,7 @@ int pm8001_mpi_task_abort_resp(struct pm8001_hba_info *pm8001_ha, void *piomb)
 	pm8001_ccb_task_free(pm8001_ha, ccb);
 	mb();
 
-	if (pm8001_dev->id & NCQ_ABORT_ALL_FLAG) {
-		sas_free_task(t);
-		pm8001_dev->id &= ~NCQ_ABORT_ALL_FLAG;
-	} else {
-		t->task_done(t);
-	}
+	t->task_done(t);
 
 	return 0;
 }
@@ -4206,7 +4095,6 @@ static int pm8001_chip_sata_req(struct pm8001_hba_info *pm8001_ha,
 	u64 phys_addr;
 	u32 ATAP = 0x0;
 	u32 dir;
-	unsigned long flags;
 	u32  opc = OPC_INB_SATA_HOST_OPSTART;
 
 	memset(&sata_cmd, 0, sizeof(sata_cmd));
@@ -4261,39 +4149,6 @@ static int pm8001_chip_sata_req(struct pm8001_hba_info *pm8001_ha,
 		sata_cmd.esgl = 0;
 	}
 
-	/* Check for read log for failed drive and return */
-	if (sata_cmd.sata_fis.command == 0x2f) {
-		if (((pm8001_ha_dev->id & NCQ_READ_LOG_FLAG) ||
-			(pm8001_ha_dev->id & NCQ_ABORT_ALL_FLAG) ||
-			(pm8001_ha_dev->id & NCQ_2ND_RLE_FLAG))) {
-			struct task_status_struct *ts;
-
-			pm8001_ha_dev->id &= 0xDFFFFFFF;
-			ts = &task->task_status;
-
-			spin_lock_irqsave(&task->task_state_lock, flags);
-			ts->resp = SAS_TASK_COMPLETE;
-			ts->stat = SAS_SAM_STAT_GOOD;
-			task->task_state_flags &= ~SAS_TASK_STATE_PENDING;
-			task->task_state_flags |= SAS_TASK_STATE_DONE;
-			if (unlikely((task->task_state_flags &
-					SAS_TASK_STATE_ABORTED))) {
-				spin_unlock_irqrestore(&task->task_state_lock,
-							flags);
-				pm8001_dbg(pm8001_ha, FAIL,
-					   "task 0x%p resp 0x%x  stat 0x%x but aborted by upper layer\n",
-					   task, ts->resp,
-					   ts->stat);
-				pm8001_ccb_task_free(pm8001_ha, ccb);
-			} else {
-				spin_unlock_irqrestore(&task->task_state_lock,
-							flags);
-				pm8001_ccb_task_free_done(pm8001_ha, ccb);
-				return 0;
-			}
-		}
-	}
-
 	return pm8001_mpi_build_cmd(pm8001_ha, 0, opc, &sata_cmd,
 				    sizeof(sata_cmd), 0);
 }
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index d5ec29f69be3..2d84ae95a1f9 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -687,12 +687,6 @@ int pm8001_dev_found(struct domain_device *dev)
 	return pm8001_dev_found_notify(dev);
 }
 
-void pm8001_task_done(struct sas_task *task)
-{
-	del_timer(&task->slow_task->timer);
-	complete(&task->slow_task->completion);
-}
-
 #define PM8001_TASK_TIMEOUT 20
 
 /**
diff --git a/drivers/scsi/pm8001/pm8001_sas.h b/drivers/scsi/pm8001/pm8001_sas.h
index b08f52673889..16a753d5e8a7 100644
--- a/drivers/scsi/pm8001/pm8001_sas.h
+++ b/drivers/scsi/pm8001/pm8001_sas.h
@@ -579,10 +579,6 @@ struct pm8001_fw_image_header {
 #define FLASH_UPDATE_DNLD_NOT_SUPPORTED		0x10
 #define FLASH_UPDATE_DISABLED			0x11
 
-#define	NCQ_READ_LOG_FLAG			0x80000000
-#define	NCQ_ABORT_ALL_FLAG			0x40000000
-#define	NCQ_2ND_RLE_FLAG			0x20000000
-
 /* Device states */
 #define DS_OPERATIONAL				0x01
 #define DS_PORT_IN_RESET			0x02
@@ -709,7 +705,6 @@ int pm8001_mpi_fw_flash_update_resp(struct pm8001_hba_info *pm8001_ha,
 int pm8001_mpi_general_event(struct pm8001_hba_info *pm8001_ha, void *piomb);
 int pm8001_mpi_task_abort_resp(struct pm8001_hba_info *pm8001_ha, void *piomb);
 struct sas_task *pm8001_alloc_task(void);
-void pm8001_task_done(struct sas_task *task);
 void pm8001_free_task(struct sas_task *task);
 void pm8001_tag_free(struct pm8001_hba_info *pm8001_ha, u32 tag);
 struct pm8001_device *pm8001_find_dev(struct pm8001_hba_info *pm8001_ha,
diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c
index dd0e06983cd3..4484c498bcb6 100644
--- a/drivers/scsi/pm8001/pm80xx_hwi.c
+++ b/drivers/scsi/pm8001/pm80xx_hwi.c
@@ -1778,113 +1778,6 @@ pm80xx_chip_interrupt_disable(struct pm8001_hba_info *pm8001_ha, u8 vec)
 	pm80xx_chip_intx_interrupt_disable(pm8001_ha);
 }
 
-static void pm80xx_send_abort_all(struct pm8001_hba_info *pm8001_ha,
-		struct pm8001_device *pm8001_ha_dev)
-{
-	struct pm8001_ccb_info *ccb;
-	struct sas_task *task;
-	struct task_abort_req task_abort;
-	u32 opc = OPC_INB_SATA_ABORT;
-	int ret;
-
-	pm8001_ha_dev->id |= NCQ_ABORT_ALL_FLAG;
-	pm8001_ha_dev->id &= ~NCQ_READ_LOG_FLAG;
-
-	task = sas_alloc_slow_task(GFP_ATOMIC);
-	if (!task) {
-		pm8001_dbg(pm8001_ha, FAIL, "cannot allocate task\n");
-		return;
-	}
-	task->task_done = pm8001_task_done;
-
-	ccb = pm8001_ccb_alloc(pm8001_ha, pm8001_ha_dev, task);
-	if (!ccb) {
-		sas_free_task(task);
-		return;
-	}
-
-	memset(&task_abort, 0, sizeof(task_abort));
-	task_abort.abort_all = cpu_to_le32(1);
-	task_abort.device_id = cpu_to_le32(pm8001_ha_dev->device_id);
-	task_abort.tag = cpu_to_le32(ccb->ccb_tag);
-
-	ret = pm8001_mpi_build_cmd(pm8001_ha, 0, opc, &task_abort,
-				   sizeof(task_abort), 0);
-	pm8001_dbg(pm8001_ha, FAIL, "Executing abort task end\n");
-	if (ret) {
-		sas_free_task(task);
-		pm8001_ccb_free(pm8001_ha, ccb);
-	}
-}
-
-static void pm80xx_send_read_log(struct pm8001_hba_info *pm8001_ha,
-		struct pm8001_device *pm8001_ha_dev)
-{
-	struct sata_start_req sata_cmd;
-	int res;
-	struct pm8001_ccb_info *ccb;
-	struct sas_task *task = NULL;
-	struct host_to_dev_fis fis;
-	struct domain_device *dev;
-	u32 opc = OPC_INB_SATA_HOST_OPSTART;
-
-	task = sas_alloc_slow_task(GFP_ATOMIC);
-	if (!task) {
-		pm8001_dbg(pm8001_ha, FAIL, "cannot allocate task !!!\n");
-		return;
-	}
-	task->task_done = pm8001_task_done;
-
-	/*
-	 * Allocate domain device by ourselves as libsas is not going to
-	 * provide any.
-	 */
-	dev = kzalloc(sizeof(struct domain_device), GFP_ATOMIC);
-	if (!dev) {
-		sas_free_task(task);
-		pm8001_dbg(pm8001_ha, FAIL,
-			   "Domain device cannot be allocated\n");
-		return;
-	}
-
-	task->dev = dev;
-	task->dev->lldd_dev = pm8001_ha_dev;
-
-	ccb = pm8001_ccb_alloc(pm8001_ha, pm8001_ha_dev, task);
-	if (!ccb) {
-		sas_free_task(task);
-		kfree(dev);
-		return;
-	}
-
-	pm8001_ha_dev->id |= NCQ_READ_LOG_FLAG;
-	pm8001_ha_dev->id |= NCQ_2ND_RLE_FLAG;
-
-	memset(&sata_cmd, 0, sizeof(sata_cmd));
-
-	/* construct read log FIS */
-	memset(&fis, 0, sizeof(struct host_to_dev_fis));
-	fis.fis_type = 0x27;
-	fis.flags = 0x80;
-	fis.command = ATA_CMD_READ_LOG_EXT;
-	fis.lbal = 0x10;
-	fis.sector_count = 0x1;
-
-	sata_cmd.tag = cpu_to_le32(ccb->ccb_tag);
-	sata_cmd.device_id = cpu_to_le32(pm8001_ha_dev->device_id);
-	sata_cmd.ncqtag_atap_dir_m_dad = cpu_to_le32(((0x1 << 7) | (0x5 << 9)));
-	memcpy(&sata_cmd.sata_fis, &fis, sizeof(struct host_to_dev_fis));
-
-	res = pm8001_mpi_build_cmd(pm8001_ha, 0, opc, &sata_cmd,
-				   sizeof(sata_cmd), 0);
-	pm8001_dbg(pm8001_ha, FAIL, "Executing read log end\n");
-	if (res) {
-		sas_free_task(task);
-		pm8001_ccb_free(pm8001_ha, ccb);
-		kfree(dev);
-	}
-}
-
 /**
  * mpi_ssp_completion - process the event that FW response to the SSP request.
  * @pm8001_ha: our hba card information
@@ -2402,11 +2295,9 @@ mpi_sata_completion(struct pm8001_hba_info *pm8001_ha,
 		return;
 	}
 
-	if ((pm8001_dev && !(pm8001_dev->id & NCQ_READ_LOG_FLAG))
-		&& unlikely(!t || !t->lldd_task || !t->dev)) {
-		pm8001_dbg(pm8001_ha, FAIL, "task or dev null\n");
+
+	if (pm8001_dev && unlikely(!t->lldd_task || !t->dev))
 		return;
-	}
 
 	ts = &t->task_status;
 
@@ -2463,15 +2354,6 @@ mpi_sata_completion(struct pm8001_hba_info *pm8001_ha,
 		if (param == 0) {
 			ts->resp = SAS_TASK_COMPLETE;
 			ts->stat = SAS_SAM_STAT_GOOD;
-			/* check if response is for SEND READ LOG */
-			if (pm8001_dev &&
-			    (pm8001_dev->id & NCQ_READ_LOG_FLAG)) {
-				pm80xx_send_abort_all(pm8001_ha, pm8001_dev);
-				/* Free the tag */
-				pm8001_tag_free(pm8001_ha, tag);
-				sas_free_task(t);
-				return;
-			}
 		} else {
 			u8 len;
 			ts->resp = SAS_TASK_COMPLETE;
@@ -2806,9 +2688,11 @@ static void mpi_sata_event(struct pm8001_hba_info *pm8001_ha,
 	if (event == IO_XFER_ERROR_ABORTED_NCQ_MODE) {
 		/* find device using device id */
 		pm8001_dev = pm8001_find_dev(pm8001_ha, dev_id);
-		/* send read log extension */
+		/* send read log extension by aborting the link - libata does what we want */
 		if (pm8001_dev)
-			pm80xx_send_read_log(pm8001_ha, pm8001_dev);
+			pm8001_handle_event(pm8001_ha,
+				pm8001_dev,
+				IO_XFER_ERROR_ABORTED_NCQ_MODE);
 		return;
 	}
 
@@ -4556,7 +4440,6 @@ static int pm80xx_chip_sata_req(struct pm8001_hba_info *pm8001_ha,
 	u32 end_addr_high, end_addr_low;
 	u32 ATAP = 0x0;
 	u32 dir;
-	unsigned long flags;
 	u32 opc = OPC_INB_SATA_HOST_OPSTART;
 	memset(&sata_cmd, 0, sizeof(sata_cmd));
 
@@ -4735,40 +4618,6 @@ static int pm80xx_chip_sata_req(struct pm8001_hba_info *pm8001_ha,
 				     (task->ata_task.atapi_packet[15] << 24)));
 	}
 
-	/* Check for read log for failed drive and return */
-	if (sata_cmd.sata_fis.command == 0x2f) {
-		if (pm8001_ha_dev && ((pm8001_ha_dev->id & NCQ_READ_LOG_FLAG) ||
-			(pm8001_ha_dev->id & NCQ_ABORT_ALL_FLAG) ||
-			(pm8001_ha_dev->id & NCQ_2ND_RLE_FLAG))) {
-			struct task_status_struct *ts;
-
-			pm8001_ha_dev->id &= 0xDFFFFFFF;
-			ts = &task->task_status;
-
-			spin_lock_irqsave(&task->task_state_lock, flags);
-			ts->resp = SAS_TASK_COMPLETE;
-			ts->stat = SAS_SAM_STAT_GOOD;
-			task->task_state_flags &= ~SAS_TASK_STATE_PENDING;
-			task->task_state_flags |= SAS_TASK_STATE_DONE;
-			if (unlikely((task->task_state_flags &
-					SAS_TASK_STATE_ABORTED))) {
-				spin_unlock_irqrestore(&task->task_state_lock,
-							flags);
-				pm8001_dbg(pm8001_ha, FAIL,
-					   "task 0x%p resp 0x%x  stat 0x%x but aborted by upper layer\n",
-					   task, ts->resp,
-					   ts->stat);
-				pm8001_ccb_task_free(pm8001_ha, ccb);
-				return 0;
-			} else {
-				spin_unlock_irqrestore(&task->task_state_lock,
-							flags);
-				pm8001_ccb_task_free_done(pm8001_ha, ccb);
-				atomic_dec(&pm8001_ha_dev->running_req);
-				return 0;
-			}
-		}
-	}
 	trace_pm80xx_request_issue(pm8001_ha->id,
 				ccb->device ? ccb->device->attached_phy : PM8001_MAX_PHYS,
 				ccb->ccb_tag, opc,
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 7/8] scsi: libsas: Make sas_{alloc, alloc_slow, free}_task() private
  2022-10-17  9:20 [PATCH v6 0/8] libsas and drivers: NCQ error handling John Garry
                   ` (5 preceding siblings ...)
  2022-10-17  9:20 ` [PATCH v6 6/8] scsi: pm8001: Use sas_ata_device_link_abort() to handle NCQ errors John Garry
@ 2022-10-17  9:20 ` John Garry
  2022-10-17  9:20 ` [PATCH v6 8/8] scsi: libsas: Update SATA dev FIS in sas_ata_task_done() John Garry
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: John Garry @ 2022-10-17  9:20 UTC (permalink / raw)
  To: jejb, martin.petersen, jinpu.wang, damien.lemoal
  Cc: linux-scsi, linux-kernel, linuxarm, yangxingui, niklas.cassel,
	John Garry

We have no users outside libsas any longer, so make sas_alloc_task(),
sas_alloc_slow_task(), and sas_free_task() private.

Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Tested-by: Niklas Cassel <niklas.cassel@wdc.com> # pm80xx
---
 drivers/scsi/libsas/sas_init.c     | 3 ---
 drivers/scsi/libsas/sas_internal.h | 4 ++++
 include/scsi/libsas.h              | 4 ----
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index e4f77072a58d..f2c05ebeb72f 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -35,7 +35,6 @@ struct sas_task *sas_alloc_task(gfp_t flags)
 
 	return task;
 }
-EXPORT_SYMBOL_GPL(sas_alloc_task);
 
 struct sas_task *sas_alloc_slow_task(gfp_t flags)
 {
@@ -56,7 +55,6 @@ struct sas_task *sas_alloc_slow_task(gfp_t flags)
 
 	return task;
 }
-EXPORT_SYMBOL_GPL(sas_alloc_slow_task);
 
 void sas_free_task(struct sas_task *task)
 {
@@ -65,7 +63,6 @@ void sas_free_task(struct sas_task *task)
 		kmem_cache_free(sas_task_cache, task);
 	}
 }
-EXPORT_SYMBOL_GPL(sas_free_task);
 
 /*------------ SAS addr hash -----------*/
 void sas_hash_addr(u8 *hashed, const u8 *sas_addr)
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index 8d0ad3abc7b5..b54bcf3c9a9d 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -52,6 +52,10 @@ void sas_unregister_phys(struct sas_ha_struct *sas_ha);
 struct asd_sas_event *sas_alloc_event(struct asd_sas_phy *phy, gfp_t gfp_flags);
 void sas_free_event(struct asd_sas_event *event);
 
+struct sas_task *sas_alloc_task(gfp_t flags);
+struct sas_task *sas_alloc_slow_task(gfp_t flags);
+void sas_free_task(struct sas_task *task);
+
 int  sas_register_ports(struct sas_ha_struct *sas_ha);
 void sas_unregister_ports(struct sas_ha_struct *sas_ha);
 
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 2dbead74a2af..f86b56bf7833 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -639,10 +639,6 @@ struct sas_task_slow {
 #define SAS_TASK_STATE_ABORTED      4
 #define SAS_TASK_NEED_DEV_RESET     8
 
-extern struct sas_task *sas_alloc_task(gfp_t flags);
-extern struct sas_task *sas_alloc_slow_task(gfp_t flags);
-extern void sas_free_task(struct sas_task *task);
-
 static inline bool sas_is_internal_abort(struct sas_task *task)
 {
 	return task->task_proto == SAS_PROTOCOL_INTERNAL_ABORT;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 8/8] scsi: libsas: Update SATA dev FIS in sas_ata_task_done()
  2022-10-17  9:20 [PATCH v6 0/8] libsas and drivers: NCQ error handling John Garry
                   ` (6 preceding siblings ...)
  2022-10-17  9:20 ` [PATCH v6 7/8] scsi: libsas: Make sas_{alloc, alloc_slow, free}_task() private John Garry
@ 2022-10-17  9:20 ` John Garry
  2022-10-17 10:41   ` Niklas Cassel
  2022-10-18  2:40 ` [PATCH v6 0/8] libsas and drivers: NCQ error handling Martin K. Petersen
  2022-10-22  3:52 ` Martin K. Petersen
  9 siblings, 1 reply; 17+ messages in thread
From: John Garry @ 2022-10-17  9:20 UTC (permalink / raw)
  To: jejb, martin.petersen, jinpu.wang, damien.lemoal
  Cc: linux-scsi, linux-kernel, linuxarm, yangxingui, niklas.cassel,
	John Garry

In sas_ata_task_done(), for commands which complete with error we set
the SATA dev FIS status field with ATA_ERR. In ata_eh_analyze_tf() this
would be interpreted as a HSM error. Set ATA_DRDY, which will lead libata
to judge as a device error, which is a safer bet.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/libsas/sas_ata.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 61f64d54e67d..78e6046fb55a 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -139,8 +139,8 @@ static void sas_ata_task_done(struct sas_task *task)
 				qc->flags |= ATA_QCFLAG_FAILED;
 			}
 
-			dev->sata_dev.fis[3] = 0x04; /* status err */
-			dev->sata_dev.fis[2] = ATA_ERR;
+			dev->sata_dev.fis[2] = ATA_ERR | ATA_DRDY; /* tf status */
+			dev->sata_dev.fis[3] = ATA_ABORTED; /* tf error */
 		}
 	}
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 8/8] scsi: libsas: Update SATA dev FIS in sas_ata_task_done()
  2022-10-17  9:20 ` [PATCH v6 8/8] scsi: libsas: Update SATA dev FIS in sas_ata_task_done() John Garry
@ 2022-10-17 10:41   ` Niklas Cassel
  0 siblings, 0 replies; 17+ messages in thread
From: Niklas Cassel @ 2022-10-17 10:41 UTC (permalink / raw)
  To: John Garry
  Cc: jejb, martin.petersen, jinpu.wang, damien.lemoal, linux-scsi,
	linux-kernel, linuxarm, yangxingui

On Mon, Oct 17, 2022 at 05:20:35PM +0800, John Garry wrote:
> In sas_ata_task_done(), for commands which complete with error we set
> the SATA dev FIS status field with ATA_ERR. In ata_eh_analyze_tf() this
> would be interpreted as a HSM error. Set ATA_DRDY, which will lead libata
> to judge as a device error, which is a safer bet.
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>  drivers/scsi/libsas/sas_ata.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
> index 61f64d54e67d..78e6046fb55a 100644
> --- a/drivers/scsi/libsas/sas_ata.c
> +++ b/drivers/scsi/libsas/sas_ata.c
> @@ -139,8 +139,8 @@ static void sas_ata_task_done(struct sas_task *task)
>  				qc->flags |= ATA_QCFLAG_FAILED;
>  			}
>  
> -			dev->sata_dev.fis[3] = 0x04; /* status err */
> -			dev->sata_dev.fis[2] = ATA_ERR;
> +			dev->sata_dev.fis[2] = ATA_ERR | ATA_DRDY; /* tf status */
> +			dev->sata_dev.fis[3] = ATA_ABORTED; /* tf error */
>  		}
>  	}
>  
> -- 
> 2.35.3
> 

Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 4/8] scsi: hisi_sas: Modify v3 HW SATA disk error state completion processing
  2022-10-17  9:20 ` [PATCH v6 4/8] scsi: hisi_sas: Modify v3 HW SATA disk error state completion processing John Garry
@ 2022-10-17 10:45   ` Niklas Cassel
  2022-10-17 11:01     ` John Garry
  0 siblings, 1 reply; 17+ messages in thread
From: Niklas Cassel @ 2022-10-17 10:45 UTC (permalink / raw)
  To: John Garry
  Cc: jejb, martin.petersen, jinpu.wang, damien.lemoal, linux-scsi,
	linux-kernel, linuxarm, yangxingui

On Mon, Oct 17, 2022 at 05:20:31PM +0800, John Garry wrote:
> From: Xingui Yang <yangxingui@huawei.com>
> 
> When an NCQ error occurs, the  controller will abnormally complete the I/Os
> that are newly delivered to disk, and bit8 in CQ dw3 will be set which
> indicates that the SATA disk is in error state. The current processing flow
> is to set ts->stat to SAS_OPEN_REJECT and then sas_ata_task_done() will
> set FIS stat to ATA_ERR. After analyzing the IO by ata_eh_analyze_tf(),
> err_mask will set to AC_ERR_HSM. If media error occurs for four times
> within 10 minutes and the chip rejects new I/Os for four times, NCQ will
> be disabled due to excessive errors, which is undesirable.

With patch 8/8 in this series, will it still set err_mask to AC_ERR_HSM?
If not, is this patch still needed with patch 8/8?


Kind regards,
Niklas

> 
> Therefore, use sas_task_abort() to handle abnormally completed I/Os when
> SATA disk is in error state, as these abnormally completed I/Os are already
> processed by sas_ata_device_link_abort() and qc->flag are set to
> ATA_QCFLAG_FAILED. If sas_task_abort() is used, qc->err_mask will not be
> modified in EH. Unlike the current process flow, it will not increase
> the count of ECAT_TOUT_HSM and not turn off NCQ. Like other I/Os on the
> disk that do not have an error but do not return after the NCQ error, they
> are retried after the EH.
> 
> Signed-off-by: Xingui Yang <yangxingui@huawei.com>
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>  drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> index 0ae8a60aaf93..0c3fcb807806 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> @@ -428,6 +428,8 @@
>  #define CMPLT_HDR_DEV_ID_OFF		16
>  #define CMPLT_HDR_DEV_ID_MSK		(0xffff << CMPLT_HDR_DEV_ID_OFF)
>  /* dw3 */
> +#define SATA_DISK_IN_ERROR_STATUS_OFF	8
> +#define SATA_DISK_IN_ERROR_STATUS_MSK	(0x1 << SATA_DISK_IN_ERROR_STATUS_OFF)
>  #define CMPLT_HDR_SATA_DISK_ERR_OFF	16
>  #define CMPLT_HDR_SATA_DISK_ERR_MSK	(0x1 << CMPLT_HDR_SATA_DISK_ERR_OFF)
>  #define CMPLT_HDR_IO_IN_TARGET_OFF	17
> @@ -2219,7 +2221,8 @@ slot_err_v3_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
>  		} else if (dma_rx_err_type & RX_DATA_LEN_UNDERFLOW_MSK) {
>  			ts->residual = trans_tx_fail_type;
>  			ts->stat = SAS_DATA_UNDERRUN;
> -		} else if (dw3 & CMPLT_HDR_IO_IN_TARGET_MSK) {
> +		} else if ((dw3 & CMPLT_HDR_IO_IN_TARGET_MSK) ||
> +			   (dw3 & SATA_DISK_IN_ERROR_STATUS_MSK)) {
>  			ts->stat = SAS_PHY_DOWN;
>  			slot->abort = 1;
>  		} else {
> -- 
> 2.35.3
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 1/8] scsi: libsas: Add sas_ata_device_link_abort()
  2022-10-17  9:20 ` [PATCH v6 1/8] scsi: libsas: Add sas_ata_device_link_abort() John Garry
@ 2022-10-17 10:52   ` Niklas Cassel
  2022-10-17 10:57     ` John Garry
  0 siblings, 1 reply; 17+ messages in thread
From: Niklas Cassel @ 2022-10-17 10:52 UTC (permalink / raw)
  To: John Garry
  Cc: jejb, martin.petersen, jinpu.wang, damien.lemoal, linux-scsi,
	linux-kernel, linuxarm, yangxingui

On Mon, Oct 17, 2022 at 05:20:28PM +0800, John Garry wrote:
> Similar to how AHCI handles NCQ errors in ahci_error_intr() ->
> ata_port_abort() -> ata_do_link_abort(), add an NCQ error handler for LLDDs
> to call to initiate a link abort.
> 
> This will mark all outstanding QCs as failed and kick-off EH.
> 
> Note:
> A "force reset" argument is added for drivers which require the ATA error
> handling to always reset the device.
> 
> A driver may require this feature for when SATA device per-SCSI cmnd
> resources are only released during reset for ATA EH. As such, we need an
> option to force reset to be done, regardless of what any EH autopsy
> decides.
> 
> The SATA device FIS fields are set to indicate a device error from
> ata_eh_analyze_tf().
> 
> Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Suggested-by: Tested-by: Niklas Cassel <niklas.cassel@wdc.com>

Nit: is this perhaps a typo?
(Since there is another Tested-by tag later in the list.)

Checkpatch doesn't complain, so I guess no biggie,
but might be worth fixing if you decide to roll a v7.


Kind regards,
Niklas

> Signed-off-by: John Garry <john.garry@huawei.com>
> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Reviewed-by: Jason Yan <yanaijie@huawei.com>
> Tested-by: Niklas Cassel <niklas.cassel@wdc.com> # pm80xx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 1/8] scsi: libsas: Add sas_ata_device_link_abort()
  2022-10-17 10:52   ` Niklas Cassel
@ 2022-10-17 10:57     ` John Garry
  0 siblings, 0 replies; 17+ messages in thread
From: John Garry @ 2022-10-17 10:57 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: jejb, martin.petersen, jinpu.wang, damien.lemoal, linux-scsi,
	linux-kernel, linuxarm, yangxingui

On 17/10/2022 11:52, Niklas Cassel wrote:
>> The SATA device FIS fields are set to indicate a device error from
>> ata_eh_analyze_tf().
>>
>> Suggested-by: Damien Le Moal<damien.lemoal@opensource.wdc.com>
>> Suggested-by: Tested-by: Niklas Cassel<niklas.cassel@wdc.com>
> Nit: is this perhaps a typo?

Yeah, a copy and paste error.

> (Since there is another Tested-by tag later in the list.)
> 
> Checkpatch doesn't complain, so I guess no biggie,
> but might be worth fixing if you decide to roll a v7.

Sure

Thanks,
John

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 4/8] scsi: hisi_sas: Modify v3 HW SATA disk error state completion processing
  2022-10-17 10:45   ` Niklas Cassel
@ 2022-10-17 11:01     ` John Garry
  0 siblings, 0 replies; 17+ messages in thread
From: John Garry @ 2022-10-17 11:01 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: jejb, martin.petersen, jinpu.wang, damien.lemoal, linux-scsi,
	linux-kernel, linuxarm, yangxingui

On 17/10/2022 11:45, Niklas Cassel wrote:
> On Mon, Oct 17, 2022 at 05:20:31PM +0800, John Garry wrote:
>> From: Xingui Yang<yangxingui@huawei.com>
>>
>> When an NCQ error occurs, the  controller will abnormally complete the I/Os
>> that are newly delivered to disk, and bit8 in CQ dw3 will be set which
>> indicates that the SATA disk is in error state. The current processing flow
>> is to set ts->stat to SAS_OPEN_REJECT and then sas_ata_task_done() will
>> set FIS stat to ATA_ERR. After analyzing the IO by ata_eh_analyze_tf(),
>> err_mask will set to AC_ERR_HSM. If media error occurs for four times
>> within 10 minutes and the chip rejects new I/Os for four times, NCQ will
>> be disabled due to excessive errors, which is undesirable.
> With patch 8/8 in this series, will it still set err_mask to AC_ERR_HSM?
> If not, is this patch still needed with patch 8/8?
> 

I spoke to colleague Xingui Yang and he does not think that it is still 
required. But I was leaving it in as it still looked like a sensible 
change. I suppose I can drop it.

Thanks,
John

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 0/8] libsas and drivers: NCQ error handling
  2022-10-17  9:20 [PATCH v6 0/8] libsas and drivers: NCQ error handling John Garry
                   ` (7 preceding siblings ...)
  2022-10-17  9:20 ` [PATCH v6 8/8] scsi: libsas: Update SATA dev FIS in sas_ata_task_done() John Garry
@ 2022-10-18  2:40 ` Martin K. Petersen
  2022-10-22  3:52 ` Martin K. Petersen
  9 siblings, 0 replies; 17+ messages in thread
From: Martin K. Petersen @ 2022-10-18  2:40 UTC (permalink / raw)
  To: John Garry
  Cc: jejb, martin.petersen, jinpu.wang, damien.lemoal, linux-scsi,
	linux-kernel, linuxarm, yangxingui, niklas.cassel


John,

> As reported in [0], the pm8001 driver NCQ error handling more or less
> duplicates what libata does in link error handling, as follows:
> - abort all commands
> - do autopsy with read log ext 10 command
> - reset the target to recover, if necessary

Applied to 6.2/scsi-staging, thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 0/8] libsas and drivers: NCQ error handling
  2022-10-17  9:20 [PATCH v6 0/8] libsas and drivers: NCQ error handling John Garry
                   ` (8 preceding siblings ...)
  2022-10-18  2:40 ` [PATCH v6 0/8] libsas and drivers: NCQ error handling Martin K. Petersen
@ 2022-10-22  3:52 ` Martin K. Petersen
  9 siblings, 0 replies; 17+ messages in thread
From: Martin K. Petersen @ 2022-10-22  3:52 UTC (permalink / raw)
  To: jejb, jinpu.wang, John Garry, damien.lemoal
  Cc: Martin K . Petersen, linux-kernel, linuxarm, linux-scsi,
	niklas.cassel, yangxingui

On Mon, 17 Oct 2022 17:20:27 +0800, John Garry wrote:

> As reported in [0], the pm8001 driver NCQ error handling more or less
> duplicates what libata does in link error handling, as follows:
> - abort all commands
> - do autopsy with read log ext 10 command
> - reset the target to recover, if necessary
> 
> Indeed for the hisi_sas driver we want to add similar handling for NCQ
> errors.
> 
> [...]

Applied to 6.2/scsi-queue, thanks!

[1/8] scsi: libsas: Add sas_ata_device_link_abort()
      https://git.kernel.org/mkp/scsi/c/44112922674b
[2/8] scsi: hisi_sas: Move slot variable definition in hisi_sas_abort_task()
      https://git.kernel.org/mkp/scsi/c/4b329abc9180
[3/8] scsi: hisi_sas: Add SATA_DISK_ERR bit handling for v3 hw
      https://git.kernel.org/mkp/scsi/c/930d97dabdd5
[4/8] scsi: hisi_sas: Modify v3 HW SATA disk error state completion processing
      https://git.kernel.org/mkp/scsi/c/4ef4f1a61555
[5/8] scsi: pm8001: Modify task abort handling for SATA task
      https://git.kernel.org/mkp/scsi/c/0b639decf651
[6/8] scsi: pm8001: Use sas_ata_device_link_abort() to handle NCQ errors
      https://git.kernel.org/mkp/scsi/c/811be570a9a8
[7/8] scsi: libsas: Make sas_{alloc, alloc_slow, free}_task() private
      https://git.kernel.org/mkp/scsi/c/8e8d43642f2f
[8/8] scsi: libsas: Update SATA dev FIS in sas_ata_task_done()
      https://git.kernel.org/mkp/scsi/c/cc22efbec011

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-10-22  3:52 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-17  9:20 [PATCH v6 0/8] libsas and drivers: NCQ error handling John Garry
2022-10-17  9:20 ` [PATCH v6 1/8] scsi: libsas: Add sas_ata_device_link_abort() John Garry
2022-10-17 10:52   ` Niklas Cassel
2022-10-17 10:57     ` John Garry
2022-10-17  9:20 ` [PATCH v6 2/8] scsi: hisi_sas: Move slot variable definition in hisi_sas_abort_task() John Garry
2022-10-17  9:20 ` [PATCH v6 3/8] scsi: hisi_sas: Add SATA_DISK_ERR bit handling for v3 hw John Garry
2022-10-17  9:20 ` [PATCH v6 4/8] scsi: hisi_sas: Modify v3 HW SATA disk error state completion processing John Garry
2022-10-17 10:45   ` Niklas Cassel
2022-10-17 11:01     ` John Garry
2022-10-17  9:20 ` [PATCH v6 5/8] scsi: pm8001: Modify task abort handling for SATA task John Garry
2022-10-17  9:20 ` [PATCH v6 6/8] scsi: pm8001: Use sas_ata_device_link_abort() to handle NCQ errors John Garry
2022-10-17  9:01   ` Jinpu Wang
2022-10-17  9:20 ` [PATCH v6 7/8] scsi: libsas: Make sas_{alloc, alloc_slow, free}_task() private John Garry
2022-10-17  9:20 ` [PATCH v6 8/8] scsi: libsas: Update SATA dev FIS in sas_ata_task_done() John Garry
2022-10-17 10:41   ` Niklas Cassel
2022-10-18  2:40 ` [PATCH v6 0/8] libsas and drivers: NCQ error handling Martin K. Petersen
2022-10-22  3:52 ` Martin K. Petersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).