linux-scsi.vger.kernel.org archive mirror
From: John Garry <john.garry@huawei.com>
To: Damien Le Moal <damien.lemoal@opensource.wdc.com>,
	<jejb@linux.ibm.com>, <martin.petersen@oracle.com>,
	<jinpu.wang@cloud.ionos.com>, <yangxingui@huawei.com>
Cc: <linux-scsi@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linuxarm@huawei.com>, <hare@suse.de>
Subject: Re: [PATCH v2 2/6] scsi: libsas: Add sas_ata_device_link_abort()
Date: Thu, 18 Aug 2022 13:09:53 +0100
Message-ID: <eb3465a2-335e-a605-ba8a-4cce790b5b02@huawei.com>
In-Reply-To: <baf63982-810c-85eb-b28f-99ab0517c6ba@opensource.wdc.com>

On 17/08/2022 18:14, Damien Le Moal wrote:
>> The ATA autopsy has found a medium error and decided that reset is not
>> required - this is similar in that regard to the "unaligned write to a
>> sequential write required zone on SMR" error you mentioned from your
>> test previously. The problem in this is that for hisi_sas we depend on
>> disk reset to release driver resources associated with ATA QCs. That is
>> because it is only after reset that we can guarantee that no IO
>> associated with the disk will complete in HW and it is safe to release
>> the resources.
> If you had an error, then you already are guaranteed that you will not see any
> completion at all since the SATA drive is in error mode already. But I see the
> point here. The HBA internal qc resources need to be cleared and that seems to
> be done only with a device reset big hammer.

Yeah, unfortunately

> 
>> But pm8001 seems different here with regard to releasing resources. I find
>> that when EH kicks in from NCQ error and libsas tries to abort missing
>> commands, the pm8001_abort_task() -> sas_execute_internal_abort_single()
>> causes the original IO to complete as aborted - this is good, as then we
>> may release the resources there. hisi_sas has no such feature.
>>
>> But the pm8001 manual and current driver indicate that the
>> OPC_INB_SATA_ABORT command should be sent after read log ext when
>> handling NCQ error, regardless of an autopsy. I send OPC_INB_SATA_ABORT
>> in ata_eh_reset() -> pm8001_I_T_nexus_reset() -> pm8001_send_abort_all()
> You lost me: ata_eh_recover() will call ata_eh_reset() only if the ATA_EH_RESET
> action flag is set. So are you saying that even though it is not needed, you
> still need to set ATA_EH_RESET for pm8001 ?

As below, it was the only location I found suitable to call 
pm8001_send_abort_all().

However, I am not really sure it is required now. For pm8001 NCQ error
handling we require two steps:
- Read log ext
- Send OPC_INB_SATA_ABORT, which we do in pm8001_send_abort_all()

pm8001_send_abort_all() sends OPC_INB_SATA_ABORT in "device abort all" 
mode, meaning any IO in the HBA is aborted for the device. But we are 
also earlier in EH sending OPC_INB_SATA_ABORT for individual IOs in 
sas_eh_handle_sas_errors() -> sas_scsi_find_task() -> 
pm8001_abort_task() -> sas_execute_internal_abort_single() -> ... 
send_abort_task()

So I don't think that the pm8001_send_abort_all() call has any effect, 
as we're already aborting any outstanding IO earlier.
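
To illustrate the reasoning above, here is a minimal user-space sketch, not driver code. It assumes the HBA simply tracks a set of outstanding IOs per device; every name in it (model_hba_dev, model_abort_single(), model_abort_all(), model_eh_sequence()) is hypothetical and only stands in for the pm8001 paths named above. If the per-IO aborts have already covered every outstanding IO, the later "device abort all" finds nothing left to abort.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_TAGS 32 /* NCQ allows up to 32 outstanding commands */

/* Hypothetical model of the IOs an HBA holds for one SATA device. */
struct model_hba_dev {
	bool outstanding[MODEL_MAX_TAGS];
};

/* Models aborting one IO; returns true if an IO was actually aborted. */
static bool model_abort_single(struct model_hba_dev *dev, unsigned int tag)
{
	assert(tag < MODEL_MAX_TAGS);
	if (!dev->outstanding[tag])
		return false;
	dev->outstanding[tag] = false;
	return true;
}

/* Models "device abort all"; returns how many IOs it aborted. */
static unsigned int model_abort_all(struct model_hba_dev *dev)
{
	unsigned int tag, aborted = 0;

	for (tag = 0; tag < MODEL_MAX_TAGS; tag++)
		if (model_abort_single(dev, tag))
			aborted++;
	return aborted;
}

/*
 * Models the order described above: per-IO aborts first, then
 * abort-all. Returns the number of IOs left for abort-all to handle.
 */
static unsigned int model_eh_sequence(struct model_hba_dev *dev)
{
	unsigned int tag;

	for (tag = 0; tag < MODEL_MAX_TAGS; tag++)
		model_abort_single(dev, tag);
	return model_abort_all(dev);
}
```

In this model model_eh_sequence() always returns 0, which matches the argument that the abort-all step has no further effect. It says nothing about whether the real HW needs the command for other reasons.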

Admittedly the order of the two steps is different, but
OPC_INB_SATA_ABORT does not send any protocol message to the disk, so
it would not affect anything subsequently read with read log ext.

Having said all that, it may be wise to still call
pm8001_send_abort_all()...

> 
>> As I mentioned before, I saw nowhere better to call
>> pm8001_send_abort_all() for this. I would rather not do it in
>> ata_eh_reset() -> pm8001_I_T_nexus_reset()
> We could add a new op ->eh_link_autopsy which we can call if defined after the
> call to ata_eh_analyze_ncq_error() in ata_eh_link_autopsy(). With that, you can
> set ATA_EH_RESET for the hisi driver and only do pm8001_send_abort_all() for
> pm8001 (that will be done after the read log 10h).

hmmmm.... seems unfortunate if we need to add a new op just for this.

If we supported the ata_port_operations.softreset callback for libsas,
then it seems a good location to issue pm8001_send_abort_all(). However,
ATA EH always prefers hardreset over softreset if both are supported - do
you see any scope to change this so that we could use softreset?
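
For reference, the preference described above can be modelled as follows. This is a simplified user-space sketch of the selection rule only, not the libata code; model_port_ops, model_pick_reset() and the stub callbacks are hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*model_reset_fn)(void);

/* Hypothetical stand-in for the two reset callbacks a port may provide. */
struct model_port_ops {
	model_reset_fn softreset; /* may be NULL */
	model_reset_fn hardreset; /* may be NULL */
};

enum model_reset_kind {
	MODEL_RESET_NONE,
	MODEL_RESET_SOFT,
	MODEL_RESET_HARD,
};

/* Stub callbacks so that the ops tables can be populated. */
static int model_stub_softreset(void) { return 0; }
static int model_stub_hardreset(void) { return 0; }

/* Prefer hardreset; use softreset only when no hardreset is provided. */
static enum model_reset_kind model_pick_reset(const struct model_port_ops *ops)
{
	assert(ops != NULL);
	if (ops->hardreset)
		return MODEL_RESET_HARD;
	if (ops->softreset)
		return MODEL_RESET_SOFT;
	return MODEL_RESET_NONE;
}
```

With this rule a softreset callback is only ever chosen when no hardreset callback exists, which is why adding softreset alone would not give a place to issue pm8001_send_abort_all() as long as hardreset is also provided.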

> 
>> How about this modified approach:
>> - Continue to set ATA_EH_RESET in sas_ata_device_link_abort()
>> - pm8001_I_T_nexus_reset() will only call pm8001_send_abort_all() when
>> the driver is in NCQ error state and not do a hard reset in that case.
>> 	- But I am not sure if that works as the autopsy for NCQ error may have
>> decided that a hardreset was really required. Hmmm..
> See above. The new op may decide a reset is needed (hisi case) and even if
> the standard autopsy does not make that decision, the flag is set and
> ata_eh_recover() will reset the device. For the pm8001 case, no reset set with
> the new op, only pm8001_send_abort_all(). And even if ata_eh_link_autopsy()
> decides that a reset is needed after calling the new op, that would still be OK
> I think.

I just think that for hisi_sas a reset is always required unfortunately 
- either an ATA softreset or hardreset. For pm8001, as you say, we can 
let autopsy decide whether the big hammer (hard) reset is required.
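
A rough user-space sketch of how the optional op discussed above might behave, assuming it is called after the standard NCQ error analysis and only if the driver defines it. Nothing here is an existing libata or libsas interface: model_ops, model_link, MODEL_EH_RESET and the two per-driver hooks are hypothetical stand-ins for the proposed ->eh_link_autopsy op, ATA_EH_RESET, hisi_sas and pm8001.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_EH_RESET (1u << 0) /* stand-in for ATA_EH_RESET */

/* Hypothetical per-link EH state. */
struct model_link {
	unsigned int action;     /* EH actions requested so far */
	bool hba_abort_all_sent; /* models pm8001_send_abort_all() */
};

/* Hypothetical ops table with the optional autopsy hook. */
struct model_ops {
	void (*eh_link_autopsy)(struct model_link *link); /* may be NULL */
};

/* hisi_sas-like hook: a reset is always required to free HBA resources. */
static void model_hisi_autopsy(struct model_link *link)
{
	link->action |= MODEL_EH_RESET;
}

/* pm8001-like hook: only abort in the HBA, leave reset to the autopsy. */
static void model_pm8001_autopsy(struct model_link *link)
{
	link->hba_abort_all_sent = true;
}

/*
 * Models the autopsy flow: the standard analysis (read log) is assumed
 * to have run already, then the optional hook is called, and finally
 * any reset decision from the standard autopsy is applied.
 */
static void model_link_autopsy(const struct model_ops *ops,
			       struct model_link *link,
			       bool std_autopsy_wants_reset)
{
	assert(ops != NULL && link != NULL);
	if (ops->eh_link_autopsy)
		ops->eh_link_autopsy(link);
	if (std_autopsy_wants_reset)
		link->action |= MODEL_EH_RESET;
}
```

In this model the hisi_sas-like driver always ends up with a reset requested, while the pm8001-like driver gets one only when the standard autopsy asks for it.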

Thanks,
John

