linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: "geert@linux-m68k.org" <geert@linux-m68k.org>,
	"damien.lemoal@opensource.wdc.com"
	<damien.lemoal@opensource.wdc.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
	"linux-renesas@vger.kernel.org" <linux-renesas@vger.kernel.org>
Subject: Re: [PATCH v9 3/5] libata: support concurrent positioning ranges log
Date: Thu, 4 Nov 2021 06:20:30 +0000	[thread overview]
Message-ID: <2c26ad6c724a49d287903ccb838bce8712ae9461.camel@wdc.com> (raw)
In-Reply-To: <CAMuHMdVyUDp1YMkq7e7tu30L=7U7-WV-Ota5KdMddUivUzt50Q@mail.gmail.com>

On Tue, 2021-11-02 at 15:02 +0100, Geert Uytterhoeven wrote:
> Hi Damien,
> 
> On Tue, Nov 2, 2021 at 12:42 PM Damien Le Moal
> <damien.lemoal@opensource.wdc.com> wrote:
> > On 2021/11/02 19:40, Geert Uytterhoeven wrote:
> > > On Wed, 27 Oct 2021, Damien Le Moal wrote:
> > > > Add support to discover if an ATA device supports the Concurrent
> > > > Positioning Ranges data log (address 0x47), indicating that the device
> > > > is capable of seeking to multiple different locations in parallel using
> > > > multiple actuators serving different LBA ranges.
> > > > 
> > > > Also add support to translate the concurrent positioning ranges log
> > > > into its equivalent Concurrent Positioning Ranges VPD page B9h in
> > > > libata-scsi.c.
> > > > 
> > > > The format of the Concurrent Positioning Ranges Log is defined in ACS-5
> > > > r9.
> > > > 
> > > > Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> > > 
> > > Thanks for your patch, which is now commit fe22e1c2f705676a ("libata:
> > > support concurrent positioning ranges log") upstream.
> > > 
> > > During resume from s2ram on Renesas Salvator-XS, I now see more scary
> > > messages than before:
> > > 
> > >       ata1: link resume succeeded after 1 retries
> > >       ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > >      +ata1.00: qc timeout (cmd 0x2f)
> > >      +ata1.00: Read log page 0x00 failed, Emask 0x4
> > >      +ata1.00: ATA Identify Device Log not supported
> > >      +ata1.00: failed to set xfermode (err_mask=0x40)
> > >       ata1: link resume succeeded after 1 retries
> > >       ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > >      +ata1.00: ATA Identify Device Log not supported
> > >      +ata1.00: ATA Identify Device Log not supported
> > >       ata1.00: configured for UDMA/133
> > > 
> > > I guess this is expected?
> > 
> > Nope, it is not. The problem is actually not the concurrent positioning log, or
> > any other log, being supported or not.
> > 
> > Notice the qc timeout ? On device scan after coming out of sleep, or even simply
> > doing a rmmod ahci+modprobe ahci, the read log commands issued during device
> > revalidate timeout fairly easily as they are issued while the drive is not
> > necessarilly fully restarted yet. These errors happen fairly easily due to the
> > command timeout setting in libata being too short, I think, for the "restart"
> > case. On a clean boot, they do not happen as longer timeouts are used in that case.
> > 
> > I identified this problem recently while testing stuff: I was doing rmmod of ata
> > modules and then modprobe of newly compiled modules for tests and noticed these
> > timeouts. Increasing the timeout values, they disappear. I am however still
> > scratching my head about the best way to address this. Still digging about this
> > to first make sure this is really about timeouts being set too short.
> 
> There's indeed something timing-related going on.  Sometimes I get
> during resume (s2idle or s2ram):
> 
>     ata1.00: qc timeout (cmd 0x2f)
>     ata1.00: Read log page 0x00 failed, Emask 0x4
>     ata1.00: ATA Identify Device Log not supported
>     ata1.00: failed to set xfermode (err_mask=0x40)
>     ata1.00: limiting speed to UDMA/133:PIO3
>     ata1: link resume succeeded after 1 retries
>     ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>     ata1.00: NODEV after polling detection
>     ata1.00: revalidation failed (errno=-2)
>     ata1.00: disabled
>     ata1: link resume succeeded after 1 retries
>     ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>     sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=0x04
> driverbyte=DRIVER_OK
>     sd 0:0:0:0: [sda] Read Capacity(16) failed: Result: hostbyte=0x04
> driverbyte=DRIVER_OK
>     sd 0:0:0:0: [sda] Sense not available.
>     sd 0:0:0:0: [sda] Read Capacity(10) failed: Result: hostbyte=0x04
> driverbyte=DRIVER_OK
>     sd 0:0:0:0: [sda] Sense not available.
>     sd 0:0:0:0: [sda] 0 512-byte logical blocks: (0 B/0 B)
>     sda: detected capacity change from 320173056 to 0
> 
> after which the drive is no longer functional...

Geert,

Could you try with the following patch added to see if the problem goes away ?


diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index bf9c4b6c5c3d..e53f4ea71d38 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -93,6 +93,12 @@ static const unsigned long ata_eh_identify_timeouts[] = {
 	ULONG_MAX,
 };
 
+static const unsigned long ata_eh_revalidate_timeouts[] = {
+	15000,	/* Some drives are slow to read log pages when waking-up */
+	15000,  /* combined time till here is enough even for media access */
+	ULONG_MAX,
+};
+
 static const unsigned long ata_eh_flush_timeouts[] = {
 	15000,	/* be generous with flush */
 	15000,  /* ditto */
@@ -129,16 +135,17 @@ static const struct ata_eh_cmd_timeout_ent
 ata_eh_cmd_timeout_table[ATA_EH_CMD_TIMEOUT_TABLE_SIZE] = {
 	{ .commands = CMDS(ATA_CMD_ID_ATA, ATA_CMD_ID_ATAPI),
 	  .timeouts = ata_eh_identify_timeouts, },
-	{ .commands = CMDS(ATA_CMD_READ_NATIVE_MAX,
ATA_CMD_READ_NATIVE_MAX_EXT),
-	  .timeouts = ata_eh_other_timeouts, },
-	{ .commands = CMDS(ATA_CMD_SET_MAX, ATA_CMD_SET_MAX_EXT),
-	  .timeouts = ata_eh_other_timeouts, },
-	{ .commands = CMDS(ATA_CMD_SET_FEATURES),
-	  .timeouts = ata_eh_other_timeouts, },
-	{ .commands = CMDS(ATA_CMD_INIT_DEV_PARAMS),
-	  .timeouts = ata_eh_other_timeouts, },
+	{ .commands = CMDS(ATA_CMD_READ_LOG_EXT, ATA_CMD_READ_LOG_DMA_EXT),
+	  .timeouts = ata_eh_revalidate_timeouts, },
 	{ .commands = CMDS(ATA_CMD_FLUSH, ATA_CMD_FLUSH_EXT),
 	  .timeouts = ata_eh_flush_timeouts },
+	{ .commands = CMDS(ATA_CMD_READ_NATIVE_MAX,	\
+			   ATA_CMD_READ_NATIVE_MAX_EXT,		\
+			   ATA_CMD_SET_MAX,			\
+			   ATA_CMD_SET_MAX_EXT,			\
+			   ATA_CMD_SET_FEATURES,		\
+			   ATA_CMD_INIT_DEV_PARAMS),
+	  .timeouts = ata_eh_other_timeouts, },
 };
 #undef CMDS
 
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 236ec689056a..f2b12057ffcf 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -394,7 +394,7 @@ enum {
 	/* This should match the actual table size of
 	 * ata_eh_cmd_timeout_table in libata-eh.c.
 	 */
-	ATA_EH_CMD_TIMEOUT_TABLE_SIZE = 6,
+	ATA_EH_CMD_TIMEOUT_TABLE_SIZE = 4,
 
 	/* Horkage types. May be set by libata or controller on drives
 	   (some horkage may be drive/controller pair dependent */


On my test box, I can reliably generate the same qc timeout errors you are
seeing by doing:

rmmod sd_mod ahci libahci libata
modprobe ahci

The first command will hard reset the drives (causing them to "reboot").
When the second command starts, revalidate is executed with the drives slow to
respond to read log commands. The patch adds an auto timeout for read log
commands, to set the timeout to 15s instead of the default 5s. With that, all
timeout errors disappear. Note that these timeout numbers are totally
arbitrary...


> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

-- 
Damien Le Moal
Western Digital Research


  reply	other threads:[~2021-11-04  6:20 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-27  2:22 [PATCH v9 0/5] Initial support for multi-actuator HDDs Damien Le Moal
2021-10-27  2:22 ` [PATCH v9 1/5] block: Add independent access ranges support Damien Le Moal
2021-10-27  2:22 ` [PATCH v9 2/5] scsi: sd: add concurrent positioning " Damien Le Moal
2021-10-27  2:22 ` [PATCH v9 3/5] libata: support concurrent positioning ranges log Damien Le Moal
2021-11-02 10:40   ` Geert Uytterhoeven
2021-11-02 11:42     ` Damien Le Moal
2021-11-02 14:02       ` Geert Uytterhoeven
2021-11-04  6:20         ` Damien Le Moal [this message]
2021-10-27  2:22 ` [PATCH v9 4/5] doc: document sysfs queue/independent_access_ranges attributes Damien Le Moal
2021-10-27  2:22 ` [PATCH v9 5/5] doc: Fix typo in request queue sysfs documentation Damien Le Moal
2021-10-27  2:37 ` (subset) [PATCH v9 0/5] Initial support for multi-actuator HDDs Jens Axboe
2021-10-27  2:38 ` Jens Axboe
2021-10-27  2:46   ` Damien Le Moal
2021-10-27  2:52     ` Martin K. Petersen
2021-10-27  2:53       ` Damien Le Moal
2021-10-27  2:49   ` Damien Le Moal
2021-10-27  3:03     ` Jens Axboe
2021-10-27  3:42       ` Damien Le Moal
2021-10-27  3:03 ` (subset) " Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2c26ad6c724a49d287903ccb838bce8712ae9461.camel@wdc.com \
    --to=damien.lemoal@wdc.com \
    --cc=axboe@kernel.dk \
    --cc=damien.lemoal@opensource.wdc.com \
    --cc=geert@linux-m68k.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-renesas@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).