[Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors

* [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors
@ 2009-06-21 17:26 bugzilla-daemon
  2009-06-21 18:47 ` James Bottomley
                   ` (8 more replies)
  0 siblings, 9 replies; 14+ messages in thread
From: bugzilla-daemon @ 2009-06-21 17:26 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=13594

           Summary: SMART responses for SATA disks on SAS get interpreted
                    as errors
           Product: IO/Storage
           Version: 2.5
    Kernel Version: 2.6.30-rc6
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: SCSI
        AssignedTo: linux-scsi@vger.kernel.org
        ReportedBy: sgunderson@bigfoot.com
        Regression: No

Hi,

I just bought an LSI SAS3081E-R which I use against a Supermicro backplane to
drive ten Seagate SATA disks (7200.11, 750GB and 1.5TB). I'm using the
standard Linux Fusion MPT device driver (CONFIG_FUSION_SAS) under Linux
2.6.30-rc6. Everything seems to work pretty well, with one exception: When I
use SMART against the drives (say, smartctl -a /dev/sda) the kernel complains
with:

  [  811.091916] sd 0:0:0:0: [sda] Sense Key : Recovered Error [current] [descriptor]
  [  811.099807] Descriptor sense data with sense descriptors (in hex):
  [  811.106175]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
  [  811.113262]         00 4f 00 c2 00 50
  [  811.117379] sd 0:0:0:0: [sda] Add. Sense: ATA pass through information available

I've tried upgrading to the newest firmware (1.28.02.00, 05-MAY-2009), but
all that changed is that the hex dump was added to the error message.

Whenever this happens, it appears like all the disks “hiccup” and the kernel
loses contact with the controller for a small while. If too many of these
happen at once, eventually disks start falling off RAIDs, and the entire
machine goes down. It looks to me as if these messages should simply not be
treated as errors by the kernel -- smartctl explicitly asks for a response even
if the command doesn't fail (by setting CK_COND), so the response probably
shouldn't be taken as an error.
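
For reference, here is a minimal sketch in C of the kind of ATA PASS-THROUGH (16) CDB that would produce such a response. It assumes the SAT field layout and the SMART RETURN STATUS register values (feature 0xDA, LBA mid/high 0x4F/0xC2, command 0xB0), which is consistent with the 4f/c2 bytes in the dump above. The helper name build_smart_status_cdb is made up for illustration; this is not smartctl's actual code.

```c
#include <stdint.h>
#include <string.h>

#define ATA_PASS_THROUGH_16 0x85  /* SCSI opcode, per SAT */
#define SAT_PROTO_NON_DATA  3     /* PROTOCOL field value for non-data commands */
#define SAT_CK_COND         0x20  /* byte 2, bit 5: return ATA registers even on success */

/* Hypothetical helper: fill a 16-byte CDB for SMART RETURN STATUS. */
static void build_smart_status_cdb(uint8_t cdb[16], int ck_cond)
{
	memset(cdb, 0, 16);
	cdb[0]  = ATA_PASS_THROUGH_16;
	cdb[1]  = SAT_PROTO_NON_DATA << 1;     /* PROTOCOL in bits 4:1, EXTEND = 0 */
	cdb[2]  = ck_cond ? SAT_CK_COND : 0;   /* no data transfer, so T_LENGTH = 0 */
	cdb[4]  = 0xda;                        /* FEATURES: SMART RETURN STATUS */
	cdb[10] = 0x4f;                        /* LBA mid: SMART signature */
	cdb[12] = 0xc2;                        /* LBA high: SMART signature */
	cdb[14] = 0xb0;                        /* COMMAND: SMART */
}
```

With ck_cond set, the translation layer is expected to hand the final ATA registers back through the sense buffer even when the command succeeds, which is the response shown in the log.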

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 17:26 [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors bugzilla-daemon
@ 2009-06-21 18:47 ` James Bottomley
  2009-06-21 18:55   ` James Bottomley
  2009-06-21 18:48 ` [Bug 13594] " bugzilla-daemon
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2009-06-21 18:47 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-scsi

On Sun, 2009-06-21 at 17:26 +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13594
> 
>            Summary: SMART responses for SATA disks on SAS get interpreted
>                     as errors
>            Product: IO/Storage
>            Version: 2.5
>     Kernel Version: 2.6.30-rc6
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: SCSI
>         AssignedTo: linux-scsi@vger.kernel.org
>         ReportedBy: sgunderson@bigfoot.com
>         Regression: No
> 
> 
> Hi,
> 
> I just bought an LSI SAS3081E-R which I use against a Supermicro backplane to
> drive ten Seagate SATA disks (7200.11, 750GB and 1.5TB). I'm using the
> standard Linux Fusion MPT device driver (CONFIG_FUSION_SAS) under Linux
> 2.6.30-rc6. Everything seems to work pretty well, with one exception: When I
> use SMART against the drives (say, smartctl -a /dev/sda) the kernel complains
> with:
> 
>   [  811.091916] sd 0:0:0:0: [sda] Sense Key : Recovered Error [current] [descriptor]
>   [  811.099807] Descriptor sense data with sense descriptors (in hex):
>   [  811.106175]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
>   [  811.113262]         00 4f 00 c2 00 50
>   [  811.117379] sd 0:0:0:0: [sda] Add. Sense: ATA pass through information available

This is a message the kernel prints out on all recovered error returns
(except those marked REQ_QUIET).  It's purely informational and doesn't
affect return processing of the command at all, so the kernel is
actually treating this as a successful completion not an error.

> I've tried upgrading to the newest firmware (1.28.02.00, 05-MAY-2009), but
> all that changed is that the hex dump was added to the error message.
> 
> Whenever this happens, it appears like all the disks “hiccup” and the kernel
> loses contact with the controller for a small while. If too many of these
> happen at once, eventually disks start falling off RAIDs, and the entire
> machine goes down. It looks to me as if these messages should simply not be
> treated as errors by the kernel -- smartctl explicitly asks for a response even
> if the command doesn't fail (by setting CK_COND), so the response probably
> shouldn't be taken as an error.

So this sounds like the bug ... however, for the LSI card, this bug will
be in the SAT layer in the fusion firmware.  I can shut the kernel up by
making the recovered error processing clause look for 01/00/1D as well
as REQ_QUIET, but it won't affect this problem.

James



* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 17:26 [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors bugzilla-daemon
  2009-06-21 18:47 ` James Bottomley
@ 2009-06-21 18:48 ` bugzilla-daemon
  2009-06-21 18:55 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2009-06-21 18:48 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=13594





--- Comment #1 from Anonymous Emailer <anonymous@kernel-bugs.osdl.org>  2009-06-21 18:47:59 ---
Reply-To: James.Bottomley@HansenPartnership.com


* Re: [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 18:47 ` James Bottomley
@ 2009-06-21 18:55   ` James Bottomley
  0 siblings, 0 replies; 14+ messages in thread
From: James Bottomley @ 2009-06-21 18:55 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-scsi

On Sun, 2009-06-21 at 13:47 -0500, James Bottomley wrote:
> >   [  811.091916] sd 0:0:0:0: [sda] Sense Key : Recovered Error [current] [descriptor]
> >   [  811.099807] Descriptor sense data with sense descriptors (in hex):
> >   [  811.106175]         72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
> >   [  811.113262]         00 4f 00 c2 00 50
> >   [  811.117379] sd 0:0:0:0: [sda] Add. Sense: ATA pass through information available
> 
> This is a message the kernel prints out on all recovered error returns
> (except those marked REQ_QUIET).  It's purely informational and doesn't
> affect return processing of the command at all, so the kernel is
> actually treating this as a successful completion not an error.
> 
> > I've tried upgrading to the newest firmware (1.28.02.00, 05-MAY-2009), but
> > all that changed is that the hex dump was added to the error message.
> > 
> > Whenever this happens, it appears like all the disks “hiccup” and the kernel
> > loses contact with the controller for a small while. If too many of these
> > happen at once, eventually disks start falling off RAIDs, and the entire
> > machine goes down. It looks to me as if these messages should simply not be
> > treated as errors by the kernel -- smartctl explicitly asks for a response even
> > if the command doesn't fail (by setting CK_COND), so the response probably
> > shouldn't be taken as an error.
> 
> So this sounds like the bug ... however, for the LSI card, this bug will
> be in the SAT layer in the fusion firmware.  I can shut the kernel up by
> making the recovered error processing clause look for 01/00/1D as well
> as REQ_QUIET, but it won't affect this problem.

Actually quieting the message is trivially easy, try this.

James

---

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f3c4089..a0235c9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -774,7 +774,8 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	 * is what gets returned to the user
 	 */
 	if (sense_valid && sshdr.sense_key == RECOVERED_ERROR) {
-		if (!(req->cmd_flags & REQ_QUIET))
+		if (!(req->cmd_flags & REQ_QUIET) &&
+		    !(sshdr.asc == 0x00 && sshdr.ascq == 0x1d))
 			scsi_print_sense("", cmd);
 		result = 0;
 		/* BLOCK_PC may have set error */
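
The condition in the patch above can be restated as a standalone predicate for experimenting outside the kernel. This is only a sketch: the function name should_print_recovered and its boolean quiet argument are hypothetical stand-ins for the REQ_QUIET flag test, not kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace restatement of the patched condition: print the
 * sense data for a recovered error unless the request is marked quiet or
 * the ASC/ASCQ pair is 00h/1Dh (ATA pass through information available).
 */
static bool should_print_recovered(bool quiet, uint8_t asc, uint8_t ascq)
{
	return !quiet && !(asc == 0x00 && ascq == 0x1d);
}
```

Either way the command still completes with result = 0; only the log message is affected.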



* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 17:26 [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors bugzilla-daemon
  2009-06-21 18:47 ` James Bottomley
  2009-06-21 18:48 ` [Bug 13594] " bugzilla-daemon
@ 2009-06-21 18:55 ` bugzilla-daemon
  2009-06-21 18:58 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2009-06-21 18:55 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=13594





--- Comment #2 from Anonymous Emailer <anonymous@kernel-bugs.osdl.org>  2009-06-21 18:55:06 ---
Reply-To: James.Bottomley@HansenPartnership.com


* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 17:26 [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors bugzilla-daemon
                   ` (2 preceding siblings ...)
  2009-06-21 18:55 ` bugzilla-daemon
@ 2009-06-21 18:58 ` bugzilla-daemon
  2009-06-21 19:07   ` James Bottomley
  2009-06-21 19:07 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: bugzilla-daemon @ 2009-06-21 18:58 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=13594





--- Comment #3 from Steinar H. Gunderson <sgunderson@bigfoot.com>  2009-06-21 18:58:28 ---
(In reply to comment #1)
> This is a message the kernel prints out on all recovered error returns
> (except those marked REQ_QUIET).  It's purely informational and doesn't
> affect return processing of the command at all, so the kernel is
> actually treating this as a successful completion not an error.

OK.

> So this sounds like the bug ... however, for the LSI card, this bug will
> be in the SAT layer in the fusion firmware.  I can shut the kernel up by
> making the recovered error processing clause look for 01/00/1D as well
> as REQ_QUIET, but it won't affect this problem.

I tried reporting this to the Linux fusionmpt driver people a while ago, but
never received any response (thus this bug)... I guess I'm out of luck, then,
if there's nothing that can be done for it in the kernel. It's a bit weird,
though; one would believe people ran smartd on their systems and discovered
this already.

/* Steinar */


* Re: [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 18:58 ` bugzilla-daemon
@ 2009-06-21 19:07   ` James Bottomley
  0 siblings, 0 replies; 14+ messages in thread
From: James Bottomley @ 2009-06-21 19:07 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-scsi, Moore, Eric

On Sun, 2009-06-21 at 18:58 +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13594
> 
> 
> 
> 
> 
> --- Comment #3 from Steinar H. Gunderson <sgunderson@bigfoot.com>  2009-06-21 18:58:28 ---
> (In reply to comment #1)
> > This is a message the kernel prints out on all recovered error returns
> > (except those marked REQ_QUIET).  It's purely informational and doesn't
> > affect return processing of the command at all, so the kernel is
> > actually treating this as a successful completion not an error.
> 
> OK.
> 
> > So this sounds like the bug ... however, for the LSI card, this bug will
> > be in the SAT layer in the fusion firmware.  I can shut the kernel up by
> > making the recovered error processing clause look for 01/00/1D as well
> > as REQ_QUIET, but it won't affect this problem.
> 
> I tried reporting this to the Linux fusionmpt driver people a while ago, but
> never received any response (thus this bug)... I guess I'm out of luck,

OK, cc'd LSI people, let's see if I get better luck

>  then,
> if there's nothing that can be done for it in the kernel. It's a bit weird,
> though; one would believe people ran smartd on their systems and discovered
> this already.

I can guess that it's some type of firmware mode problem: either it runs
for SMART or it runs for normal commands, hence the hiatus.  If that's
true, you'd likely only see the problem in a large disk setup ... it
might also be possible to work around by simply quiescing the card
before sending down SMART commands (that would be grossly inefficient,
but at least devices wouldn't get errored).

James




* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 17:26 [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors bugzilla-daemon
                   ` (3 preceding siblings ...)
  2009-06-21 18:58 ` bugzilla-daemon
@ 2009-06-21 19:07 ` bugzilla-daemon
  2009-06-21 20:53   ` Douglas Gilbert
  2009-06-21 20:53 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: bugzilla-daemon @ 2009-06-21 19:07 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=13594





--- Comment #4 from Anonymous Emailer <anonymous@kernel-bugs.osdl.org>  2009-06-21 19:07:13 ---
Reply-To: James.Bottomley@HansenPartnership.com


* Re: [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 19:07 ` bugzilla-daemon
@ 2009-06-21 20:53   ` Douglas Gilbert
  2009-06-22 12:04     ` Matthew Wilcox
  0 siblings, 1 reply; 14+ messages in thread
From: Douglas Gilbert @ 2009-06-21 20:53 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-scsi

bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13594
> 
> 
> 
> 
> 
> --- Comment #4 from Anonymous Emailer <anonymous@kernel-bugs.osdl.org>  2009-06-21 19:07:13 ---
> Reply-To: James.Bottomley@HansenPartnership.com
> 
> On Sun, 2009-06-21 at 18:58 +0000, bugzilla-daemon@bugzilla.kernel.org
> wrote:
>> http://bugzilla.kernel.org/show_bug.cgi?id=13594
>>
>>
>>
>>
>>
>> --- Comment #3 from Steinar H. Gunderson <sgunderson@bigfoot.com>  2009-06-21 18:58:28 ---
>> (In reply to comment #1)
>>> This is a message the kernel prints out on all recovered error returns
>>> (except those marked REQ_QUIET).  It's purely informational and doesn't
>>> affect return processing of the command at all, so the kernel is
>>> actually treating this as a successful completion not an error.
>> OK.
>>
>>> So this sounds like the bug ... however, for the LSI card, this bug will
>>> be in the SAT layer in the fusion firmware.  I can shut the kernel up by
>>> making the recovered error processing clause look for 01/00/1D as well
>>> as REQ_QUIET, but it won't affect this problem.
>> I tried reporting this to the Linux fusionmpt driver people a while ago, but
>> never received any response (thus this bug)... I guess I'm out of luck,
> 
> OK, cc'd LSI people, let's see if I get better luck
> 
>>  then,
>> if there's nothing that can be done for it in the kernel. It's a bit weird,
>> though; one would believe people ran smartd on their systems and discovered
>> this already.
> 
> I can guess that it's some type of firmware mode problem: either it runs
> for SMART or it runs for normal commands, hence the hiatus.  If that's
> true, you'd likely only see the problem in a large disk setup ... it
> might also be possible to work around by simply quiescing the card
> before sending down SMART commands (that would be grossly inefficient,
> but at least devices wouldn't get errored).

I have just replicated the "ATA pass through information
available" message report on a similar vintage LSI
controller and a SATA disk with a recent smartctl
version.

There is no need to report this in the kernel error log,
as the smartmontools ATA pass-through (SCSI) command asked
for the final state of the ATA registers and the sense
buffer is the conduit for that information. That ASC/ASCQ
pair basically means "you asked for them and here they
are". [reference: sat2r07b.pdf section 12.2.5 table 107
when CK_COND is 1]
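
To make that concrete, here is a rough C sketch that pulls the ATA registers out of descriptor-format sense data like the dump in the report. It assumes the SAT "ATA Return" descriptor layout (descriptor code 09h, additional length 0Ch, STATUS in the last byte); the struct and function names are invented for illustration and are not taken from smartmontools or the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Registers carried by the SAT "ATA Return" sense descriptor (code 09h). */
struct ata_return {
	uint8_t error;
	uint8_t lba_mid;
	uint8_t lba_high;
	uint8_t device;
	uint8_t status;
};

/*
 * Hypothetical helper: parse descriptor-format sense data (response code
 * 72h/73h) and copy out the ATA registers. Returns 0 on success, -1 if the
 * buffer is not descriptor format or holds no ATA Return descriptor.
 */
static int parse_ata_return(const uint8_t *sense, size_t len,
			    struct ata_return *out)
{
	size_t end, off = 8;

	if (len < 8 || ((sense[0] & 0x7f) != 0x72 && (sense[0] & 0x7f) != 0x73))
		return -1;
	end = 8 + (size_t)sense[7];         /* byte 7: additional sense length */
	if (end > len)
		end = len;
	while (off + 2 <= end) {
		size_t dlen = 2 + (size_t)sense[off + 1];

		if (sense[off] == 0x09 && dlen >= 14 && off + dlen <= end) {
			out->error    = sense[off + 3];
			out->lba_mid  = sense[off + 9];
			out->lba_high = sense[off + 11];
			out->device   = sense[off + 12];
			out->status   = sense[off + 13];
			return 0;
		}
		off += dlen;                /* skip descriptors of other types */
	}
	return -1;
}
```

Fed the 22 bytes from the original report, this yields error 00h, LBA mid/high 4Fh/C2h and status 50h (DRDY set, ERR clear), i.e. a command that completed without error.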

As for the hiccup, I have noticed that with SAS (SCSI)
disks from Seagate there is a curious sound and a pause
before the response to LOG SENSE SCSI command (the
type the smartmontools uses on SCSI disks).

Another annoyance is that the disk must be ready (i.e.
spun up) before MODE SENSE and LOG SENSE work; haven't
Seagate heard of flash? :-)
SCSI standards permit that (i.e. only
a small number of commands have to work when the disk
is not ready) but you would think accessing metadata
given the disk has spun up once since power up could
be accomplished from RAM or flash.

Doug Gilbert


* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 17:26 [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors bugzilla-daemon
                   ` (4 preceding siblings ...)
  2009-06-21 19:07 ` bugzilla-daemon
@ 2009-06-21 20:53 ` bugzilla-daemon
  2009-06-21 21:14 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2009-06-21 20:53 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=13594





--- Comment #5 from Anonymous Emailer <anonymous@kernel-bugs.osdl.org>  2009-06-21 20:53:36 ---
Reply-To: dgilbert@interlog.com


* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 17:26 [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors bugzilla-daemon
                   ` (5 preceding siblings ...)
  2009-06-21 20:53 ` bugzilla-daemon
@ 2009-06-21 21:14 ` bugzilla-daemon
  2009-06-22 12:04 ` bugzilla-daemon
  2009-11-21  0:20 ` bugzilla-daemon
  8 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2009-06-21 21:14 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=13594





--- Comment #6 from Steinar H. Gunderson <sgunderson@bigfoot.com>  2009-06-21 21:14:37 ---
On Sun, Jun 21, 2009 at 08:53:37PM +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> I have just replicated the "ATA pass through information
> available" message report on a similar vintage LSI
> controller and a SATA disk with a recent smartctl
> version.
> 
> There is no need to report this in the kernel error log,
> as the smartmontools ATA pass-through (SCSI) command asked
> for the final state of the ATA registers and the sense
> buffer is the conduit for that information. That ASC/ASCQ
> pair basically means "you asked for them and here they
> are". [reference: sat2r07b.pdf section 12.2.5 table 107
> when CK_COND is 1]

OK, this is basically what we agreed on already. I'm not able to
test the given patch right now, though (the machine is a production
machine).

> As for the hiccup, I have noticed that with SAS (SCSI)
> disks from Seagate there is a curious sound and a pause
> before the response to LOG SENSE SCSI command (the
> type the smartmontools uses on SCSI disks).

FWIW, I've used the same disks on SATA controllers with smartctl
without any problems. I'm not entirely sure how to parse your
message, though -- do you imply that the problem is in smartctl?
The disk?

/* Steinar */

* Re: [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 20:53   ` Douglas Gilbert
@ 2009-06-22 12:04     ` Matthew Wilcox
  0 siblings, 0 replies; 14+ messages in thread
From: Matthew Wilcox @ 2009-06-22 12:04 UTC (permalink / raw)
  To: Douglas Gilbert; +Cc: bugzilla-daemon, linux-scsi

On Sun, Jun 21, 2009 at 04:53:29PM -0400, Douglas Gilbert wrote:
> As for the hiccup, I have noticed that with SAS (SCSI)
> disks from Seagate there is a curious sound and a pause
> before the response to LOG SENSE SCSI command (the
> type the smartmontools uses on SCSI disks).
>
> Another annoyance is that the disk must be ready (i.e.
> spun up) before MODE SENSE and LOG SENSE work, haven't
> Seagate heard of flash :-)
> SCSI standards permit that (i.e. only
> a small number of commands have to work when the disk
> is not ready) but you would think accessing metadata
> given the disk has spun up once since power up could
> be accomplished from RAM or flash.

We've experienced similar problems at Intel with an LSI card and Intel
SSDs (SATA, not SAS).  This issue got pushed into the 'investigate later'
category, as we were able to just disable smartd.  I'll try and get some
more information on this later.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

* [Bug 13594] SMART responses for SATA disks on SAS get interpreted as errors
  2009-06-21 17:26 [Bug 13594] New: SMART responses for SATA disks on SAS get interpreted as errors bugzilla-daemon
                   ` (7 preceding siblings ...)
  2009-06-22 12:04 ` bugzilla-daemon
@ 2009-11-21  0:20 ` bugzilla-daemon
  8 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2009-11-21  0:20 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=13594


Al Tobey <tobert@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tobert@gmail.com




--- Comment #8 from Al Tobey <tobert@gmail.com>  2009-11-21 00:20:30 ---
I get the same issue on LSI SAS2008 using the mpt2sas driver in 2.6.32-rc5.  
It wouldn't be a big deal, but it actually increments
/sys/block/$dev/device/ioerr_cnt, which I'd like to use for quick & dirty
checks for drives going south (I realize it's not perfect).

This occurs with both smartmontools 5.38-2+lenny1 as shipped with Debian 5 and
with a local backport of 5.38+svn2956 from experimental.

Trying smartctl -d scsi returns an outright failure. 

I can also reproduce with sg_sat_identify -c.

~$ sudo sg_sat_identify -c /dev/sg13
~$ dmesg |tail -n 5
sd 4:0:11:0: [sg13] Sense Key : Recovered Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 
        00 00 00 00 00 00 
sd 4:0:11:0: [sg13] Add. Sense: ATA pass through information available
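
The hex dump above can be picked apart by hand. The following is a minimal
sketch, assuming the descriptor-format sense layout from SPC and the ATA Status
Return descriptor (type 0x09) from SAT; the function names are made up for
illustration and this is not the kernel's own decoder.

```python
# Bytes copied from the dmesg output above.
SENSE_HEX = """
72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
00 00 00 00 00 00
"""


def decode_descriptor_sense(hex_dump):
    """Decode descriptor-format sense data into a small dict."""
    buf = bytes.fromhex("".join(hex_dump.split()))
    if buf[0] & 0x7F not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    result = {
        "sense_key": buf[1] & 0x0F,   # 0x1 = Recovered Error
        "asc": buf[2],
        "ascq": buf[3],
        "ata_status_return": None,
    }
    end = min(len(buf), 8 + buf[7])   # byte 7 = additional sense length
    pos = 8
    while pos + 2 <= end:
        desc_type, desc_len = buf[pos], buf[pos + 1]
        body = buf[pos + 2:pos + 2 + desc_len]
        if desc_type == 0x09 and len(body) >= 12:
            result["ata_status_return"] = {
                "extend": body[0] & 0x01,
                "error": body[1],
                "sector_count": (body[2] << 8) | body[3],
                "lba_low": (body[4] << 8) | body[5],
                "lba_mid": (body[6] << 8) | body[7],
                "lba_high": (body[8] << 8) | body[9],
                "device": body[10],
                "status": body[11],
            }
        pos += 2 + desc_len
    return result


def is_ata_passthrough_info(sense):
    """True for Recovered Error with ASC/ASCQ 00h/1Dh (requested registers)."""
    return (sense["sense_key"], sense["asc"], sense["ascq"]) == (0x1, 0x00, 0x1D)


decoded = decode_descriptor_sense(SENSE_HEX)
print(decoded, is_ata_passthrough_info(decoded))
```

Fed the dump from the original report instead (ending in 00 4f 00 c2 00 50),
the same function yields LBA mid 0x4f, LBA high 0xc2 and status 0x50.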

~$ cat /sys/block/sdm/device/ioerr_cnt
0x5

~$ sudo smartctl -d sat -q errorsonly -H /dev/sdm
smartctl 5.39 2009-10-10 r2955 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net

Warning! SMART Attribute Thresholds Structure error: invalid SMART checksum.
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

~$ cat /sys/block/sdm/device/ioerr_cnt
0x6
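
The two readings above (0x5 before the smartctl run, 0x6 after) are hex
strings, so a quick & dirty check has to parse them before comparing. A minimal
sketch of that bookkeeping, assuming the sysfs path quoted earlier in this
comment; the helper names are invented for illustration:

```python
from pathlib import Path


def parse_counter(text):
    """Parse a sysfs counter such as '0x5' into an int."""
    return int(text.strip(), 16)


def read_ioerr_cnt(dev):
    """Read /sys/block/<dev>/device/ioerr_cnt (path as used in this report)."""
    return parse_counter(Path(f"/sys/block/{dev}/device/ioerr_cnt").read_text())


def errors_added(before, after):
    """Number of errors counted between two raw sysfs readings."""
    return parse_counter(after) - parse_counter(before)


# Readings from the transcript above: one smartctl call, one extra "error".
print(errors_added("0x5", "0x6"))
```

Any monitoring built on this counter would see every pass-through SMART query
as a new error for as long as the behaviour described here persists.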

~$ cat /sys/class/scsi_host/host4/device_delay
00
~$ cat /sys/class/scsi_host/host4/version_fw
02.00.50.00
~$ cat /sys/class/scsi_host/host4/version_mpi
200.0b
~$ cat /sys/class/scsi_host/host4/version_product 
LSISAS2008
~$ cat /sys/class/scsi_host/host4/version_bios
07.01.01.00

~$ sudo sg_inq /dev/sg12
standard INQUIRY:
  PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
  [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=1  Resp_data_format=2
  SCCS=0  ACC=0  TGPS=0  3PC=0  Protect=0  BQue=0
  EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
  [RelAdr=0]  WBus16=0  Sync=0  Linked=0  [TranDis=0]  CmdQue=1
  [SPI: Clocking=0x0  QAS=0  IUS=0]
    length=74 (0x4a)   Peripheral device type: disk
 Vendor identification: ATA     
 Product identification: WDC WD2002FYPS-0
 Product revision level: 5G04
 Unit serial number:      WD-WCAVY0517841
