* [BUG,REGRESSION] ARM: mvebu: SATA regression w/ 3.12-rc4 kernel
@ 2013-10-06 21:38 Arnaud Ebalard
  2013-10-07 12:59 ` Jason Cooper
  0 siblings, 1 reply; 15+ messages in thread
From: Arnaud Ebalard @ 2013-10-06 21:38 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

I was testing some code on my (Armada 370-based) ReadyNAS 102 and got
the following error while writing something on disk on *3.12-rc4* (it also
happens on -rc3):

[  417.288155] ata1.00: exception Emask 0x0 SAct 0x1fff6001 SErr 0x0 action 0x6 frozen
[  417.295838] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.301097] ata1.00: cmd 61/48:00:80:ad:0b/00:00:0c:00:00/40 tag 0 ncq 36864 out
[  417.315896] ata1.00: status: { DRDY }
[  417.319570] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.324814] ata1.00: cmd 61/08:68:70:a1:87/00:00:0d:00:00/40 tag 13 ncq 4096 out
[  417.339619] ata1.00: status: { DRDY }
[  417.343288] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.348536] ata1.00: cmd 61/08:70:28:a2:87/00:00:0d:00:00/40 tag 14 ncq 4096 out
[  417.363341] ata1.00: status: { DRDY }
[  417.367010] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.372257] ata1.00: cmd 61/08:80:80:a3:87/00:00:0d:00:00/40 tag 16 ncq 4096 out
[  417.387061] ata1.00: status: { DRDY }
[  417.390733] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.395977] ata1.00: cmd 61/08:88:58:a1:c7/00:00:0d:00:00/40 tag 17 ncq 4096 out
[  417.410782] ata1.00: status: { DRDY }
[  417.414450] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.419697] ata1.00: cmd 61/08:90:48:a9:c7/00:00:0d:00:00/40 tag 18 ncq 4096 out
[  417.434501] ata1.00: status: { DRDY }
[  417.438173] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.443417] ata1.00: cmd 61/08:98:68:a9:c7/00:00:0d:00:00/40 tag 19 ncq 4096 out
[  417.458221] ata1.00: status: { DRDY }
[  417.461890] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.467134] ata1.00: cmd 61/08:a0:a0:aa:c7/00:00:0d:00:00/40 tag 20 ncq 4096 out
[  417.481940] ata1.00: status: { DRDY }
[  417.485609] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.490856] ata1.00: cmd 61/08:a8:70:ad:c7/00:00:0d:00:00/40 tag 21 ncq 4096 out
[  417.505660] ata1.00: status: { DRDY }
[  417.509332] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.514576] ata1.00: cmd 61/08:b0:78:b2:c7/00:00:0d:00:00/40 tag 22 ncq 4096 out
[  417.529383] ata1.00: status: { DRDY }
[  417.533051] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.538299] ata1.00: cmd 61/18:b8:90:a1:07/00:00:0e:00:00/40 tag 23 ncq 12288 out
[  417.553190] ata1.00: status: { DRDY }
[  417.556859] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.562106] ata1.00: cmd 61/08:c0:18:a2:07/00:00:0e:00:00/40 tag 24 ncq 4096 out
[  417.576910] ata1.00: status: { DRDY }
[  417.580582] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.585826] ata1.00: cmd 61/08:c8:48:a2:07/00:00:0e:00:00/40 tag 25 ncq 4096 out
[  417.600631] ata1.00: status: { DRDY }
[  417.604300] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.609546] ata1.00: cmd 61/10:d0:60:a2:07/00:00:0e:00:00/40 tag 26 ncq 8192 out
[  417.624351] ata1.00: status: { DRDY }
[  417.628020] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.633267] ata1.00: cmd 61/10:d8:b0:a5:07/00:00:0e:00:00/40 tag 27 ncq 8192 out
[  417.648071] ata1.00: status: { DRDY }
[  417.651743] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.656987] ata1.00: cmd 61/08:e0:10:a1:07/00:00:16:00:00/40 tag 28 ncq 4096 out
[  417.671791] ata1.00: status: { DRDY }
[  417.675466] ata1: hard resetting link
[  418.228117] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  418.336657] ata1.00: configured for UDMA/133
[  418.340954] ata1.00: device reported invalid CHS sector 0
[  418.346365] ata1.00: device reported invalid CHS sector 0
[  418.351779] ata1.00: device reported invalid CHS sector 0
[  418.357187] ata1.00: device reported invalid CHS sector 0
[  418.362599] ata1.00: device reported invalid CHS sector 0
[  418.368008] ata1.00: device reported invalid CHS sector 0
[  418.373419] ata1.00: device reported invalid CHS sector 0
[  418.378830] ata1.00: device reported invalid CHS sector 0
[  418.384238] ata1.00: device reported invalid CHS sector 0
[  418.389649] ata1.00: device reported invalid CHS sector 0
[  418.395057] ata1.00: device reported invalid CHS sector 0
[  418.400468] ata1.00: device reported invalid CHS sector 0
[  418.405876] ata1.00: device reported invalid CHS sector 0
[  418.411288] ata1.00: device reported invalid CHS sector 0
[  418.416696] ata1.00: device reported invalid CHS sector 0
[  418.422107] ata1.00: device reported invalid CHS sector 0
[  418.427533] ata1: EH complete
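
As a reading aid for the log above: the SAct value in the exception line is a 32-bit mask with one bit per outstanding NCQ tag. The following is a minimal sketch that decodes it; the helper name `active_tags` is made up for illustration and is not part of libata.

```python
def active_tags(sact: int) -> list[int]:
    """Return the NCQ tag numbers whose bits are set in a 32-bit SActive mask."""
    return [tag for tag in range(32) if (sact >> tag) & 1]


# SAct value taken from the "exception Emask ..." line of the log
tags = active_tags(0x1FFF6001)
print(tags)       # [0, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]
print(len(tags))  # 16
```

The 16 decoded tags match the 16 "failed command" entries (tags 0, 13, 14 and 16 to 28) reported before the hard reset.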
 
I thought this was a hardware issue and replaced the disk with another
one with an already installed system (Debian armel for the former,
Debian armhf for the latter). After some minutes, the same kind of
problem occurred. According to smartctl, both disks have a PASSED
status. I rebooted on a 3.11.4 kernel and never got the issue.

Looking at commits touching other armada-370 .dts files, I do not think I
missed any specific changes, so (I may be wrong) it is possible
that what I get is also happening on other mvebu boards *which do use*
sata disks. Before trying to find where it comes from (sounds
promising) and putting different subsystem maintainers in Cc:, it would be
good to know if I am the only one to get that and/or understand if it is
mvebu, sata or anything else related.

So if you have such a board with a sata disk connected, it would be nice
if you could give 3.12-rc4 a try and report what disk-related operations
produce.

If you have any other idea where it may come from, do not hesitate.

Cheers,

a+

ps: FWIW, here is what I get at boot in the log regarding sata config:

  libata version 3.00 loaded.
  ahci 0000:01:00.0: AHCI 0001.0000 32 slots 2 ports 6 Gbps 0x3 impl SATA mode
  ata1: SATA max UDMA/133 abar m512@0xe0010000 port 0xe0010100 irq 103
  ata2: SATA max UDMA/133 abar m512@0xe0010000 port 0xe0010180 irq 103
  sata_mv d00a0000.sata: version 1.28
  sata_mv d00a0000.sata: slots 32 ports 2
  scsi2 : sata_mv
  scsi3 : sata_mv
  ata3: SATA max UDMA/133 irq 23
  ata4: SATA max UDMA/133 irq 23
  ata2: SATA link down (SStatus 0 SControl 300)
  ata3: SATA link down (SStatus 0 SControl F300)
  ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  ata1.00: ATA-7: ST3250824AS, 3.ADH, max UDMA/133
  ata1.00: 488281250 sectors, multi 0: LBA48 NCQ (depth 31/32)
  ata1.00: configured for UDMA/133
  scsi 0:0:0:0: Direct-Access     ATA      ST3250824AS      3.AD PQ: 0 ANSI: 5
  ata4: SATA link down (SStatus 0 SControl F300)

* [BUG,REGRESSION] ARM: mvebu: SATA regression w/ 3.12-rc4 kernel
  2013-10-06 21:38 [BUG,REGRESSION] ARM: mvebu: SATA regression w/ 3.12-rc4 kernel Arnaud Ebalard
@ 2013-10-07 12:59 ` Jason Cooper
  2013-10-07 19:12   ` [BUG,REGRESSION] SATA regression on " Arnaud Ebalard
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Cooper @ 2013-10-07 12:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Oct 06, 2013 at 11:38:46PM +0200, Arnaud Ebalard wrote:
> Hi,
> 
> I was testing some code on my (Armada 370-based) ReadyNAS 102 and got
> the following error while writing something on disk on *3.12-rc4* (it also
> happens on -rc3):
> 
...
> I thought this was a hardware issue and replaced the disk with another
> one with an already installed system (Debian armel for the former,
> Debian armhf for the latter). After some minutes, the same kind of
> problem occurred. According to smartctl, both disks have a PASSED
> status. I rebooted on a 3.11.4 kernel and never got the issue.

could you run a git bisect so that we can nail it down to a specific
commit?

thx,

Jason.

* [BUG,REGRESSION] SATA regression on 3.12-rc4 kernel
  2013-10-07 12:59 ` Jason Cooper
@ 2013-10-07 19:12   ` Arnaud Ebalard
  2013-10-08  2:38       ` Robert Hancock
  0 siblings, 1 reply; 15+ messages in thread
From: Arnaud Ebalard @ 2013-10-07 19:12 UTC (permalink / raw)
  To: Marc Carino, Tejun Heo, linux-ide
  Cc: Andrew Lunn, Ezequiel Garcia, Jason Gunthorpe, linux-arm-kernel,
	Thomas Petazzoni, Gregory Clement, Sebastian Hesselbarth,
	willy tarreau, Jason Cooper

Hi guys,

yesterday, I reported on arm kernel mailing list what looked like a sata
regression on my platform (Marvell Armada 370-based NETGEAR ReadyNAS
102). I initially thought this was an ARM-related issue. My initial
email, provided below, contains various details on the platform and the
error encountered.

Today, before starting a painful git bisect, I decided to git log
sata_mv.c code and then more generally drivers/ata to quickly end up on
commit ed36911c747c (libata: Add support for SEND/RECEIVE FPDMA QUEUED)
against which I got suspicious after looking again at the errors I had:

[  417.288155] ata1.00: exception Emask 0x0 SAct 0x1fff6001 SErr 0x0 action 0x6 frozen
[  417.295838] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.301097] ata1.00: cmd 61/48:00:80:ad:0b/00:00:0c:00:00/40 tag 0 ncq 36864 out
[  417.315896] ata1.00: status: { DRDY }
[  417.319570] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.324814] ata1.00: cmd 61/08:68:70:a1:87/00:00:0d:00:00/40 tag 13 ncq 4096 out
[  417.339619] ata1.00: status: { DRDY }
[  417.343288] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.348536] ata1.00: cmd 61/08:70:28:a2:87/00:00:0d:00:00/40 tag 14 ncq 4096 out
[  417.363341] ata1.00: status: { DRDY }
[  417.367010] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.372257] ata1.00: cmd 61/08:80:80:a3:87/00:00:0d:00:00/40 tag 16 ncq 4096 out
[  417.387061] ata1.00: status: { DRDY }
[  417.390733] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.395977] ata1.00: cmd 61/08:88:58:a1:c7/00:00:0d:00:00/40 tag 17 ncq 4096 out
[  417.410782] ata1.00: status: { DRDY }

Reverting both 87fb6c31b9 (libata: Add support for queued DSM TRIM) and
ed36911c74 (libata: Add support for SEND/RECEIVE FPDMA QUEUED) makes the
problem disappear. Note: reverting 87fb6c31b9 is not enough and I cannot
compile the kernel with only the latter reverted.

If you need more info on the platform or want me to test some
fix, do not hesitate.

Cheers,

a+

---------------->8---------------------------------------------------
Hi guys,

I was testing some code on my (Armada 370-based) ReadyNAS 102 and got
the following error while writing something on disk on *3.12-rc4* (it also
happens on -rc3):

[  417.288155] ata1.00: exception Emask 0x0 SAct 0x1fff6001 SErr 0x0 action 0x6 frozen
[  417.295838] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.301097] ata1.00: cmd 61/48:00:80:ad:0b/00:00:0c:00:00/40 tag 0 ncq 36864 out
[  417.315896] ata1.00: status: { DRDY }
[  417.319570] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.324814] ata1.00: cmd 61/08:68:70:a1:87/00:00:0d:00:00/40 tag 13 ncq 4096 out
[  417.339619] ata1.00: status: { DRDY }
[  417.343288] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.348536] ata1.00: cmd 61/08:70:28:a2:87/00:00:0d:00:00/40 tag 14 ncq 4096 out
[  417.363341] ata1.00: status: { DRDY }
[  417.367010] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.372257] ata1.00: cmd 61/08:80:80:a3:87/00:00:0d:00:00/40 tag 16 ncq 4096 out
[  417.387061] ata1.00: status: { DRDY }
[  417.390733] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.395977] ata1.00: cmd 61/08:88:58:a1:c7/00:00:0d:00:00/40 tag 17 ncq 4096 out
[  417.410782] ata1.00: status: { DRDY }
[  417.414450] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.419697] ata1.00: cmd 61/08:90:48:a9:c7/00:00:0d:00:00/40 tag 18 ncq 4096 out
[  417.434501] ata1.00: status: { DRDY }
[  417.438173] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.443417] ata1.00: cmd 61/08:98:68:a9:c7/00:00:0d:00:00/40 tag 19 ncq 4096 out
[  417.458221] ata1.00: status: { DRDY }
[  417.461890] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.467134] ata1.00: cmd 61/08:a0:a0:aa:c7/00:00:0d:00:00/40 tag 20 ncq 4096 out
[  417.481940] ata1.00: status: { DRDY }
[  417.485609] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.490856] ata1.00: cmd 61/08:a8:70:ad:c7/00:00:0d:00:00/40 tag 21 ncq 4096 out
[  417.505660] ata1.00: status: { DRDY }
[  417.509332] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.514576] ata1.00: cmd 61/08:b0:78:b2:c7/00:00:0d:00:00/40 tag 22 ncq 4096 out
[  417.529383] ata1.00: status: { DRDY }
[  417.533051] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.538299] ata1.00: cmd 61/18:b8:90:a1:07/00:00:0e:00:00/40 tag 23 ncq 12288 out
[  417.553190] ata1.00: status: { DRDY }
[  417.556859] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.562106] ata1.00: cmd 61/08:c0:18:a2:07/00:00:0e:00:00/40 tag 24 ncq 4096 out
[  417.576910] ata1.00: status: { DRDY }
[  417.580582] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.585826] ata1.00: cmd 61/08:c8:48:a2:07/00:00:0e:00:00/40 tag 25 ncq 4096 out
[  417.600631] ata1.00: status: { DRDY }
[  417.604300] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.609546] ata1.00: cmd 61/10:d0:60:a2:07/00:00:0e:00:00/40 tag 26 ncq 8192 out
[  417.624351] ata1.00: status: { DRDY }
[  417.628020] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.633267] ata1.00: cmd 61/10:d8:b0:a5:07/00:00:0e:00:00/40 tag 27 ncq 8192 out
[  417.648071] ata1.00: status: { DRDY }
[  417.651743] ata1.00: failed command: WRITE FPDMA QUEUED
[  417.656987] ata1.00: cmd 61/08:e0:10:a1:07/00:00:16:00:00/40 tag 28 ncq 4096 out
[  417.671791] ata1.00: status: { DRDY }
[  417.675466] ata1: hard resetting link
[  418.228117] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  418.336657] ata1.00: configured for UDMA/133
[  418.340954] ata1.00: device reported invalid CHS sector 0
[  418.346365] ata1.00: device reported invalid CHS sector 0
[  418.351779] ata1.00: device reported invalid CHS sector 0
[  418.357187] ata1.00: device reported invalid CHS sector 0
[  418.362599] ata1.00: device reported invalid CHS sector 0
[  418.368008] ata1.00: device reported invalid CHS sector 0
[  418.373419] ata1.00: device reported invalid CHS sector 0
[  418.378830] ata1.00: device reported invalid CHS sector 0
[  418.384238] ata1.00: device reported invalid CHS sector 0
[  418.389649] ata1.00: device reported invalid CHS sector 0
[  418.395057] ata1.00: device reported invalid CHS sector 0
[  418.400468] ata1.00: device reported invalid CHS sector 0
[  418.405876] ata1.00: device reported invalid CHS sector 0
[  418.411288] ata1.00: device reported invalid CHS sector 0
[  418.416696] ata1.00: device reported invalid CHS sector 0
[  418.422107] ata1.00: device reported invalid CHS sector 0
[  418.427533] ata1: EH complete
 
I thought this was a hardware issue and replaced the disk with another
one with an already installed system (Debian armel for the former,
Debian armhf for the latter). After some minutes, the same kind of
problem occurred. According to smartctl, both disks have a PASSED
status. I rebooted on a 3.11.4 kernel and never got the issue.

Looking at commits touching other armada-370 .dts files, I do not think I
missed any specific changes, so (I may be wrong) it is possible
that what I get is also happening on other mvebu boards *which do use*
sata disks. Before trying to find where it comes from (sounds
promising) and putting different subsystem maintainers in Cc:, it would be
good to know if I am the only one to get that and/or understand if it is
mvebu, sata or anything else related.

So if you have such a board with a sata disk connected, it would be nice
if you could give 3.12-rc4 a try and report what disk-related operations
produce.

If you have any other idea where it may come from, do not hesitate.

Cheers,

a+

ps: FWIW, here is what I get at boot in the log regarding sata config:

  libata version 3.00 loaded.
  ahci 0000:01:00.0: AHCI 0001.0000 32 slots 2 ports 6 Gbps 0x3 impl SATA mode
  ata1: SATA max UDMA/133 abar m512@0xe0010000 port 0xe0010100 irq 103
  ata2: SATA max UDMA/133 abar m512@0xe0010000 port 0xe0010180 irq 103
  sata_mv d00a0000.sata: version 1.28
  sata_mv d00a0000.sata: slots 32 ports 2
  scsi2 : sata_mv
  scsi3 : sata_mv
  ata3: SATA max UDMA/133 irq 23
  ata4: SATA max UDMA/133 irq 23
  ata2: SATA link down (SStatus 0 SControl 300)
  ata3: SATA link down (SStatus 0 SControl F300)
  ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  ata1.00: ATA-7: ST3250824AS, 3.ADH, max UDMA/133
  ata1.00: 488281250 sectors, multi 0: LBA48 NCQ (depth 31/32)
  ata1.00: configured for UDMA/133
  scsi 0:0:0:0: Direct-Access     ATA      ST3250824AS      3.AD PQ: 0 ANSI: 5
  ata4: SATA link down (SStatus 0 SControl F300)

---------------->8---------------------------------------------------

* Re: [BUG,REGRESSION] SATA regression on 3.12-rc4 kernel
  2013-10-07 19:12   ` [BUG,REGRESSION] SATA regression on " Arnaud Ebalard
@ 2013-10-08  2:38       ` Robert Hancock
  0 siblings, 0 replies; 15+ messages in thread
From: Robert Hancock @ 2013-10-08  2:38 UTC (permalink / raw)
  To: Arnaud Ebalard
  Cc: Marc Carino, Tejun Heo, linux-ide, Andrew Lunn, Ezequiel Garcia,
	Jason Gunthorpe, linux-arm-kernel, Thomas Petazzoni,
	Gregory Clement, Sebastian Hesselbarth, willy tarreau,
	Jason Cooper

On 10/07/2013 01:12 PM, Arnaud Ebalard wrote:
> Hi guys,
>
> yesterday, I reported on arm kernel mailing list what looked like a sata
> regression on my platform (Marvell Armada 370-based NETGEAR ReadyNAS
> 102). I initially thought this was an ARM-related issue. My initial
> email, provided below, contains various details on the platform and the
> error encountered.
>
> Today, before starting a painful git bisect, I decided to git log
> sata_mv.c code and then more generally drivers/ata to quickly end up on
> commit ed36911c747c (libata: Add support for SEND/RECEIVE FPDMA QUEUED)
> against which I got suspicious after looking again at the errors I had:
>
> [  417.288155] ata1.00: exception Emask 0x0 SAct 0x1fff6001 SErr 0x0 action 0x6 frozen
> [  417.295838] ata1.00: failed command: WRITE FPDMA QUEUED
> [  417.301097] ata1.00: cmd 61/48:00:80:ad:0b/00:00:0c:00:00/40 tag 0 ncq 36864 out
> [  417.315896] ata1.00: status: { DRDY }
> [  417.319570] ata1.00: failed command: WRITE FPDMA QUEUED
> [  417.324814] ata1.00: cmd 61/08:68:70:a1:87/00:00:0d:00:00/40 tag 13 ncq 4096 out
> [  417.339619] ata1.00: status: { DRDY }
> [  417.343288] ata1.00: failed command: WRITE FPDMA QUEUED
> [  417.348536] ata1.00: cmd 61/08:70:28:a2:87/00:00:0d:00:00/40 tag 14 ncq 4096 out
> [  417.363341] ata1.00: status: { DRDY }
> [  417.367010] ata1.00: failed command: WRITE FPDMA QUEUED
> [  417.372257] ata1.00: cmd 61/08:80:80:a3:87/00:00:0d:00:00/40 tag 16 ncq 4096 out
> [  417.387061] ata1.00: status: { DRDY }
> [  417.390733] ata1.00: failed command: WRITE FPDMA QUEUED
> [  417.395977] ata1.00: cmd 61/08:88:58:a1:c7/00:00:0d:00:00/40 tag 17 ncq 4096 out
> [  417.410782] ata1.00: status: { DRDY }
>
> Reverting both 87fb6c31b9 (libata: Add support for queued DSM TRIM) and
> ed36911c74 (libata: Add support for SEND/RECEIVE FPDMA QUEUED) makes the
> problem disappear. Note: reverting 87fb6c31b9 is not enough and I cannot
> compile the kernel with only the latter reverted.
>
> If you need more info on the platform or want me to test some
> fix, do not hesitate.

I assume that it consistently fails on a non-working kernel and 
consistently works with those patches reverted? Given that both of those 
patches seem to only be touching SSDs with NCQ trim support, it seems 
odd they would be breaking a normal hard drive, but maybe there is some 
unexpected side effect..

* Re: [BUG,REGRESSION] SATA regression on 3.12-rc4 kernel
  2013-10-08  2:38       ` Robert Hancock
@ 2013-10-08  6:10         ` Arnaud Ebalard
  -1 siblings, 0 replies; 15+ messages in thread
From: Arnaud Ebalard @ 2013-10-08  6:10 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Thomas Petazzoni, Andrew Lunn, Jason Cooper, linux-ide,
	Jason Gunthorpe, Marc Carino, Ezequiel Garcia, Tejun Heo,
	Gregory Clement, willy tarreau, linux-arm-kernel,
	Sebastian Hesselbarth

Hi Robert,

Robert Hancock <hancockrwd@gmail.com> writes:

> On 10/07/2013 01:12 PM, Arnaud Ebalard wrote:
>> Hi guys,
>>
>> yesterday, I reported on arm kernel mailing list what looked like a sata
>> regression on my platform (Marvell Armada 370-based NETGEAR ReadyNAS
>> 102). I initially thought this was an ARM-related issue. My initial
>> email, provided below, contains various details on the platform and the
>> error encountered.
>>
>> Today, before starting a painful git bisect, I decided to git log
>> sata_mv.c code and then more generally drivers/ata to quickly end up on
>> commit ed36911c747c (libata: Add support for SEND/RECEIVE FPDMA QUEUED)
>> against which I got suspicious after looking again at the errors I had:
>>
>> [  417.288155] ata1.00: exception Emask 0x0 SAct 0x1fff6001 SErr 0x0 action 0x6 frozen
>> [  417.295838] ata1.00: failed command: WRITE FPDMA QUEUED
>> [  417.301097] ata1.00: cmd 61/48:00:80:ad:0b/00:00:0c:00:00/40 tag 0 ncq 36864 out
>> [  417.315896] ata1.00: status: { DRDY }
>> [  417.319570] ata1.00: failed command: WRITE FPDMA QUEUED
>> [  417.324814] ata1.00: cmd 61/08:68:70:a1:87/00:00:0d:00:00/40 tag 13 ncq 4096 out
>> [  417.339619] ata1.00: status: { DRDY }
>> [  417.343288] ata1.00: failed command: WRITE FPDMA QUEUED
>> [  417.348536] ata1.00: cmd 61/08:70:28:a2:87/00:00:0d:00:00/40 tag 14 ncq 4096 out
>> [  417.363341] ata1.00: status: { DRDY }
>> [  417.367010] ata1.00: failed command: WRITE FPDMA QUEUED
>> [  417.372257] ata1.00: cmd 61/08:80:80:a3:87/00:00:0d:00:00/40 tag 16 ncq 4096 out
>> [  417.387061] ata1.00: status: { DRDY }
>> [  417.390733] ata1.00: failed command: WRITE FPDMA QUEUED
>> [  417.395977] ata1.00: cmd 61/08:88:58:a1:c7/00:00:0d:00:00/40 tag 17 ncq 4096 out
>> [  417.410782] ata1.00: status: { DRDY }
>>
>> Reverting both 87fb6c31b9 (libata: Add support for queued DSM TRIM) and
>> ed36911c74 (libata: Add support for SEND/RECEIVE FPDMA QUEUED) makes the
>> problem disappear. Note: reverting 87fb6c31b9 is not enough and I cannot
>> compile the kernel with only the latter reverted.
>>
>> If you need more info on the platform or want me to test some
>> fix, do not hesitate.
>
> I assume that it consistently fails on a non-working kernel and
> consistently works with those patches reverted? Given that both of
> those patches seem to only be touching SSDs with NCQ trim support, it
> seems odd they would be breaking a normal hard drive, but maybe there
> is some unexpected side effect..

With two different disks (same model though, i.e. 250GB 3.5" WD blue), it
consistently works on 3.11.4 and consistently fails on 3.12-rc3 and
3.12-rc4 (I have not tested other 3.12-rc kernels). The problem is easy to
reproduce, i.e. I just need to perform some disk operations. With the two
commits reverted from 3.12-rc4, I can consistently do a "find / -exec
sha256sum '{}' \;" w/o anything happening.

What I do not understand is why the log reports failed FPDMA commands if
the feature is supposed to be SSD-related (looking only at commit
messages: 87fb6c31b9 seems SSD-related, ed36911c74 does not). Is it
possible that the feature detection is what is causing the issue? Or
that the hardware reports support w/o actually having it? I can test with a
different disk if you think it would help.

Cheers,

a+

* Re: [BUG,REGRESSION] SATA regression on 3.12-rc4 kernel
  2013-10-08  6:10         ` Arnaud Ebalard
@ 2013-10-09  5:50           ` Robert Hancock
  -1 siblings, 0 replies; 15+ messages in thread
From: Robert Hancock @ 2013-10-09  5:50 UTC (permalink / raw)
  To: Arnaud Ebalard
  Cc: Marc Carino, Tejun Heo, linux-ide, Andrew Lunn, Ezequiel Garcia,
	Jason Gunthorpe, linux-arm-kernel, Thomas Petazzoni,
	Gregory Clement, Sebastian Hesselbarth, willy tarreau,
	Jason Cooper

On Tue, Oct 8, 2013 at 12:10 AM, Arnaud Ebalard <arno@natisbad.org> wrote:
> Hi Robert,
>
> Robert Hancock <hancockrwd@gmail.com> writes:
>
>> On 10/07/2013 01:12 PM, Arnaud Ebalard wrote:
>>> Hi guys,
>>>
>>> yesterday, I reported on arm kernel mailing list what looked like a sata
>>> regression on my platform (Marvell Armada 370-based NETGEAR ReadyNAS
>>> 102). I initially thought this was an ARM-related issue. My initial
>>> email, provided below, contains various details on the platform and the
>>> error encountered.
>>>
>>> Today, before starting a painful git bisect, I decided to git log
>>> sata_mv.c code and then more generally drivers/ata to quickly end up on
>>> commit ed36911c747c (libata: Add support for SEND/RECEIVE FPDMA QUEUED)
>>> against which I got suspicious after looking again at the errors I had:
>>>
>>> [  417.288155] ata1.00: exception Emask 0x0 SAct 0x1fff6001 SErr 0x0 action 0x6 frozen
>>> [  417.295838] ata1.00: failed command: WRITE FPDMA QUEUED
>>> [  417.301097] ata1.00: cmd 61/48:00:80:ad:0b/00:00:0c:00:00/40 tag 0 ncq 36864 out
>>> [  417.315896] ata1.00: status: { DRDY }
>>> [  417.319570] ata1.00: failed command: WRITE FPDMA QUEUED
>>> [  417.324814] ata1.00: cmd 61/08:68:70:a1:87/00:00:0d:00:00/40 tag 13 ncq 4096 out
>>> [  417.339619] ata1.00: status: { DRDY }
>>> [  417.343288] ata1.00: failed command: WRITE FPDMA QUEUED
>>> [  417.348536] ata1.00: cmd 61/08:70:28:a2:87/00:00:0d:00:00/40 tag 14 ncq 4096 out
>>> [  417.363341] ata1.00: status: { DRDY }
>>> [  417.367010] ata1.00: failed command: WRITE FPDMA QUEUED
>>> [  417.372257] ata1.00: cmd 61/08:80:80:a3:87/00:00:0d:00:00/40 tag 16 ncq 4096 out
>>> [  417.387061] ata1.00: status: { DRDY }
>>> [  417.390733] ata1.00: failed command: WRITE FPDMA QUEUED
>>> [  417.395977] ata1.00: cmd 61/08:88:58:a1:c7/00:00:0d:00:00/40 tag 17 ncq 4096 out
>>> [  417.410782] ata1.00: status: { DRDY }
>>>
>>> Reverting both 87fb6c31b9 (libata: Add support for queued DSM TRIM) and
>>> ed36911c74 (libata: Add support for SEND/RECEIVE FPDMA QUEUED) makes the
>>> problem disappear. Note: reverting 87fb6c31b9 is not enough and I cannot
>>> compile the kernel with only the latter reverted.
>>>
>>> If you need more info on the platform or want me to test something some
>>> fix, do not hesitate.
>>
>> I assume that it consistently fails on a non-working kernel and
>> consistently works with those patches reverted? Given that both of
>> those patches seem to only be touching SSDs with NCQ trim support, it
>> seems odd they would be breaking a normal hard drive, but maybe there
>> is some unexpected side effect..
>
> With two different disks (same model though, i.e. 250GB 3.5" WD blue), it
> consistently works on a 3.11.4 and consistently fails on 3.12-rc3 and
> 3.12-rc4 (not tested others 3.12-rc). The problem is easy to reproduce,
> i.e. I just need to perform some disk operations. With the two commits
> reverted from 3.12-rc4, I can consistently do a "find / -exec sha256sum
> '{}' \;" w/o anything happening.
>
> What I do not understand is why the log report failed FPDMA commands if
> the feature is supposed to be SSD-related (looking only at commit
> messages: 87fb6c31b9 seems SSD-related, ed36911c74 does not). Is it
> possible that the feature detection is what is causing the issue? Or
> that the hardware report support w/o having? I can test with a different
> disk if you think it would help.

The commands that are failing are WRITE FPDMA QUEUED, which is a
regular NCQ write command. The ones that these commits add support for
are FPDMA_SEND and FPDMA_RECV, which are used for NCQ trim commands.
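
To make the distinction concrete, here is a minimal sketch (not part of
libata) that decodes the "cmd" lines and the SAct mask from the log quoted
above. It assumes the usual libata error layout (command, feature, nsect,
LBA low bytes / HOB bytes / device), that NCQ commands carry the sector
count in the feature field and the tag in bits 7:3 of nsect, and 512-byte
logical sectors. The opcode values are those of the ATA command set; the
function names are made up for illustration.

```python
import re

# NCQ opcodes from the ATA command set.
NCQ_OPCODES = {
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
    0x64: "SEND FPDMA QUEUED",
    0x65: "RECEIVE FPDMA QUEUED",
}

# Twelve hex bytes separated by '/' or ':' following the word "cmd".
CMD_RE = re.compile(r"cmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


def decode_cmd(line, sector_size=512):
    """Decode one libata 'cmd' log line; returns None if it does not match."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    b = [int(x, 16) for x in re.split(r"[/:]", m.group(1))]
    command, feature, nsect, lbal, lbam, lbah = b[0:6]
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = b[6:11]
    # For NCQ commands the sector count lives in the feature registers.
    # (A count of 0 would mean 65536 sectors; not handled in this sketch.)
    sectors = feature | (hob_feature << 8)
    lba = (lbal | (lbam << 8) | (lbah << 16)
           | (hob_lbal << 24) | (hob_lbam << 32) | (hob_lbah << 40))
    return {
        "opcode": NCQ_OPCODES.get(command, "0x%02x" % command),
        "tag": nsect >> 3,          # NCQ tag is stored in bits 7:3
        "bytes": sectors * sector_size,
        "lba": lba,
    }


def sact_tags(sact):
    """List the NCQ tags whose bits are set in an SAct value."""
    return [tag for tag in range(32) if sact & (1 << tag)]


if __name__ == "__main__":
    line = "ata1.00: cmd 61/48:00:80:ad:0b/00:00:0c:00:00/40 tag 0 ncq 36864 out"
    print(decode_cmd(line))
    print(sact_tags(0x1fff6001))
```

Under these assumptions every failed command in the log decodes to opcode
0x61 (a plain NCQ write), and SAct 0x1fff6001 corresponds to tags 0, 13,
14 and 16 through 28 being outstanding, which matches the tags reported.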

It's possible that the feature detection for this is picking up
support for FPDMA SEND/RECV on this drive when it shouldn't be. Can
you post the output of "hdparm --Istdout /dev/sdX" for one of these
drives (where X matches the drive in question)?
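
For anyone wanting to check such a dump by hand, a rough sketch follows.
It assumes "hdparm --Istdout" prints the 256 IDENTIFY DEVICE words as
four-digit hex values, and that SEND/RECEIVE FPDMA QUEUED support is
advertised in word 77, bit 6; both points should be verified against the
hdparm man page and the ATA specification before relying on the result.
The helper names are hypothetical.

```python
import re

# Assumed location of the capability bit (IDENTIFY DEVICE word 77, bit 6).
WORD_SATA_CAP_2 = 77
BIT_NCQ_SEND_RECV = 1 << 6

HEX_WORD = re.compile(r"[0-9a-fA-F]{4}")


def parse_istdout(text):
    """Collect the 16-bit IDENTIFY words from an 'hdparm --Istdout' dump.

    Any token that is not exactly four hex digits (such as the
    '/dev/sdX:' header) is ignored.
    """
    words = [int(tok, 16) for tok in text.split() if HEX_WORD.fullmatch(tok)]
    if len(words) != 256:
        raise ValueError("expected 256 IDENTIFY words, got %d" % len(words))
    return words


def claims_ncq_send_recv(words):
    """True if the drive advertises SEND/RECEIVE FPDMA QUEUED (assumed bit)."""
    return bool(words[WORD_SATA_CAP_2] & BIT_NCQ_SEND_RECV)


if __name__ == "__main__":
    import sys
    # Usage: hdparm --Istdout /dev/sdX | python3 this_script.py
    dump = sys.stdin.read()
    print("NCQ SEND/RECV advertised:", claims_ncq_send_recv(parse_istdout(dump)))
```

A rotating drive that reports this bit as set would be consistent with the
misdetection theory; a clear bit would point elsewhere.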

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [BUG,REGRESSION] SATA regression on 12.0-rc4 kernel
  2013-10-09  5:50           ` Robert Hancock
@ 2013-10-09  8:40             ` Arnaud Ebalard
  -1 siblings, 0 replies; 15+ messages in thread
From: Arnaud Ebalard @ 2013-10-09  8:40 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Marc Carino, Tejun Heo, linux-ide, Andrew Lunn, Ezequiel Garcia,
	Jason Gunthorpe, linux-arm-kernel, Thomas Petazzoni,
	Gregory Clement, Sebastian Hesselbarth, willy tarreau,
	Jason Cooper


Robert Hancock <hancockrwd@gmail.com> writes:

>> With two different disks (same model though, i.e. 250GB 3.5" WD blue), it
>> consistently works on a 3.11.4 and consistently fails on 3.12-rc3 and
>> 3.12-rc4 (not tested others 3.12-rc). The problem is easy to reproduce,
>> i.e. I just need to perform some disk operations. With the two commits
>> reverted from 3.12-rc4, I can consistently do a "find / -exec sha256sum
>> '{}' \;" w/o anything happening.
>>
>> What I do not understand is why the log report failed FPDMA commands if
>> the feature is supposed to be SSD-related (looking only at commit
>> messages: 87fb6c31b9 seems SSD-related, ed36911c74 does not). Is it
>> possible that the feature detection is what is causing the issue? Or
>> that the hardware report support w/o having? I can test with a different
>> disk if you think it would help.
>
> The commands that are failing are WRITE FPDMA QUEUED which is a
> regular NCQ write command. The ones that these commits add support for
> are FPDMA_SEND and FPDMA_RECV which are used for NCQ trim commands.
>
> It's possible that the feature detection for this is picking up
> support for FPDMA SEND/RECV on this drive when it shouldn't be. Can
> you post the output of "hdparm --Istdout /dev/sdX" for one of these
> drives (where X matches the drive in question)?

Will do that tonight and post the output.

a+

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [BUG,REGRESSION] SATA regression on 12.0-rc4 kernel
  2013-10-09  5:50           ` Robert Hancock
@ 2013-10-09 15:22             ` Marc (Marc-Angelo) Carino
  -1 siblings, 0 replies; 15+ messages in thread
From: Marc (Marc-Angelo) Carino @ 2013-10-09 15:22 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Arnaud Ebalard, Marc Carino, Tejun Heo, linux-ide, Andrew Lunn,
	Ezequiel Garcia, Jason Gunthorpe, linux-arm-kernel,
	Thomas Petazzoni, Gregory Clement, Sebastian Hesselbarth,
	willy tarreau, Jason Cooper

Hello all,

>> What I do not understand is why the log report failed FPDMA commands if
>> the feature is supposed to be SSD-related (looking only at commit
>> messages: 87fb6c31b9 seems SSD-related, ed36911c74 does not). Is it
>> possible that the feature detection is what is causing the issue? Or
>> that the hardware report support w/o having? I can test with a different
>> disk if you think it would help.

Drive misreporting is likely to be the case. Oddly enough, even if the
drive's firmware did misreport support for the new SEND/RECV commands, it
appears that a discard/trim request is being made by the block layer.

In addition to the hdparm dump, could you also provide a full kernel boot
log? The driver should complain if there were any issues retrieving the NCQ
send/receive log page.

Lastly, could you give another drive brand a try, if possible? I had tested
the changes on an Intel SATA AHCI controller and a Micron M500 SSD. I should
be able to scrounge up a Marvell PCIe AHCI controller.

Thanks!
Marc

On 10/8/2013 10:51 PM, Robert Hancock wrote:
> On Tue, Oct 8, 2013 at 12:10 AM, Arnaud Ebalard <arno@natisbad.org> wrote:
>> Hi Robert,
>>
>> Robert Hancock <hancockrwd@gmail.com> writes:
>>
>>> On 10/07/2013 01:12 PM, Arnaud Ebalard wrote:
>>>> Hi guys,
>>>>
>>>> yesterday, I reported on arm kernel mailing list what looked like a sata
>>>> regression on my platform (Marvell Armada 370-based NETGEAR ReadyNAS
>>>> 102). I initially thought this was an ARM-related issue. My initial
>>>> email, provided below, contains various details on the platform and the
>>>> error encountered.
>>>>
>>>> Today, before starting a painful git bisect, I decided to git log
>>>> sata_mv.c code and then more generally drivers/ata to quickly end up on
>>>> commit ed36911c747c (libata: Add support for SEND/RECEIVE FPDMA QUEUED)
>>>> against which I got suspicious after looking again at the errors I had:
>>>>
>>>> [  417.288155] ata1.00: exception Emask 0x0 SAct 0x1fff6001 SErr 0x0 action 0x6 frozen
>>>> [  417.295838] ata1.00: failed command: WRITE FPDMA QUEUED
>>>> [  417.301097] ata1.00: cmd 61/48:00:80:ad:0b/00:00:0c:00:00/40 tag 0 ncq 36864 out
>>>> [  417.315896] ata1.00: status: { DRDY }
>>>> [  417.319570] ata1.00: failed command: WRITE FPDMA QUEUED
>>>> [  417.324814] ata1.00: cmd 61/08:68:70:a1:87/00:00:0d:00:00/40 tag 13 ncq 4096 out
>>>> [  417.339619] ata1.00: status: { DRDY }
>>>> [  417.343288] ata1.00: failed command: WRITE FPDMA QUEUED
>>>> [  417.348536] ata1.00: cmd 61/08:70:28:a2:87/00:00:0d:00:00/40 tag 14 ncq 4096 out
>>>> [  417.363341] ata1.00: status: { DRDY }
>>>> [  417.367010] ata1.00: failed command: WRITE FPDMA QUEUED
>>>> [  417.372257] ata1.00: cmd 61/08:80:80:a3:87/00:00:0d:00:00/40 tag 16 ncq 4096 out
>>>> [  417.387061] ata1.00: status: { DRDY }
>>>> [  417.390733] ata1.00: failed command: WRITE FPDMA QUEUED
>>>> [  417.395977] ata1.00: cmd 61/08:88:58:a1:c7/00:00:0d:00:00/40 tag 17 ncq 4096 out
>>>> [  417.410782] ata1.00: status: { DRDY }
>>>>
>>>> Reverting both 87fb6c31b9 (libata: Add support for queued DSM TRIM) and
>>>> ed36911c74 (libata: Add support for SEND/RECEIVE FPDMA QUEUED) makes the
>>>> problem disappear. Note: reverting 87fb6c31b9 is not enough and I cannot
>>>> compile the kernel with only the latter reverted.
>>>>
>>>> If you need more info on the platform or want me to test something some
>>>> fix, do not hesitate.
>>>
>>> I assume that it consistently fails on a non-working kernel and
>>> consistently works with those patches reverted? Given that both of
>>> those patches seem to only be touching SSDs with NCQ trim support, it
>>> seems odd they would be breaking a normal hard drive, but maybe there
>>> is some unexpected side effect..
>>
>> With two different disks (same model though, i.e. 250GB 3.5" WD blue), it
>> consistently works on a 3.11.4 and consistently fails on 3.12-rc3 and
>> 3.12-rc4 (not tested others 3.12-rc). The problem is easy to reproduce,
>> i.e. I just need to perform some disk operations. With the two commits
>> reverted from 3.12-rc4, I can consistently do a "find / -exec sha256sum
>> '{}' \;" w/o anything happening.
>>
>> What I do not understand is why the log report failed FPDMA commands if
>> the feature is supposed to be SSD-related (looking only at commit
>> messages: 87fb6c31b9 seems SSD-related, ed36911c74 does not). Is it
>> possible that the feature detection is what is causing the issue? Or
>> that the hardware report support w/o having? I can test with a different
>> disk if you think it would help.
> 
> The commands that are failing are WRITE FPDMA QUEUED which is a
> regular NCQ write command. The ones that these commits add support for
> are FPDMA_SEND and FPDMA_RECV which are used for NCQ trim commands.
> 
> It's possible that the feature detection for this is picking up
> support for FPDMA SEND/RECV on this drive when it shouldn't be. Can
> you post the output of "hdparm --Istdout /dev/sdX" for one of these
> drives (where X matches the drive in question)?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [BUG,REGRESSION] SATA regression on 12.0-rc4 kernel
  2013-10-09 15:22             ` Marc (Marc-Angelo) Carino
@ 2013-10-09 18:56               ` Arnaud Ebalard
  -1 siblings, 0 replies; 15+ messages in thread
From: Arnaud Ebalard @ 2013-10-09 18:56 UTC (permalink / raw)
  To: Marc (Marc-Angelo) Carino
  Cc: Robert Hancock, Marc Carino, Tejun Heo, linux-ide, Andrew Lunn,
	Ezequiel Garcia, Jason Gunthorpe, linux-arm-kernel,
	Thomas Petazzoni, Gregory Clement, Sebastian Hesselbarth,
	willy tarreau, Jason Cooper

Hi guys,

I went crazy on that one: I managed to reproduce the bug w/ another disk
(Seagate Barracuda 7200.9). Then, after a 'make clean' on the tree and a
from-scratch build of the stock 3.12-rc4, I never managed to reproduce
the error again. I tested again with a WD30EFRX, the WD2500AAJS and the
Seagate Barracuda 7200.9, all three in two different instances of the
NAS: the error is gone.

The only logical conclusion is that it was due to a missing 'make
clean' on my tree, i.e. the failing image was built from stale objects.

I guess I owe you all some apologies for the noise.

Cheers,

a+

ps: and yes, I hate myself for having overwritten the erroneous image

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-10-09 18:56 UTC | newest]

Thread overview: 15+ messages
2013-10-06 21:38 [BUG,REGRESSION] ARM: mvebu: SATA regression w/ 12.0-rc4 kernel Arnaud Ebalard
2013-10-07 12:59 ` Jason Cooper
2013-10-07 19:12   ` [BUG,REGRESSION] SATA regression on " Arnaud Ebalard
2013-10-08  2:38     ` Robert Hancock
2013-10-08  2:38       ` Robert Hancock
2013-10-08  6:10       ` Arnaud Ebalard
2013-10-08  6:10         ` Arnaud Ebalard
2013-10-09  5:50         ` Robert Hancock
2013-10-09  5:50           ` Robert Hancock
2013-10-09  8:40           ` Arnaud Ebalard
2013-10-09  8:40             ` Arnaud Ebalard
2013-10-09 15:22           ` Marc (Marc-Angelo) Carino
2013-10-09 15:22             ` Marc (Marc-Angelo) Carino
2013-10-09 18:56             ` Arnaud Ebalard
2013-10-09 18:56               ` Arnaud Ebalard
