linux-block.vger.kernel.org archive mirror
* failed command: WRITE FPDMA QUEUED with Samsung 860 EVO
@ 2019-01-02 15:25 Sitsofe Wheeler
  2019-01-02 15:29 ` Sitsofe Wheeler
  0 siblings, 1 reply; 13+ messages in thread
From: Sitsofe Wheeler @ 2019-01-02 15:25 UTC (permalink / raw)
  To: linux-block

Hi,

I recently purchased a SATA Samsung 860 EVO SSD and put it in an old
HP microserver (which has an AMD N36L). By default, when the disk load
becomes a little heavy e.g. by running a job like

fio --name=test --readonly --rw=randread --filename /dev/sdb --bs=32k \
    --ioengine=libaio --iodepth=32 --direct=1 --runtime=10m --time_based=1

the kernel starts repeatedly producing error messages like:

[ 1177.729912] ata2.00: exception Emask 0x10 SAct 0x3c000 SErr 0x0
action 0x6 frozen
[ 1177.729931] ata2.00: irq_stat 0x08000000, interface fatal error
[ 1177.729943] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1177.729962] ata2.00: cmd 61/80:70:80:50:e6/06:00:00:00:00/40 tag 14
ncq dma 851968 out
[ 1177.729962]          res 40/00:80:00:5a:e6/00:00:00:00:00/40 Emask
0x10 (ATA bus error)
[ 1177.729978] ata2.00: status: { DRDY }
[ 1177.729986] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1177.730002] ata2.00: cmd 61/00:78:00:57:e6/03:00:00:00:00/40 tag 15
ncq dma 393216 out
[ 1177.730002]          res 40/00:80:00:5a:e6/00:00:00:00:00/40 Emask
0x10 (ATA bus error)
[ 1177.730017] ata2.00: status: { DRDY }
[ 1177.730024] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1177.730039] ata2.00: cmd 61/00:80:00:5a:e6/05:00:00:00:00/40 tag 16
ncq dma 655360 out
[ 1177.730039]          res 40/00:80:00:5a:e6/00:00:00:00:00/40 Emask
0x10 (ATA bus error)
[ 1177.730053] ata2.00: status: { DRDY }
[ 1177.730060] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1177.730078] ata2.00: cmd 61/00:88:00:5f:e6/01:00:00:00:00/40 tag 17
ncq dma 131072 out
[ 1177.730078]          res 40/00:80:00:5a:e6/00:00:00:00:00/40 Emask
0x10 (ATA bus error)
[ 1177.730096] ata2.00: status: { DRDY }
[ 1177.730108] ata2: hard resetting link
[ 1178.205831] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1178.206165] ata2.00: supports DRM functions and may not be fully accessible
[ 1178.209743] ata2.00: supports DRM functions and may not be fully accessible
[ 1178.212786] ata2.00: configured for UDMA/133
[ 1178.212826] ata2: EH complete
[ 1178.212988] ata2.00: Enabling discard_zeroes_data

I tried moving the SSD to another caddy and bay but the issue
persists. None of the regular hard disks (a Western Digital and a
Seagate) nor the other SSD (a Crucial MX500) already in the system
trigger the issue the Samsung 860 EVO does. Adding

libata.force=2.00:noncq

seems to make the issue go away but seemingly at some speed cost (at
least compared to what the MX500 achieves). The OS in use is Ubuntu
18.04 with a 4.15.0-43-generic kernel but even a 4.18.0-13-generic had
the same issue.
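
As an aside, dropping the device's queue depth to 1 via sysfs should be
a runtime equivalent that avoids a reboot (just a sketch, using this
machine's device name):

echo 1 > /sys/block/sdb/device/queue_depth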

Is there anything software-wise that might need investigating that
would allow NCQ to work and a better speed to be reached?

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: failed command: WRITE FPDMA QUEUED with Samsung 860 EVO
  2019-01-02 15:25 failed command: WRITE FPDMA QUEUED with Samsung 860 EVO Sitsofe Wheeler
@ 2019-01-02 15:29 ` Sitsofe Wheeler
  2019-01-02 16:10   ` Laurence Oberman
  0 siblings, 1 reply; 13+ messages in thread
From: Sitsofe Wheeler @ 2019-01-02 15:29 UTC (permalink / raw)
  To: linux-ide; +Cc: linux-block

(Also trying linux-ide list)

On Wed, 2 Jan 2019 at 15:25, Sitsofe Wheeler <sitsofe@gmail.com> wrote:
>
> Hi,
>
> I recently purchased a SATA Samsung 860 EVO SSD and put it in an old
> HP microserver (which has an AMD N36L). By default, when the disk load
> becomes a little heavy e.g. by running a job like
>
> fio --name=test --readonly --rw=randread --filename /dev/sdb --bs=32k \
>     --ioengine=libaio --iodepth=32 --direct=1 --runtime=10m --time_based=1
>
> the kernel starts repeatedly producing error messages like:
>
> [ 1177.729912] ata2.00: exception Emask 0x10 SAct 0x3c000 SErr 0x0
> action 0x6 frozen
> [ 1177.729931] ata2.00: irq_stat 0x08000000, interface fatal error
> [ 1177.729943] ata2.00: failed command: WRITE FPDMA QUEUED
> [ 1177.729962] ata2.00: cmd 61/80:70:80:50:e6/06:00:00:00:00/40 tag 14
> ncq dma 851968 out
> [ 1177.729962]          res 40/00:80:00:5a:e6/00:00:00:00:00/40 Emask
> 0x10 (ATA bus error)
> [ 1177.729978] ata2.00: status: { DRDY }
> [ 1177.729986] ata2.00: failed command: WRITE FPDMA QUEUED
> [ 1177.730002] ata2.00: cmd 61/00:78:00:57:e6/03:00:00:00:00/40 tag 15
> ncq dma 393216 out
> [ 1177.730002]          res 40/00:80:00:5a:e6/00:00:00:00:00/40 Emask
> 0x10 (ATA bus error)
> [ 1177.730017] ata2.00: status: { DRDY }
> [ 1177.730024] ata2.00: failed command: WRITE FPDMA QUEUED
> [ 1177.730039] ata2.00: cmd 61/00:80:00:5a:e6/05:00:00:00:00/40 tag 16
> ncq dma 655360 out
> [ 1177.730039]          res 40/00:80:00:5a:e6/00:00:00:00:00/40 Emask
> 0x10 (ATA bus error)
> [ 1177.730053] ata2.00: status: { DRDY }
> [ 1177.730060] ata2.00: failed command: WRITE FPDMA QUEUED
> [ 1177.730078] ata2.00: cmd 61/00:88:00:5f:e6/01:00:00:00:00/40 tag 17
> ncq dma 131072 out
> [ 1177.730078]          res 40/00:80:00:5a:e6/00:00:00:00:00/40 Emask
> 0x10 (ATA bus error)
> [ 1177.730096] ata2.00: status: { DRDY }
> [ 1177.730108] ata2: hard resetting link
> [ 1178.205831] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [ 1178.206165] ata2.00: supports DRM functions and may not be fully accessible
> [ 1178.209743] ata2.00: supports DRM functions and may not be fully accessible
> [ 1178.212786] ata2.00: configured for UDMA/133
> [ 1178.212826] ata2: EH complete
> [ 1178.212988] ata2.00: Enabling discard_zeroes_data
>
> I tried moving the SSD to another caddy and bay but the issue
> persists. None of the regular hard disks (a Western Digital and a
> Seagate) nor the other SSD (a Crucial MX500) already in the system
> trigger the issue the Samsung 860 EVO does. Adding
>
> libata.force=2.00:noncq
>
> seems to make the issue go away but seemingly at some speed cost (at
> least compared to what the MX500 achieves). The OS in use is Ubuntu
> 18.04 with a 4.15.0-43-generic kernel but even a 4.18.0-13-generic had
> the same issue.
>
> Is there anything software-wise that might need investigating that
> would allow NCQ to work and a better speed to be reached?

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: failed command: WRITE FPDMA QUEUED with Samsung 860 EVO
  2019-01-02 15:29 ` Sitsofe Wheeler
@ 2019-01-02 16:10   ` Laurence Oberman
  2019-01-03 18:28     ` Laurence Oberman
  0 siblings, 1 reply; 13+ messages in thread
From: Laurence Oberman @ 2019-01-02 16:10 UTC (permalink / raw)
  To: Sitsofe Wheeler, linux-ide; +Cc: linux-block

On Wed, 2019-01-02 at 15:29 +0000, Sitsofe Wheeler wrote:
> (Also trying linux-ide list)
> 
> On Wed, 2 Jan 2019 at 15:25, Sitsofe Wheeler <sitsofe@gmail.com>
> wrote:
> > 
> > Hi,
> > 
> > I recently purchased a SATA Samsung 860 EVO SSD and put it in an
> > old
> > HP microserver (which has an AMD N36L). By default, when the disk
> > load
> > becomes a little heavy e.g. by running a job like
> > 
> > fio --name=test --readonly --rw=randread --filename /dev/sdb --
> > bs=32k \
> >     --ioengine=libaio --iodepth=32 --direct=1 --runtime=10m --
> > time_based=1
> > 
> > the kernel starts repeatedly producing error messages like:
> > 
> > [ 1177.729912] ata2.00: exception Emask 0x10 SAct 0x3c000 SErr 0x0
> > action 0x6 frozen
> > [ 1177.729931] ata2.00: irq_stat 0x08000000, interface fatal error
> > [ 1177.729943] ata2.00: failed command: WRITE FPDMA QUEUED
> > [ 1177.729962] ata2.00: cmd 61/80:70:80:50:e6/06:00:00:00:00/40 tag
> > 14
> > ncq dma 851968 out
> > [ 1177.729962]          res 40/00:80:00:5a:e6/00:00:00:00:00/40
> > Emask
> > 0x10 (ATA bus error)
> > [ 1177.729978] ata2.00: status: { DRDY }
> > [ 1177.729986] ata2.00: failed command: WRITE FPDMA QUEUED
> > [ 1177.730002] ata2.00: cmd 61/00:78:00:57:e6/03:00:00:00:00/40 tag
> > 15
> > ncq dma 393216 out
> > [ 1177.730002]          res 40/00:80:00:5a:e6/00:00:00:00:00/40
> > Emask
> > 0x10 (ATA bus error)
> > [ 1177.730017] ata2.00: status: { DRDY }
> > [ 1177.730024] ata2.00: failed command: WRITE FPDMA QUEUED
> > [ 1177.730039] ata2.00: cmd 61/00:80:00:5a:e6/05:00:00:00:00/40 tag
> > 16
> > ncq dma 655360 out
> > [ 1177.730039]          res 40/00:80:00:5a:e6/00:00:00:00:00/40
> > Emask
> > 0x10 (ATA bus error)
> > [ 1177.730053] ata2.00: status: { DRDY }
> > [ 1177.730060] ata2.00: failed command: WRITE FPDMA QUEUED
> > [ 1177.730078] ata2.00: cmd 61/00:88:00:5f:e6/01:00:00:00:00/40 tag
> > 17
> > ncq dma 131072 out
> > [ 1177.730078]          res 40/00:80:00:5a:e6/00:00:00:00:00/40
> > Emask
> > 0x10 (ATA bus error)
> > [ 1177.730096] ata2.00: status: { DRDY }
> > [ 1177.730108] ata2: hard resetting link
> > [ 1178.205831] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl
> > 300)
> > [ 1178.206165] ata2.00: supports DRM functions and may not be fully
> > accessible
> > [ 1178.209743] ata2.00: supports DRM functions and may not be fully
> > accessible
> > [ 1178.212786] ata2.00: configured for UDMA/133
> > [ 1178.212826] ata2: EH complete
> > [ 1178.212988] ata2.00: Enabling discard_zeroes_data
> > 
> > I tried moving the SSD to another caddy and bay but the issue
> > persists. None of the regular hard disks (a Western Digital and a
> > Seagate) nor the other SSD (a Crucial MX500) already in the system
> > trigger the issue the Samsung 860 EVO does. Adding
> > 
> > libata.force=2.00:noncq
> > 
> > seems to make the issue go away but seemingly at some speed cost
> > (at
> > least compared to what the MX500 achieves). The OS in use is Ubuntu
> > 18.04 with a 4.15.0-43-generic kernel but even a 4.18.0-13-generic
> > had
> > the same issue.
> > 
> > Is there anything software-wise that might need investigating that
> > would allow NCQ to work and a better speed to be reached?
> 
> 

Hello 

I have seen issues reported due to low power delivery to the drive.
However, investigating this, it starts with an exception Emask and then
the link error code runs.
Reviewing online, some folks are reporting that cable issues or firmware
can cause this.
I don't have one to test myself, and you are using an enclosure. Are
you able to connect directly to the motherboard via another cable and
test again?

Regards
Laurence

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: failed command: WRITE FPDMA QUEUED with Samsung 860 EVO
  2019-01-02 16:10   ` Laurence Oberman
@ 2019-01-03 18:28     ` Laurence Oberman
  2019-01-03 20:47       ` Laurence Oberman
  0 siblings, 1 reply; 13+ messages in thread
From: Laurence Oberman @ 2019-01-03 18:28 UTC (permalink / raw)
  To: Sitsofe Wheeler, linux-ide; +Cc: linux-block

On Wed, 2019-01-02 at 11:10 -0500, Laurence Oberman wrote:
> On Wed, 2019-01-02 at 15:29 +0000, Sitsofe Wheeler wrote:
> > (Also trying linux-ide list)
> > 
> > On Wed, 2 Jan 2019 at 15:25, Sitsofe Wheeler <sitsofe@gmail.com>
> > wrote:
> > > 
> > > Hi,
> > > 
> > > I recently purchased a SATA Samsung 860 EVO SSD and put it in an
> > > old
> > > HP microserver (which has an AMD N36L). By default, when the disk
> > > load
> > > becomes a little heavy e.g. by running a job like
> > > 
> > > fio --name=test --readonly --rw=randread --filename /dev/sdb --
> > > bs=32k \
> > >     --ioengine=libaio --iodepth=32 --direct=1 --runtime=10m --
> > > time_based=1
> > > 
> > > the kernel starts repeatedly producing error messages like:
> > > 
> > > [ 1177.729912] ata2.00: exception Emask 0x10 SAct 0x3c000 SErr
> > > 0x0
> > > action 0x6 frozen
> > > [ 1177.729931] ata2.00: irq_stat 0x08000000, interface fatal
> > > error
> > > [ 1177.729943] ata2.00: failed command: WRITE FPDMA QUEUED
> > > [ 1177.729962] ata2.00: cmd 61/80:70:80:50:e6/06:00:00:00:00/40
> > > tag
> > > 14
> > > ncq dma 851968 out
> > > [ 1177.729962]          res 40/00:80:00:5a:e6/00:00:00:00:00/40
> > > Emask
> > > 0x10 (ATA bus error)
> > > [ 1177.729978] ata2.00: status: { DRDY }
> > > [ 1177.729986] ata2.00: failed command: WRITE FPDMA QUEUED
> > > [ 1177.730002] ata2.00: cmd 61/00:78:00:57:e6/03:00:00:00:00/40
> > > tag
> > > 15
> > > ncq dma 393216 out
> > > [ 1177.730002]          res 40/00:80:00:5a:e6/00:00:00:00:00/40
> > > Emask
> > > 0x10 (ATA bus error)
> > > [ 1177.730017] ata2.00: status: { DRDY }
> > > [ 1177.730024] ata2.00: failed command: WRITE FPDMA QUEUED
> > > [ 1177.730039] ata2.00: cmd 61/00:80:00:5a:e6/05:00:00:00:00/40
> > > tag
> > > 16
> > > ncq dma 655360 out
> > > [ 1177.730039]          res 40/00:80:00:5a:e6/00:00:00:00:00/40
> > > Emask
> > > 0x10 (ATA bus error)
> > > [ 1177.730053] ata2.00: status: { DRDY }
> > > [ 1177.730060] ata2.00: failed command: WRITE FPDMA QUEUED
> > > [ 1177.730078] ata2.00: cmd 61/00:88:00:5f:e6/01:00:00:00:00/40
> > > tag
> > > 17
> > > ncq dma 131072 out
> > > [ 1177.730078]          res 40/00:80:00:5a:e6/00:00:00:00:00/40
> > > Emask
> > > 0x10 (ATA bus error)
> > > [ 1177.730096] ata2.00: status: { DRDY }
> > > [ 1177.730108] ata2: hard resetting link
> > > [ 1178.205831] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl
> > > 300)
> > > [ 1178.206165] ata2.00: supports DRM functions and may not be
> > > fully
> > > accessible
> > > [ 1178.209743] ata2.00: supports DRM functions and may not be
> > > fully
> > > accessible
> > > [ 1178.212786] ata2.00: configured for UDMA/133
> > > [ 1178.212826] ata2: EH complete
> > > [ 1178.212988] ata2.00: Enabling discard_zeroes_data
> > > 
> > > I tried moving the SSD to another caddy and bay but the issue
> > > persists. None of the regular hard disks (a Western Digital and a
> > > Seagate) nor the other SSD (a Crucial MX500) already in the
> > > system
> > > trigger the issue the Samsung 860 EVO does. Adding
> > > 
> > > libata.force=2.00:noncq
> > > 
> > > seems to make the issue go away but seemingly at some speed cost
> > > (at
> > > least compared to what the MX500 achieves). The OS in use is
> > > Ubuntu
> > > 18.04 with a 4.15.0-43-generic kernel but even a 4.18.0-13-
> > > generic
> > > had
> > > the same issue.
> > > 
> > > Is there anything software-wise that might need investigating
> > > that
> > > would allow NCQ to work and a better speed to be reached?
> > 
> > 
> 
> Hello 
> 
> I have seen issues reported due to low power delivery to the drive.
> However investigating this, its starts with an exception Emask and
> then
> the link error code runs.
> Reviewing online some folks are reporting cable issues can cause this
> or firmware.
> I don't have one to test myself, and you are using an enclosure. Are
> you able to connect direct to the motherboard via another cable and
> test again.
> 
> Regards
> Laurence

I managed to find an 860 so I am going to test it to see if I see the same
behavior, and will report back.

Thanks
Laurence

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: failed command: WRITE FPDMA QUEUED with Samsung 860 EVO
  2019-01-03 18:28     ` Laurence Oberman
@ 2019-01-03 20:47       ` Laurence Oberman
  2019-01-03 22:24         ` Sitsofe Wheeler
  0 siblings, 1 reply; 13+ messages in thread
From: Laurence Oberman @ 2019-01-03 20:47 UTC (permalink / raw)
  To: Sitsofe Wheeler, linux-ide; +Cc: linux-block

On Thu, 2019-01-03 at 13:28 -0500, Laurence Oberman wrote:
> On Wed, 2019-01-02 at 11:10 -0500, Laurence Oberman wrote:
> > On Wed, 2019-01-02 at 15:29 +0000, Sitsofe Wheeler wrote:
> > > (Also trying linux-ide list)
> > > 
> > > On Wed, 2 Jan 2019 at 15:25, Sitsofe Wheeler <sitsofe@gmail.com>
> > > wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > I recently purchased a SATA Samsung 860 EVO SSD and put it in
> > > > an
> > > > old
> > > > HP microserver (which has an AMD N36L). By default, when the
> > > > disk
> > > > load
> > > > becomes a little heavy e.g. by running a job like
> > > > 
> > > > fio --name=test --readonly --rw=randread --filename /dev/sdb --
> > > > bs=32k \
> > > >     --ioengine=libaio --iodepth=32 --direct=1 --runtime=10m --
> > > > time_based=1
> > > > 
> > > > the kernel starts repeatedly producing error messages like:
> > > > 
> > > > [ 1177.729912] ata2.00: exception Emask 0x10 SAct 0x3c000 SErr
> > > > 0x0
> > > > action 0x6 frozen
> > > > [ 1177.729931] ata2.00: irq_stat 0x08000000, interface fatal
> > > > error
> > > > [ 1177.729943] ata2.00: failed command: WRITE FPDMA QUEUED
> > > > [ 1177.729962] ata2.00: cmd 61/80:70:80:50:e6/06:00:00:00:00/40
> > > > tag
> > > > 14
> > > > ncq dma 851968 out
> > > > [ 1177.729962]          res 40/00:80:00:5a:e6/00:00:00:00:00/40
> > > > Emask
> > > > 0x10 (ATA bus error)
> > > > [ 1177.729978] ata2.00: status: { DRDY }
> > > > [ 1177.729986] ata2.00: failed command: WRITE FPDMA QUEUED
> > > > [ 1177.730002] ata2.00: cmd 61/00:78:00:57:e6/03:00:00:00:00/40
> > > > tag
> > > > 15
> > > > ncq dma 393216 out
> > > > [ 1177.730002]          res 40/00:80:00:5a:e6/00:00:00:00:00/40
> > > > Emask
> > > > 0x10 (ATA bus error)
> > > > [ 1177.730017] ata2.00: status: { DRDY }
> > > > [ 1177.730024] ata2.00: failed command: WRITE FPDMA QUEUED
> > > > [ 1177.730039] ata2.00: cmd 61/00:80:00:5a:e6/05:00:00:00:00/40
> > > > tag
> > > > 16
> > > > ncq dma 655360 out
> > > > [ 1177.730039]          res 40/00:80:00:5a:e6/00:00:00:00:00/40
> > > > Emask
> > > > 0x10 (ATA bus error)
> > > > [ 1177.730053] ata2.00: status: { DRDY }
> > > > [ 1177.730060] ata2.00: failed command: WRITE FPDMA QUEUED
> > > > [ 1177.730078] ata2.00: cmd 61/00:88:00:5f:e6/01:00:00:00:00/40
> > > > tag
> > > > 17
> > > > ncq dma 131072 out
> > > > [ 1177.730078]          res 40/00:80:00:5a:e6/00:00:00:00:00/40
> > > > Emask
> > > > 0x10 (ATA bus error)
> > > > [ 1177.730096] ata2.00: status: { DRDY }
> > > > [ 1177.730108] ata2: hard resetting link
> > > > [ 1178.205831] ata2: SATA link up 3.0 Gbps (SStatus 123
> > > > SControl
> > > > 300)
> > > > [ 1178.206165] ata2.00: supports DRM functions and may not be
> > > > fully
> > > > accessible
> > > > [ 1178.209743] ata2.00: supports DRM functions and may not be
> > > > fully
> > > > accessible
> > > > [ 1178.212786] ata2.00: configured for UDMA/133
> > > > [ 1178.212826] ata2: EH complete
> > > > [ 1178.212988] ata2.00: Enabling discard_zeroes_data
> > > > 
> > > > I tried moving the SSD to another caddy and bay but the issue
> > > > persists. None of the regular hard disks (a Western Digital and
> > > > a
> > > > Seagate) nor the other SSD (a Crucial MX500) already in the
> > > > system
> > > > trigger the issue the Samsung 860 EVO does. Adding
> > > > 
> > > > libata.force=2.00:noncq
> > > > 
> > > > seems to make the issue go away but seemingly at some speed
> > > > cost
> > > > (at
> > > > least compared to what the MX500 achieves). The OS in use is
> > > > Ubuntu
> > > > 18.04 with a 4.15.0-43-generic kernel but even a 4.18.0-13-
> > > > generic
> > > > had
> > > > the same issue.
> > > > 
> > > > Is there anything software-wise that might need investigating
> > > > that
> > > > would allow NCQ to work and a better speed to be reached?
> > > 
> > > 
> > 
> > Hello 
> > 
> > I have seen issues reported due to low power delivery to the drive.
> > However investigating this, its starts with an exception Emask and
> > then
> > the link error code runs.
> > Reviewing online some folks are reporting cable issues can cause
> > this
> > or firmware.
> > I don't have one to test myself, and you are using an enclosure.
> > Are
> > you able to connect direct to the motherboard via another cable and
> > test again.
> > 
> > Regards
> > Laurence
> 
> I managed to find a 860 so going to test it and see if I see the same
> behavior and report back
> 
> Thanks
> Laurence

Hello

I put the 860 in an enclosure (MSA50) driven by a SAS HBA
(megaraid_sas).

The backplane is SAS or SATA

/dev/sg2  0 0 49 0  0  /dev/sdb  ATA       Samsung SSD 860   1B6Q

Running the same fio test as yours on the latest RHEL7 and 4.20.0+-1
kernels, I am unable to reproduce this issue after multiple test runs.

Tests all run to completion with no errors on RHEL7 and upstream
kernels.

I have no way to test at the moment with a direct motherboard
connection to a SATA port, so if this is a host-side issue with SATA
(ATA) I would not see it.

What this likely means is that the drive itself seems to be well
behaved here, and the power or cable issue I alluded to earlier, or
possibly the host ATA interface, may be worth looking into on your side.

RHEL7 kernel
3.10.0-862.11.1.el7.x86_64

test: (g=0): rw=randread, bs=(R) 32.0KiB-32.0KiB, (W) 32.0KiB-32.0KiB,
(T) 32.0KiB-32.0KiB, ioengine=libaio, iodepth=32
fio-3.3-38-gf5ec8
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=120MiB/s,w=0KiB/s][r=3839,w=0 IOPS][eta
00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=3974: Thu Jan  3 15:14:10 2019
   read: IOPS=3827, BW=120MiB/s (125MB/s)(70.1GiB/600009msec)
    slat (usec): min=7, max=374, avg=23.78, stdev= 6.09
    clat (usec): min=449, max=509311, avg=8330.29, stdev=2060.29
     lat (usec): min=514, max=509331, avg=8355.00, stdev=2060.29
    clat percentiles (usec):
     |  1.00th=[ 5342],  5.00th=[ 7767], 10.00th=[ 8225], 20.00th=[
8291],
     | 30.00th=[ 8291], 40.00th=[ 8291], 50.00th=[ 8291], 60.00th=[
8291],
     | 70.00th=[ 8356], 80.00th=[ 8356], 90.00th=[ 8455], 95.00th=[
8848],
     | 99.00th=[11600], 99.50th=[13042], 99.90th=[16581],
99.95th=[17695],
     | 99.99th=[19006]
   bw (  KiB/s): min=50560, max=124472, per=99.94%, avg=122409.89,
stdev=2592.08, samples=1200
   iops        : min= 1580, max= 3889, avg=3825.22, stdev=81.01,
samples=1200
  lat (usec)   : 500=0.01%, 750=0.03%, 1000=0.02%
  lat (msec)   : 2=0.08%, 4=0.32%, 10=97.20%, 20=2.34%, 50=0.01%
  lat (msec)   : 750=0.01%
  cpu          : usr=4.76%, sys=12.81%, ctx=2113947, majf=0, minf=14437
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%,
>=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%,
>=64=0.0%
     issued rwts: total=2296574,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=120MiB/s (125MB/s), 120MiB/s-120MiB/s (125MB/s-125MB/s),
io=70.1GiB (75.3GB), run=600009-600009msecmodinfo ata

Disk stats (read/write):
  sdb: ios=2295763/0, merge=0/0, ticks=18786069/0, in_queue=18784356,
util=100.00%

Upstream Kernel
4.20.0+-1.x86_64

[root@localhost ~]# ./test_ssd.sh 
test: (g=0): rw=randread, bs=(R) 32.0KiB-32.0KiB, (W) 32.0KiB-32.0KiB,
(T) 32.0KiB-32.0KiB, ioengine=libaio, iodepth=32
fio-3.3-38-gf5ec8
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=120MiB/s,w=0KiB/s][r=3835,w=0 IOPS][eta
00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=2895: Thu Jan  3 15:47:21 2019
   read: IOPS=3826, BW=120MiB/s (125MB/s)(70.1GiB/600009msec)
    slat (usec): min=5, max=410, avg=26.92, stdev= 3.81
    clat (usec): min=760, max=1287.1k, avg=8327.27, stdev=4756.19
     lat (usec): min=787, max=1287.1k, avg=8355.50, stdev=4756.18
    clat percentiles (usec):
     |  1.00th=[ 8225],  5.00th=[ 8291], 10.00th=[ 8291], 20.00th=[
8291],
     | 30.00th=[ 8291], 40.00th=[ 8291], 50.00th=[ 8291], 60.00th=[
8291],
     | 70.00th=[ 8356], 80.00th=[ 8356], 90.00th=[ 8356], 95.00th=[
8356],
     | 99.00th=[ 8455], 99.50th=[ 8455], 99.90th=[ 8455], 99.95th=[
8455],
     | 99.99th=[ 9765]
   bw (  KiB/s): min=25152, max=124559, per=100.00%, avg=122589.35,
stdev=3879.77, samples=1199
   iops        : min=  786, max= 3892, avg=3830.88, stdev=121.24,
samples=1199
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=99.99%, 20=0.01%
  cpu          : usr=4.19%, sys=18.68%, ctx=2295902, majf=0, minf=278
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%,
>=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%,
>=64=0.0%
     issued rwts: total=2296041,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=120MiB/s (125MB/s), 120MiB/s-120MiB/s (125MB/s-125MB/s),
io=70.1GiB (75.2GB), run=600009-600009msec

Disk stats (read/write):
  sdb: ios=2296022/0, merge=0/0, ticks=19111730/0, in_queue=18408961,
util=99.87%



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: failed command: WRITE FPDMA QUEUED with Samsung 860 EVO
  2019-01-03 20:47       ` Laurence Oberman
@ 2019-01-03 22:24         ` Sitsofe Wheeler
  2019-01-03 22:40           ` Laurence Oberman
  0 siblings, 1 reply; 13+ messages in thread
From: Sitsofe Wheeler @ 2019-01-03 22:24 UTC (permalink / raw)
  To: Laurence Oberman; +Cc: linux-ide, linux-block

Hi,

On Thu, 3 Jan 2019 at 20:47, Laurence Oberman <loberman@redhat.com> wrote:
>
> Hello
>
> I put the 860 in an enclosure (MSA50) driven by a SAS HBA
> (megaraid)sas)
>
> The backplane is SAS or SATA
>
> /dev/sg2  0 0 49 0  0  /dev/sdb  ATA       Samsung SSD 860   1B6Q
>
> Running the same fio test of yours on latest RHEL7 and 4.20.0+-1 I am
> unable to reproduce this issue of yours after multiple test runs.
>
> Tests all run to completion with no errors on RHEL7 and upstream
> kernels.
>
> I have no way to test at the moment with a direct motherboard
> connection to a SATA port so if this is a host side issue with sata
> (ATA) I would not see it.
>
> What this likely means is that the drive itself seems to be well
> behaved here and the power or cable issue I alluded to earlier may be
> worth looking into for you or possibly the host ATA interface.
>
> RHEL7 kernel
> 3.10.0-862.11.1.el7.x86_64

Thanks for going the extra mile on this Laurence - it does sound like
whatever issue I'm seeing with the 860 EVO is local to my box. It's
curious that others are seeing something similar (e.g.
https://github.com/zfsonlinux/zfs/issues/4873#issuecomment-449798356 )
but maybe they're in the same boat as me.

> test: (g=0): rw=randread, bs=(R) 32.0KiB-32.0KiB, (W) 32.0KiB-32.0KiB,
> (T) 32.0KiB-32.0KiB, ioengine=libaio, iodepth=32
> fio-3.3-38-gf5ec8
> Starting 1 process
> Jobs: 1 (f=1): [r(1)][100.0%][r=120MiB/s,w=0KiB/s][r=3839,w=0 IOPS][eta
> 00m:00s]
> test: (groupid=0, jobs=1): err= 0: pid=3974: Thu Jan  3 15:14:10 2019
>    read: IOPS=3827, BW=120MiB/s (125MB/s)(70.1GiB/600009msec)
>     slat (usec): min=7, max=374, avg=23.78, stdev= 6.09
>     clat (usec): min=449, max=509311, avg=8330.29, stdev=2060.29
>      lat (usec): min=514, max=509331, avg=8355.00, stdev=2060.29
>     clat percentiles (usec):
>      |  1.00th=[ 5342],  5.00th=[ 7767], 10.00th=[ 8225], 20.00th=[
> 8291],
>      | 30.00th=[ 8291], 40.00th=[ 8291], 50.00th=[ 8291], 60.00th=[
> 8291],
>      | 70.00th=[ 8356], 80.00th=[ 8356], 90.00th=[ 8455], 95.00th=[
> 8848],
>      | 99.00th=[11600], 99.50th=[13042], 99.90th=[16581],
> 99.95th=[17695],
>      | 99.99th=[19006]
>    bw (  KiB/s): min=50560, max=124472, per=99.94%, avg=122409.89,
> stdev=2592.08, samples=1200
>    iops        : min= 1580, max= 3889, avg=3825.22, stdev=81.01,
> samples=1200
>   lat (usec)   : 500=0.01%, 750=0.03%, 1000=0.02%
>   lat (msec)   : 2=0.08%, 4=0.32%, 10=97.20%, 20=2.34%, 50=0.01%
>   lat (msec)   : 750=0.01%
>   cpu          : usr=4.76%, sys=12.81%, ctx=2113947, majf=0, minf=14437
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%,
> >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%,
> >=64=0.0%
>      issued rwts: total=2296574,0,0,0 short=0,0,0,0 dropped=0,0,0,0
>      latency   : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
>    READ: bw=120MiB/s (125MB/s), 120MiB/s-120MiB/s (125MB/s-125MB/s),
> io=70.1GiB (75.3GB), run=600009-600009msecmodinfo ata
>
> Disk stats (read/write):
>   sdb: ios=2295763/0, merge=0/0, ticks=18786069/0, in_queue=18784356,
> util=100.00%

For what it's worth, the speeds I see with NCQ off on the Samsung 860
EVO are not far off what you're reporting (but are much lower than
those I see on the MX500 in the same machine). I suppose it could just
be the MX500 is simply a better performing SSD for the specific
workload I have been testing...

--
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: failed command: WRITE FPDMA QUEUED with Samsung 860 EVO
  2019-01-03 22:24         ` Sitsofe Wheeler
@ 2019-01-03 22:40           ` Laurence Oberman
  2019-01-04  7:33             ` Sitsofe Wheeler
  0 siblings, 1 reply; 13+ messages in thread
From: Laurence Oberman @ 2019-01-03 22:40 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: linux-ide, linux-block

On Thu, 2019-01-03 at 22:24 +0000, Sitsofe Wheeler wrote:
> Hi,
> 
> On Thu, 3 Jan 2019 at 20:47, Laurence Oberman <loberman@redhat.com>
> wrote:
> > 
> > Hello
> > 
> > I put the 860 in an enclosure (MSA50) driven by a SAS HBA
> > (megaraid)sas)
> > 
> > The backplane is SAS or SATA
> > 
> > /dev/sg2  0 0 49 0  0  /dev/sdb  ATA       Samsung SSD 860   1B6Q
> > 
> > Running the same fio test of yours on latest RHEL7 and 4.20.0+-1 I
> > am
> > unable to reproduce this issue of yours after multiple test runs.
> > 
> > Tests all run to completion with no errors on RHEL7 and upstream
> > kernels.
> > 
> > I have no way to test at the moment with a direct motherboard
> > connection to a SATA port so if this is a host side issue with sata
> > (ATA) I would not see it.
> > 
> > What this likely means is that the drive itself seems to be well
> > behaved here and the power or cable issue I alluded to earlier may
> > be
> > worth looking into for you or possibly the host ATA interface.
> > 
> > RHEL7 kernel
> > 3.10.0-862.11.1.el7.x86_64
> 
> Thanks for going the extra mile on this Laurence - it does sound like
> whatever issue I'm seeing with the 860 EVO is local to my box. It's
> curious that others are seeing something similar (e.g.
> https://github.com/zfsonlinux/zfs/issues/4873#issuecomment-449798356
> )
> but maybe they're in the same boat as me.
> 
> > test: (g=0): rw=randread, bs=(R) 32.0KiB-32.0KiB, (W) 32.0KiB-
> > 32.0KiB,
> > (T) 32.0KiB-32.0KiB, ioengine=libaio, iodepth=32
> > fio-3.3-38-gf5ec8
> > Starting 1 process
> > Jobs: 1 (f=1): [r(1)][100.0%][r=120MiB/s,w=0KiB/s][r=3839,w=0
> > IOPS][eta
> > 00m:00s]
> > test: (groupid=0, jobs=1): err= 0: pid=3974: Thu Jan  3 15:14:10
> > 2019
> >    read: IOPS=3827, BW=120MiB/s (125MB/s)(70.1GiB/600009msec)
> >     slat (usec): min=7, max=374, avg=23.78, stdev= 6.09
> >     clat (usec): min=449, max=509311, avg=8330.29, stdev=2060.29
> >      lat (usec): min=514, max=509331, avg=8355.00, stdev=2060.29
> >     clat percentiles (usec):
> >      |  1.00th=[ 5342],  5.00th=[ 7767], 10.00th=[ 8225], 20.00th=[
> > 8291],
> >      | 30.00th=[ 8291], 40.00th=[ 8291], 50.00th=[ 8291], 60.00th=[
> > 8291],
> >      | 70.00th=[ 8356], 80.00th=[ 8356], 90.00th=[ 8455], 95.00th=[
> > 8848],
> >      | 99.00th=[11600], 99.50th=[13042], 99.90th=[16581],
> > 99.95th=[17695],
> >      | 99.99th=[19006]
> >    bw (  KiB/s): min=50560, max=124472, per=99.94%, avg=122409.89,
> > stdev=2592.08, samples=1200
> >    iops        : min= 1580, max= 3889, avg=3825.22, stdev=81.01,
> > samples=1200
> >   lat (usec)   : 500=0.01%, 750=0.03%, 1000=0.02%
> >   lat (msec)   : 2=0.08%, 4=0.32%, 10=97.20%, 20=2.34%, 50=0.01%
> >   lat (msec)   : 750=0.01%
> >   cpu          : usr=4.76%, sys=12.81%, ctx=2113947, majf=0,
> > minf=14437
> >   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%,
> > 32=100.0%,
> > > =64=0.0%
> > 
> >      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> > 64=0.0%,
> > > =64=0.0%
> > 
> >      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%,
> > 64=0.0%,
> > > =64=0.0%
> > 
> >      issued rwts: total=2296574,0,0,0 short=0,0,0,0 dropped=0,0,0,0
> >      latency   : target=0, window=0, percentile=100.00%, depth=32
> > 
> > Run status group 0 (all jobs):
> >    READ: bw=120MiB/s (125MB/s), 120MiB/s-120MiB/s (125MB/s-
> > 125MB/s),
> > io=70.1GiB (75.3GB), run=600009-600009msecmodinfo ata
> > 
> > Disk stats (read/write):
> >   sdb: ios=2295763/0, merge=0/0, ticks=18786069/0,
> > in_queue=18784356,
> > util=100.00%
> 
> For what it's worth, the speeds I see with NCQ off on the Samsung 860
> EVO are not far off what you're reporting (but are much lower than
> those I see on the MX500 in the same machine). I suppose it could
> just
> be the MX500 is simply a better performing SSD for the specific
> workload I have been testing...
> 
> --
> Sitsofe | http://sucs.org/~sits/

Hello Sitsofe

I am going to try tomorrow on a motherboard direct connection.
My testing was with no flags to libata, but of course ATA is hidden on
the host side in my test, as I am going via megaraid_sas to the MSA50 shelf.
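
If it helps to double check which path a disk is going through, the
sysfs device link shows the controller it hangs off (a sketch; sdb is
just an example device name here):

readlink -f /sys/block/sdb/device
lsscsi -H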

Are you using 32k blocks on the MX500 as well? Is that (the MX500)
12gbit or 6gbit SAS?
Was it the same read test via fio?

Thanks
Laurence



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: failed command: WRITE FPDMA QUEUED with Samsung 860 EVO
  2019-01-03 22:40           ` Laurence Oberman
@ 2019-01-04  7:33             ` Sitsofe Wheeler
  2019-01-07  7:17               ` Hannes Reinecke
  0 siblings, 1 reply; 13+ messages in thread
From: Sitsofe Wheeler @ 2019-01-04  7:33 UTC (permalink / raw)
  To: Laurence Oberman; +Cc: linux-ide, linux-block

Blimey Laurence - you're really pushing the boat out on this one!

On Thu, 3 Jan 2019 at 22:40, Laurence Oberman <loberman@redhat.com> wrote:
>
> On Thu, 2019-01-03 at 22:24 +0000, Sitsofe Wheeler wrote:
> > Hi,
> >
> > On Thu, 3 Jan 2019 at 20:47, Laurence Oberman <loberman@redhat.com>
> > wrote:
> > >
> > > Hello
> > >
> > > I put the 860 in an enclosure (MSA50) driven by a SAS HBA
> > > (megaraid)sas)
> > >
> > > The backplane is SAS or SATA
> > >
> > > /dev/sg2  0 0 49 0  0  /dev/sdb  ATA       Samsung SSD 860   1B6Q
> > >
> > > Running the same fio test of yours on latest RHEL7 and 4.20.0+-1 I
> > > am
> > > unable to reproduce this issue of yours after multiple test runs.
> > >
> > > Tests all run to completion with no errors on RHEL7 and upstream
> > > kernels.
> > >
> > > I have no way to test at the moment with a direct motherboard
> > > connection to a SATA port so if this is a host side issue with sata
> > > (ATA) I would not see it.
> > >
> > > What this likely means is that the drive itself seems to be well
> > > behaved here and the power or cable issue I alluded to earlier may
> > > be
> > > worth looking into for you or possibly the host ATA interface.
> > >
> > > RHEL7 kernel
> > > 3.10.0-862.11.1.el7.x86_64
> >
> > Thanks for going the extra mile on this Laurence - it does sound like
> > whatever issue I'm seeing with the 860 EVO is local to my box. It's
> > curious that others are seeing something similar (e.g.
> > https://github.com/zfsonlinux/zfs/issues/4873#issuecomment-449798356
> > )
> > but maybe they're in the same boat as me.
> >
> > > test: (g=0): rw=randread, bs=(R) 32.0KiB-32.0KiB, (W) 32.0KiB-
> > > 32.0KiB,
> > > (T) 32.0KiB-32.0KiB, ioengine=libaio, iodepth=32
> > > fio-3.3-38-gf5ec8
> > > Starting 1 process
> > > Jobs: 1 (f=1): [r(1)][100.0%][r=120MiB/s,w=0KiB/s][r=3839,w=0
> > > IOPS][eta
> > > 00m:00s]
> > > test: (groupid=0, jobs=1): err= 0: pid=3974: Thu Jan  3 15:14:10
> > > 2019
> > >    read: IOPS=3827, BW=120MiB/s (125MB/s)(70.1GiB/600009msec)
> > >     slat (usec): min=7, max=374, avg=23.78, stdev= 6.09
> > >     clat (usec): min=449, max=509311, avg=8330.29, stdev=2060.29
> > >      lat (usec): min=514, max=509331, avg=8355.00, stdev=2060.29
> > >     clat percentiles (usec):
> > >      |  1.00th=[ 5342],  5.00th=[ 7767], 10.00th=[ 8225], 20.00th=[
> > > 8291],
> > >      | 30.00th=[ 8291], 40.00th=[ 8291], 50.00th=[ 8291], 60.00th=[
> > > 8291],
> > >      | 70.00th=[ 8356], 80.00th=[ 8356], 90.00th=[ 8455], 95.00th=[
> > > 8848],
> > >      | 99.00th=[11600], 99.50th=[13042], 99.90th=[16581],
> > > 99.95th=[17695],
> > >      | 99.99th=[19006]
> > >    bw (  KiB/s): min=50560, max=124472, per=99.94%, avg=122409.89,
> > > stdev=2592.08, samples=1200
> > >    iops        : min= 1580, max= 3889, avg=3825.22, stdev=81.01,
> > > samples=1200
> > >   lat (usec)   : 500=0.01%, 750=0.03%, 1000=0.02%
> > >   lat (msec)   : 2=0.08%, 4=0.32%, 10=97.20%, 20=2.34%, 50=0.01%
> > >   lat (msec)   : 750=0.01%
> > >   cpu          : usr=4.76%, sys=12.81%, ctx=2113947, majf=0,
> > > minf=14437
> > >   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%,
> > > 32=100.0%,
> > > > =64=0.0%
> > >
> > >      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> > > 64=0.0%,
> > > > =64=0.0%
> > >
> > >      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%,
> > > 64=0.0%,
> > > > =64=0.0%
> > >
> > >      issued rwts: total=2296574,0,0,0 short=0,0,0,0 dropped=0,0,0,0
> > >      latency   : target=0, window=0, percentile=100.00%, depth=32
> > >
> > > Run status group 0 (all jobs):
> > >    READ: bw=120MiB/s (125MB/s), 120MiB/s-120MiB/s (125MB/s-
> > > 125MB/s),
> > > io=70.1GiB (75.3GB), run=600009-600009msecmodinfo ata
> > >
> > > Disk stats (read/write):
> > >   sdb: ios=2295763/0, merge=0/0, ticks=18786069/0,
> > > in_queue=18784356,
> > > util=100.00%
> >
> > For what it's worth, the speeds I see with NCQ off on the Samsung 860
> > EVO are not far off what you're reporting (but are much lower than
> > those I see on the MX500 in the same machine). I suppose it could
> > just
> > be the MX500 is simply a better performing SSD for the specific
> > workload I have been testing...
> >
> > --
> > Sitsofe | http://sucs.org/~sits/
>
> Hello Sitsofe
>
> I am going to try tomorrow on a motherboard direct connection.
> My testing was with no flags to libata, but of course ATA is hidden
> host wise in my test as I am going via megaraid_sas to the MSA50 shelf.
>
> Are you using 32k blocks on the MX500 as well, is that 12gbit or 6gbit
> SAS (The MX500)
> Was it the same read tests via fio.

Yes I'm using 32k blocks on the MX500 as well:
# Samsung 860 EVO (with NCQ disabled)
# fio --readonly --name=test --rw=randread --filename \
 $(readlink -f /dev/disk/by-id/ata-Samsung_SSD_860_EVO_500GB_XXXXXXXXXXXXXXX) \
 --bs=32k --ioengine=libaio --iodepth=32 --direct=1 --runtime=60s \
 --time_based=1
test: (g=0): rw=randread, bs=(R) 32.0KiB-32.0KiB, (W) 32.0KiB-32.0KiB,
(T) 32.0KiB-32.0KiB, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=107MiB/s,w=0KiB/s][r=3410,w=0 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=3671: Fri Jan  4 07:21:20 2019
   read: IOPS=3098, BW=96.8MiB/s (102MB/s)(5811MiB/60010msec)
    slat (usec): min=9, max=1398, avg=47.83, stdev=12.78
    clat (usec): min=384, max=69747, avg=10262.24, stdev=5722.72
     lat (usec): min=398, max=69798, avg=10311.27, stdev=5723.76
    clat percentiles (usec):
     |  1.00th=[  881],  5.00th=[ 1663], 10.00th=[ 2606], 20.00th=[ 4490],
     | 30.00th=[ 6390], 40.00th=[ 8225], 50.00th=[10159], 60.00th=[11994],
     | 70.00th=[13829], 80.00th=[15664], 90.00th=[17957], 95.00th=[19530],
     | 99.00th=[22152], 99.50th=[23200], 99.90th=[28967], 99.95th=[35390],
     | 99.99th=[58459]
   bw (  KiB/s): min=84032, max=111104, per=100.00%, avg=99154.29,
stdev=5994.62, samples=120
   iops        : min= 2626, max= 3472, avg=3098.57, stdev=187.33, samples=120
  lat (usec)   : 500=0.09%, 750=0.46%, 1000=0.95%
  lat (msec)   : 2=5.26%, 4=10.53%, 10=32.09%, 20=46.96%, 50=3.65%
  lat (msec)   : 100=0.02%
  cpu          : usr=9.95%, sys=26.29%, ctx=186005, majf=0, minf=266
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwt: total=185941,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=96.8MiB/s (102MB/s), 96.8MiB/s-96.8MiB/s
(102MB/s-102MB/s), io=5811MiB (6093MB), run=60010-60010msec

Disk stats (read/write):
  sdb: ios=185497/86, merge=2/46, ticks=1893688/4600,
in_queue=1898360, util=99.92%

# Crucial MX500
# test: (g=0): rw=randread, bs=(R) 32.0KiB-32.0KiB, (W)
32.0KiB-32.0KiB, (T) 32.0KiB-32.0KiB, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=249MiB/s,w=0KiB/s][r=7964,w=0 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=3684: Fri Jan  4 07:24:29 2019
   read: IOPS=7958, BW=249MiB/s (261MB/s)(14.6GiB/60004msec)
    slat (usec): min=8, max=781, avg=39.09, stdev=26.28
    clat (usec): min=351, max=11790, avg=3971.78, stdev=850.49
     lat (usec): min=418, max=11805, avg=4011.70, stdev=849.55
    clat percentiles (usec):
     |  1.00th=[ 2540],  5.00th=[ 2900], 10.00th=[ 3097], 20.00th=[ 3359],
     | 30.00th=[ 3556], 40.00th=[ 3720], 50.00th=[ 3884], 60.00th=[ 4015],
     | 70.00th=[ 4178], 80.00th=[ 4424], 90.00th=[ 4686], 95.00th=[ 5211],
     | 99.00th=[ 7177], 99.50th=[ 7308], 99.90th=[ 7701], 99.95th=[ 7832],
     | 99.99th=[ 8029]
   bw (  KiB/s): min=249856, max=255872, per=100.00%, avg=254709.22,
stdev=592.04, samples=120
   iops        : min= 7808, max= 7996, avg=7959.62, stdev=18.49, samples=120
  lat (usec)   : 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.07%, 4=58.38%, 10=41.54%, 20=0.01%
  cpu          : usr=13.17%, sys=45.78%, ctx=278702, majf=0, minf=265
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwt: total=477571,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=249MiB/s (261MB/s), 249MiB/s-249MiB/s (261MB/s-261MB/s),
io=14.6GiB (15.6GB), run=60004-60004msec

Disk stats (read/write):
  sda: ios=476506/5, merge=0/1, ticks=1876020/136, in_queue=1875680, util=99.91%

I've yet to attach the disk directly to the mobo. It's a bit fiddly as
the most accessible port is meant for the DVD drive and I think its
speed is slower than the others.

The speed of the ATA ports is lower than you might expect (this
machine is fairly old):

[    2.725849] ahci 0000:00:11.0: version 3.0
[    2.726434] ahci 0000:00:11.0: AHCI 0001.0200 32 slots 4 ports 3
Gbps 0xf impl SATA mode
[    2.726439] ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led
clo pmp pio slum part
[    2.734592] scsi host0: ahci
[    2.741769] scsi host1: ahci
[    2.747197] scsi host2: ahci
[    2.752589] scsi host3: ahci
[    2.752731] ata1: SATA max UDMA/133 abar m1024@0xfe6ffc00 port
0xfe6ffd00 irq 25
[    2.752735] ata2: SATA max UDMA/133 abar m1024@0xfe6ffc00 port
0xfe6ffd80 irq 25
[    2.752739] ata3: SATA max UDMA/133 abar m1024@0xfe6ffc00 port
0xfe6ffe00 irq 25
[    2.752742] ata4: SATA max UDMA/133 abar m1024@0xfe6ffc00 port
0xfe6ffe80 irq 25
[...]
[    3.228107] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    3.228941] ata1.00: supports DRM functions and may not be fully accessible
[    3.228979] ata1.00: ATA-10: CT500MX500SSD1, M3CR023, max UDMA/133
[    3.228982] ata1.00: 976773168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[    3.229920] ata1.00: supports DRM functions and may not be fully accessible
[    3.230705] ata1.00: configured for UDMA/133
[    3.231082] scsi 0:0:0:0: Direct-Access     ATA      CT500MX500SSD1
  023  PQ: 0 ANSI: 5
[    3.231546] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    3.231767] sd 0:0:0:0: [sda] 976773168 512-byte logical blocks:
(500 GB/466 GiB)
[    3.231770] sd 0:0:0:0: [sda] 4096-byte physical blocks
[    3.231790] sd 0:0:0:0: [sda] Write Protect is off
[    3.231793] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    3.231826] sd 0:0:0:0: [sda] Write cache: disabled, read cache:
enabled, doesn't support DPO or FUA
[    3.232091] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    3.233529]  sda: sda1 sda2 sda3
[    3.236030] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    3.236057] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    3.236617] ata4.00: ATA-9: WDC WD20EZRX-00D8PB0, 80.00A80, max UDMA/133
[    3.236620] ata4.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[    3.236843] ata3.00: ATA-9: ST2000DM001-1CH164, CC29, max UDMA/133
[    3.236846] ata3.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[    3.237178] ata4.00: configured for UDMA/133
[    3.237669] ata3.00: configured for UDMA/133
[    3.240132] sd 0:0:0:0: [sda] supports TCG Opal
[    3.240136] sd 0:0:0:0: [sda] Attached SCSI disk
[    3.242623] ata2.00: FORCE: horkage modified (noncq)
[    3.242683] ata2.00: supports DRM functions and may not be fully accessible
[    3.242686] ata2.00: ATA-11: Samsung SSD 860 EVO 500GB, RVT01B6Q,
max UDMA/133
[    3.242689] ata2.00: 976773168 sectors, multi 1: LBA48 NCQ (not used)
[    3.245518] ata2.00: supports DRM functions and may not be fully accessible
[    3.247611] ata2.00: configured for UDMA/133
[    3.247915] scsi 1:0:0:0: Direct-Access     ATA      Samsung SSD
860  1B6Q PQ: 0 ANSI: 5
[    3.248390] ata2.00: Enabling discard_zeroes_data
[    3.248493] sd 1:0:0:0: [sdb] 976773168 512-byte logical blocks:
(500 GB/466 GiB)
[    3.248514] sd 1:0:0:0: [sdb] Write Protect is off
[    3.248517] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    3.248551] sd 1:0:0:0: [sdb] Write cache: disabled, read cache:
enabled, doesn't support DPO or FUA
[    3.248777] ata2.00: Enabling discard_zeroes_data
[    3.249093] sd 1:0:0:0: Attached scsi generic sg1 type 0
[    3.249398] scsi 2:0:0:0: Direct-Access     ATA
ST2000DM001-1CH1 CC29 PQ: 0 ANSI: 5
[    3.249649] sd 2:0:0:0: Attached scsi generic sg2 type 0
[    3.250007] scsi 3:0:0:0: Direct-Access     ATA      WDC
WD20EZRX-00D 0A80 PQ: 0 ANSI: 5
[    3.250266] sd 3:0:0:0: Attached scsi generic sg3 type 0
[    3.250477] sd 2:0:0:0: [sdc] 3907029168 512-byte logical blocks:
(2.00 TB/1.82 TiB)
[    3.250480] sd 2:0:0:0: [sdc] 4096-byte physical blocks
[    3.250512] sd 2:0:0:0: [sdc] Write Protect is off
[    3.250514] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[    3.250586] sd 2:0:0:0: [sdc] Write cache: disabled, read cache:
enabled, doesn't support DPO or FUA
[    3.250619] sd 3:0:0:0: [sdd] 3907029168 512-byte logical blocks:
(2.00 TB/1.82 TiB)
[    3.250622] sd 3:0:0:0: [sdd] 4096-byte physical blocks
[    3.250739] sd 3:0:0:0: [sdd] Write Protect is off
[    3.250741] sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[    3.250811] sd 3:0:0:0: [sdd] Write cache: disabled, read cache:
enabled, doesn't support DPO or FUA
[    3.254586]  sdb: sdb1 sdb2 sdb3 sdb4
[    3.255158] ata2.00: Enabling discard_zeroes_data
[    3.257058] sd 1:0:0:0: [sdb] supports TCG Opal
[    3.257063] sd 1:0:0:0: [sdb] Attached SCSI disk
[    3.274972]  sdd: sdd1
[    3.275532] sd 3:0:0:0: [sdd] Attached SCSI disk
[    3.276090]  sdc: sdc1
[    3.276548] sd 2:0:0:0: [sdc] Attached SCSI disk

--
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: failed command: WRITE FPDMA QUEUED with Samsung 860 EVO
  2019-01-04  7:33             ` Sitsofe Wheeler
@ 2019-01-07  7:17               ` Hannes Reinecke
  2019-01-07  7:41                 ` Sitsofe Wheeler
  0 siblings, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2019-01-07  7:17 UTC (permalink / raw)
  To: Sitsofe Wheeler, Laurence Oberman; +Cc: linux-ide, linux-block

On 1/4/19 8:33 AM, Sitsofe Wheeler wrote:
> Blimey Laurence - you're really pushing the boat out on this one!
> 
[ .. ]
> I've yet to attach the disk directly to the mobo. It's a bit fiddly as
> the most accessible port is meant for the DVD drive and I think it's
> speed is slower than the others.
> 
> The speed of the ATA ports is lower than you might expect (this
> machine is fairly old):
> 
[ .. ]
> [    3.242623] ata2.00: FORCE: horkage modified (noncq)
> [    3.242683] ata2.00: supports DRM functions and may not be fully accessible
> [    3.242686] ata2.00: ATA-11: Samsung SSD 860 EVO 500GB, RVT01B6Q,
> max UDMA/133
> [    3.242689] ata2.00: 976773168 sectors, multi 1: LBA48 NCQ (not used)
> [    3.245518] ata2.00: supports DRM functions and may not be fully accessible
> [    3.247611] ata2.00: configured for UDMA/133

'slower' is an understatement.
That adapter can't do NCQ, hence 'WRITE FPDMA QUEUED' (which _is_ an NCQ 
command) will never be issued.
So I'd be _very_ surprised if you still see this problem there ...

Do you?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: failed command: WRITE FPDMA QUEUED with Samsung 860 EVO
  2019-01-07  7:17               ` Hannes Reinecke
@ 2019-01-07  7:41                 ` Sitsofe Wheeler
  2019-01-07  8:46                   ` Hannes Reinecke
  0 siblings, 1 reply; 13+ messages in thread
From: Sitsofe Wheeler @ 2019-01-07  7:41 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Laurence Oberman, linux-ide, linux-block

On Mon, 7 Jan 2019 at 07:17, Hannes Reinecke <hare@suse.de> wrote:
>
> On 1/4/19 8:33 AM, Sitsofe Wheeler wrote:
> > Blimey Laurence - you're really pushing the boat out on this one!
> >
> [ .. ]
> > I've yet to attach the disk directly to the mobo. It's a bit fiddly as
> > the most accessible port is meant for the DVD drive and I think it's
> > speed is slower than the others.
> >
> > The speed of the ATA ports is lower than you might expect (this
> > machine is fairly old):
> >
> [ .. ]
> > [    3.242623] ata2.00: FORCE: horkage modified (noncq)
> > [    3.242683] ata2.00: supports DRM functions and may not be fully accessible
> > [    3.242686] ata2.00: ATA-11: Samsung SSD 860 EVO 500GB, RVT01B6Q,
> > max UDMA/133
> > [    3.242689] ata2.00: 976773168 sectors, multi 1: LBA48 NCQ (not used)
> > [    3.245518] ata2.00: supports DRM functions and may not be fully accessible
> > [    3.247611] ata2.00: configured for UDMA/133
>
> 'slower' is an understatement.

Are you surprised that there would be such a dramatic difference in
the speeds between the two SSDs (Samsung 860 EVO, Crucial MX500) on
that particular workload in that same machine?

> That adapter can't do NCQ, hence 'WRITE FPDMA QUEUED' (which _is_ an NCQ
> command) will never be issued.
> So I'd be _very_ surprised if you still see this problem there ...

I'm curious, why would using the libata.force=2.00:noncq kernel
command line (only mentioned in my very first mail) make using that
drive more stable if the adapter could never accept that command
anyway? Shouldn't the sending of that command have been disabled if
anything along the way can't actually accept it?

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: failed command: WRITE FPDMA QUEUED with Samsung 860 EVO
  2019-01-07  7:41                 ` Sitsofe Wheeler
@ 2019-01-07  8:46                   ` Hannes Reinecke
  2019-01-08  7:06                     ` Sitsofe Wheeler
  0 siblings, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2019-01-07  8:46 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: Laurence Oberman, linux-ide, linux-block

On 1/7/19 8:41 AM, Sitsofe Wheeler wrote:
> On Mon, 7 Jan 2019 at 07:17, Hannes Reinecke <hare@suse.de> wrote:
>>
>> On 1/4/19 8:33 AM, Sitsofe Wheeler wrote:
>>> Blimey Laurence - you're really pushing the boat out on this one!
>>>
>> [ .. ]
>>> I've yet to attach the disk directly to the mobo. It's a bit fiddly as
>>> the most accessible port is meant for the DVD drive and I think it's
>>> speed is slower than the others.
>>>
>>> The speed of the ATA ports is lower than you might expect (this
>>> machine is fairly old):
>>>
>> [ .. ]
>>> [    3.242623] ata2.00: FORCE: horkage modified (noncq)
>>> [    3.242683] ata2.00: supports DRM functions and may not be fully accessible
>>> [    3.242686] ata2.00: ATA-11: Samsung SSD 860 EVO 500GB, RVT01B6Q,
>>> max UDMA/133
>>> [    3.242689] ata2.00: 976773168 sectors, multi 1: LBA48 NCQ (not used)
>>> [    3.245518] ata2.00: supports DRM functions and may not be fully accessible
>>> [    3.247611] ata2.00: configured for UDMA/133
>>
>> 'slower' is an understatement.
> 
> Are you surprised that there would be such a dramatic difference in
> the speeds between the two SSDs (Samsung 860 EVO, Crucial MX500) on
> that particular workload in that same machine?
> 
Not at all.

>> That adapter can't do NCQ, hence 'WRITE FPDMA QUEUED' (which _is_ an NCQ
>> command) will never be issued.
>> So I'd be _very_ surprised if you still see this problem there ...
> 
> I'm curious, why would using the libata.force=2.00:noncq kernel
> command line (only mentioned in my very first mail) make using that
> drive more stable if the adapter could never accept that command
> anyway? Shouldn't the sending of that command have been disabled if
> anything along the way can't actually accept it?
> 
'WRITE FPDMA QUEUED' will only ever be issued if the drive _and_ adapter can 
do NCQ. As this is the offending command it's not surprising that 
switching off NCQ (and hence the use of that command) will make the 
machine more stable.
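
A quick way to confirm whether NCQ is actually in use on a given port
(a sketch; sdb is the 860 EVO from your dmesg):

dmesg | grep -i ncq
cat /sys/block/sdb/device/queue_depth

"NCQ (depth 31/32)" versus "NCQ (not used)" in the former, and a queue
depth greater than 1 in the latter, mean queued commands can be sent.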

Although I'd be curious about the 'more' bit in 'more stable'.
I would have thought that the machine would be stable after disabling 
NCQ; do you still see issues after disabling NCQ?

As for the NCQ issues: it might be that the adapter has issues with NCQ 
(quite a few older adapters do).
It might also be a problem with the _previous_ command which failed; can 
you enable libata tracing to figure out the command flow?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: failed command: WRITE FPDMA QUEUED with Samsung 860 EVO
  2019-01-07  8:46                   ` Hannes Reinecke
@ 2019-01-08  7:06                     ` Sitsofe Wheeler
  2019-01-09  6:54                       ` Sitsofe Wheeler
  0 siblings, 1 reply; 13+ messages in thread
From: Sitsofe Wheeler @ 2019-01-08  7:06 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Laurence Oberman, linux-ide, linux-block

On Mon, 7 Jan 2019 at 08:46, Hannes Reinecke <hare@suse.de> wrote:
>
> On 1/7/19 8:41 AM, Sitsofe Wheeler wrote:
> > On Mon, 7 Jan 2019 at 07:17, Hannes Reinecke <hare@suse.de> wrote:
> >>
> >> On 1/4/19 8:33 AM, Sitsofe Wheeler wrote:
> >>> Blimey Laurence - you're really pushing the boat out on this one!
> >>>
> >> [ .. ]
> >>> I've yet to attach the disk directly to the mobo. It's a bit fiddly as
> >>> the most accessible port is meant for the DVD drive and I think it's
> >>> speed is slower than the others.
> >>>
> >>> The speed of the ATA ports is lower than you might expect (this
> >>> machine is fairly old):
> >>>
> >> [ .. ]
> >>> [    3.242623] ata2.00: FORCE: horkage modified (noncq)
> >>> [    3.242683] ata2.00: supports DRM functions and may not be fully accessible
> >>> [    3.242686] ata2.00: ATA-11: Samsung SSD 860 EVO 500GB, RVT01B6Q,
> >>> max UDMA/133
> >>> [    3.242689] ata2.00: 976773168 sectors, multi 1: LBA48 NCQ (not used)
> >>> [    3.245518] ata2.00: supports DRM functions and may not be fully accessible
> >>> [    3.247611] ata2.00: configured for UDMA/133
> >>
> >> 'slower' is an understatement.
> >
> > Are you surprised that there would be such a dramatic difference in
> > the speeds between the two SSDs (Samsung 860 EVO, Crucial MX500) on
> > that particular workload in that same machine?
> >
> Not at all.

Fair enough :-) My understanding is that both SSDs (when unloaded and
mostly empty etc) would be far faster than what this particular
machine could do but I stand corrected.

> >> That adapter can't do NCQ, hence 'WRITE FPDMA QUEUED' (which _is_ an NCQ
> >> command) will never be issued.
> >> So I'd be _very_ surprised if you still see this problem there ...
> >
> > I'm curious, why would using the libata.force=2.00:noncq kernel
> > command line (only mentioned in my very first mail) make using that
> > drive more stable if the adapter could never accept that command
> > anyway? Shouldn't the sending of that command have been disabled if
> > anything along the way can't actually accept it?
> >
> 'WRITE FPDMA QUEUED' will only ever be issued if the drive _and_
> adapter can do NCQ. As this is the offending command it's not
> surprising that switching off NCQ (and hence the use of that command)
> will make the machine more stable.
>
> Although I'd be curious about the 'more' bit in 'more stable'.
> I would have thought that the machine would be stable after disabling
> NCQ; do you still see issues after disabling NCQ?

I was inaccurate when I said "more": it has been totally stable since
disabling NCQ on that port.

> As for the NCQ issues: it might be that the adapter has issues with NCQ
> (quite a few older adapters do).
> It might also be a problem with the _previous_ command which failed; can
> you enable libata tracing to figure out the command flow?

OK I'll see if I can get around to this one tomorrow.

Cheers!

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: failed command: WRITE FPDMA QUEUED with Samsung 860 EVO
  2019-01-08  7:06                     ` Sitsofe Wheeler
@ 2019-01-09  6:54                       ` Sitsofe Wheeler
  0 siblings, 0 replies; 13+ messages in thread
From: Sitsofe Wheeler @ 2019-01-09  6:54 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Laurence Oberman, linux-ide, linux-block

On Tue, 8 Jan 2019 at 07:06, Sitsofe Wheeler <sitsofe@gmail.com> wrote:
>
> > As for the NCQ issues: it might be that the adapter has issues with NCQ
> > (quite some older adapters do).
> > It might also be a problem with the _previous_ command which failed; can
> > you enable libata tracing to figure out the command flow?
>
> OK I'll see if I can get around to this one tomorrow.

So I wound up attempting this from an Ubuntu 18.04.1 live CD (because
I was trying to minimize background disk usage). I deactivated the
LVM/VG/mdadm devices before running the test (lvchange -a n ...;
vgchange -a n ...; mdadm --stop ...).
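
Spelled out with placeholder names (vg0 and /dev/md0 here are just
stand-ins for the real names elided above), the deactivation was
roughly:

lvchange -a n vg0       # deactivate every LV in the volume group
vgchange -a n vg0       # then the volume group itself
mdadm --stop /dev/md0   # and stop the md array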

Kernel:
Linux ubuntu-server 4.15.0-29-generic #31-Ubuntu SMP Tue Jul 17
15:39:52 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Tracing setup (I don't know if this is correct - googling around for
"enable libata tracing" only turned up
https://github.com/torvalds/linux/blob/master/drivers/ata/libata-trace.c
which doesn't contain an example):
echo "libata*" > /sys/kernel/debug/tracing/set_ftrace_filter
echo 1 > /sys/kernel/debug/tracing/events/libata/enable
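
For what it's worth, I believe set_ftrace_filter only affects the
function tracer, so the events/libata/enable line should be the one
doing the work here. A minimal event-only setup, assuming debugfs is
mounted in the usual place, would be something like:

mount -t debugfs none /sys/kernel/debug 2>/dev/null  # no-op if mounted
echo nop > /sys/kernel/debug/tracing/current_tracer  # no function tracer
echo 1 > /sys/kernel/debug/tracing/events/libata/enable
echo 1 > /sys/kernel/debug/tracing/tracing_on        # usually on already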

Clear dmesg and any tracing:
dmesg -C
echo "" > /sys/kernel/debug/tracing/trace

<Some fiddling around with apt/apt config to fetch down a copy of fio>

fio command used:
fio --readonly --name=test --rw=randread --filename /dev/sdb --bs=32k \
    --ioengine=libaio --iodepth=16 --direct=1 --runtime=20s --time_based \
    --max_latency=100ms
(the --max_latency is so that fio quits when the latency becomes high
due to the bus resets)

After fio exits:
dmesg
less /sys/kernel/debug/tracing/trace
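
To keep a copy of the full trace for sharing (the file linked further
down), something along the lines of:
cat /sys/kernel/debug/tracing/trace > 20190109-libata-ftrace.txt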

dmesg looks like this:
[ 2427.105459] ata2.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x6
[ 2427.105465] ata2.00: irq_stat 0x40000008
[ 2427.105472] ata2.00: failed command: READ FPDMA QUEUED
[ 2427.105486] ata2.00: cmd 60/40:68:40:c1:00/00:00:00:00:00/40 tag 13
ncq dma 32768 in
                        res 41/84:40:40:c1:00/00:00:00:00:00/00 Emask
0x410 (ATA bus error) <F>
[ 2427.105490] ata2.00: status: { DRDY ERR }
[ 2427.105493] ata2.00: error: { ICRC ABRT }
[ 2427.105503] ata2: hard resetting link
[ 2427.576563] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 2427.576992] ata2.00: supports DRM functions and may not be fully accessible
[ 2427.581186] ata2.00: supports DRM functions and may not be fully accessible
[ 2427.584783] ata2.00: configured for UDMA/33
[ 2427.584857] ata2: EH complete
[ 2427.585742] ata2.00: Enabling discard_zeroes_data

The start of the trace file looks like this:
# tracer: function
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
             fio-4212  [001] d...  2426.910197: ata_qc_issue:
ata_port=2 ata_dev=0 tag=15 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:78:00:00:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.910267: ata_qc_issue:
ata_port=2 ata_dev=0 tag=16 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:80:40:00:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.910320: ata_qc_issue:
ata_port=2 ata_dev=0 tag=17 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:88:80:00:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.910378: ata_qc_issue:
ata_port=2 ata_dev=0 tag=18 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:90:c0:00:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.910434: ata_qc_issue:
ata_port=2 ata_dev=0 tag=19 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:98:00:01:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.910489: ata_qc_issue:
ata_port=2 ata_dev=0 tag=20 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:a0:40:01:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.910546: ata_qc_issue:
ata_port=2 ata_dev=0 tag=21 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:a8:80:01:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.910599: ata_qc_issue:
ata_port=2 ata_dev=0 tag=22 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:b0:c0:01:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.910659: ata_qc_issue:
ata_port=2 ata_dev=0 tag=23 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:b8:00:02:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.910711: ata_qc_issue:
ata_port=2 ata_dev=0 tag=24 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:c0:40:02:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.910766: ata_qc_issue:
ata_port=2 ata_dev=0 tag=25 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:c8:80:02:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.910818: ata_qc_issue:
ata_port=2 ata_dev=0 tag=26 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:d0:c0:02:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.910879: ata_qc_issue:
ata_port=2 ata_dev=0 tag=27 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:d8:00:03:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.910934: ata_qc_issue:
ata_port=2 ata_dev=0 tag=28 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:e0:40:03:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.910985: ata_qc_issue:
ata_port=2 ata_dev=0 tag=29 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:e8:80:03:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.911041: ata_qc_issue:
ata_port=2 ata_dev=0 tag=30 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:f0:c0:03:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.911097: ata_qc_issue:
ata_port=2 ata_dev=0 tag=0 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:00:00:04:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.911152: ata_qc_issue:
ata_port=2 ata_dev=0 tag=1 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:08:40:04:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.911203: ata_qc_issue:
ata_port=2 ata_dev=0 tag=2 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:10:80:04:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.911263: ata_qc_issue:
ata_port=2 ata_dev=0 tag=3 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:18:c0:04:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.911318: ata_qc_issue:
ata_port=2 ata_dev=0 tag=4 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:20:00:05:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.911371: ata_qc_issue:
ata_port=2 ata_dev=0 tag=5 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:28:40:05:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.911427: ata_qc_issue:
ata_port=2 ata_dev=0 tag=6 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:30:80:05:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.911481: ata_qc_issue:
ata_port=2 ata_dev=0 tag=7 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:38:c0:05:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.911538: ata_qc_issue:
ata_port=2 ata_dev=0 tag=8 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:40:00:06:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.911592: ata_qc_issue:
ata_port=2 ata_dev=0 tag=9 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:48:40:06:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.911649: ata_qc_issue:
ata_port=2 ata_dev=0 tag=10 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:50:80:06:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.911710: ata_qc_issue:
ata_port=2 ata_dev=0 tag=11 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:58:c0:06:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.911765: ata_qc_issue:
ata_port=2 ata_dev=0 tag=12 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:60:00:07:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.911820: ata_qc_issue:
ata_port=2 ata_dev=0 tag=13 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:68:40:07:00/00:00:00:00:00/40)
             fio-4212  [001] d...  2426.911874: ata_qc_issue:
ata_port=2 ata_dev=0 tag=14 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:70:80:07:00/00:00:00:00:00/40)
          <idle>-0     [000] d.h.  2426.912187: ata_qc_complete_done:
ata_port=2 ata_dev=0 tag=16 flags=75048c{ IO RETRY FAILED EH_SCHEDULED
} status={ DRDY }  res=(40/00:00:00:00:00/00:00:00:00:00/00)
          <idle>-0     [000] dNs.  2426.912314: ata_qc_issue:
ata_port=2 ata_dev=0 tag=16 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:80:c0:07:00/00:00:00:00:00/40)
             fio-4212  [000] d.h.  2426.912396: ata_qc_complete_done:
ata_port=2 ata_dev=0 tag=17 flags=7991cc{ IO QUIET RETRY FAILED }
status={ DRDY }  res=(40/00:00:00:00:00/00:00:00:00:00/00)
             fio-4212  [000] d...  2426.912510: ata_qc_issue:
ata_port=2 ata_dev=0 tag=17 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:88:00:08:00/00:00:00:00:00/40)
          <idle>-0     [000] d.h.  2426.912633: ata_qc_complete_done:
ata_port=2 ata_dev=0 tag=18 flags=7b3f2c{ IO CLEAR_EXCL FAILED
SENSE_VALID } status={ DRDY }
res=(40/00:00:00:00:00/00:00:00:00:00/00)
          <idle>-0     [000] dNs.  2426.912666: ata_qc_issue:
ata_port=2 ata_dev=0 tag=18 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:90:40:08:00/00:00:00:00:00/40)
          <idle>-0     [000] d.h.  2426.912879: ata_qc_complete_done:
ata_port=2 ata_dev=0 tag=15 flags=77974c{ IO QUIET FAILED SENSE_VALID
EH_SCHEDULED } status={ DRDY }
res=(40/00:00:00:00:00/00:00:00:00:00/00)
          <idle>-0     [000] dNs.  2426.912939: ata_qc_issue:
ata_port=2 ata_dev=0 tag=15 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:78:80:08:00/00:00:00:00:00/40)
             fio-4212  [000] d.h.  2426.913109: ata_qc_complete_done:
ata_port=2 ata_dev=0 tag=19 flags=7862cc{ IO QUIET RETRY } status={
DRDY }  res=(40/00:00:00:00:00/00:00:00:00:00/00)
             fio-4212  [000] d.s.  2426.913125: ata_qc_issue:
ata_port=2 ata_dev=0 tag=19 proto=ATA_PROT_NCQ cmd=ATA_CMD_FPDMA_READ
tf=(60/40:98:c0:08:00/00:00:00:00:00/40)

(Entirety of the trace file is available from
https://sucs.org/~sits/test/20190109-libata-ftrace.txt )

Just to check - was the above how you are supposed to enable libata tracing?

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-01-09  6:54 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-02 15:25 failed command: WRITE FPDMA QUEUED with Samsung 860 EVO Sitsofe Wheeler
2019-01-02 15:29 ` Sitsofe Wheeler
2019-01-02 16:10   ` Laurence Oberman
2019-01-03 18:28     ` Laurence Oberman
2019-01-03 20:47       ` Laurence Oberman
2019-01-03 22:24         ` Sitsofe Wheeler
2019-01-03 22:40           ` Laurence Oberman
2019-01-04  7:33             ` Sitsofe Wheeler
2019-01-07  7:17               ` Hannes Reinecke
2019-01-07  7:41                 ` Sitsofe Wheeler
2019-01-07  8:46                   ` Hannes Reinecke
2019-01-08  7:06                     ` Sitsofe Wheeler
2019-01-09  6:54                       ` Sitsofe Wheeler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).