* Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O
@ 2020-03-02  1:50 David C. Rankin
  2020-03-02  5:25 ` Roman Mamedov
  0 siblings, 1 reply; 14+ messages in thread
From: David C. Rankin @ 2020-03-02  1:50 UTC (permalink / raw)
  To: mdraid

Mayday....

OS:     Archlinux
Kernel: 5.5
mdadm:  4.1-2
RAID 1: 2-disk 3T on devices sdc, sdd (not on partitions, on device)
        (see mdadm -E on both disks and mdadm -D on array below message)
array:  /dev/md4


  After the update to the Linux 5.5 kernel, I/O on a 2-disk 3T Raid1 built
on the whole devices /dev/sdc and /dev/sdd has dropped from ~speed=85166K/sec
during scrub to ~speed=2022K/sec, with speed as low as speed=737K/sec. There
are no errors. This array normally takes exactly 5 hours 10 minutes to scrub
and has for the past 4 years. The scrub has now been running for over 14
hours (without error) and is only 2.8% complete, e.g.

cat /proc/mdstat
Personalities : [raid1]
md4 : active raid1 sdc[0] sdd[2]
      2930135488 blocks super 1.2 [2/2] [UU]
      [>....................]  check =  2.8% (82114752/2930135488)
finish=38635.1min speed=1228K/sec
      bitmap: 0/22 pages [0KB], 65536KB chunk
<snip>
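
  (In case the md throttle settings are a suspect: these are the knobs I
would check -- I can send the values from this box if useful. The 50000
below is just an illustrative figure, not what is actually set here:)

  # current scrub/resync throttle (KB/sec per device)
  sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max
  # temporarily raise the floor while testing (illustrative value)
  sysctl -w dev.raid.speed_limit_min=50000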

  The last 3 months of scrub logging show the scrub completing in 5:10 every
month (each timestamp is the completion time of that scrub; the difference
between the /dev/md2 and /dev/md4 timestamps gives the scrub time for
/dev/md4). A sketch of the script that produces these lines follows the log:

Dec  1 03:01:02 '/dev/md0' mismatch_cnt = 0
Dec  1 03:10:02 '/dev/md1' mismatch_cnt = 0
Dec  1 07:10:03 '/dev/md2' mismatch_cnt = 0
Dec  1 12:20:03 '/dev/md4' mismatch_cnt = 0
Jan  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Jan  1 03:07:01 '/dev/md1' mismatch_cnt = 0
Jan  1 05:04:02 '/dev/md2' mismatch_cnt = 0
Jan  1 10:14:03 '/dev/md4' mismatch_cnt = 0
Feb  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Feb  1 03:07:01 '/dev/md1' mismatch_cnt = 0
Feb  1 05:01:02 '/dev/md2' mismatch_cnt = 0
Feb  1 10:11:02 '/dev/md4' mismatch_cnt = 0
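
  (For reference, the monthly scrub check is driven by a simple cron script
along these lines -- a from-memory sketch of the logic, not the exact file:)

  #!/bin/sh
  # sketch: run a check on each array in turn, then log the mismatch count
  for md in md0 md1 md2 md4; do
      echo check > /sys/block/$md/md/sync_action
      while grep -qE 'check|resync' /proc/mdstat; do
          sleep 60
      done
      cnt=$(cat /sys/block/$md/md/mismatch_cnt)
      if [ "$cnt" -eq 0 ]; then
          logger "'/dev/$md' mismatch_cnt = $cnt"
      else
          logger "'/dev/$md' error: mismatch_cnt = $cnt"
      fi
  done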

  After the 5.5 kernel update I noticed apps such as the virtualbox guests
on the drive becoming unusably slow and initially thought it was a problem
with Oracle virtualbox and the 5.5 kernel. The iowait is over 99% at times
when running top on the Archlinux guest, and iostat on the guest shows:

Linux 5.5.5-arch1-1 (vl1)       02/24/2020      _x86_64_        (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.14    0.00    0.18   54.55    0.00   45.14

(I have a screenshot of the top wa over 98% if you need that too)

  Originally, I opened a bug with Oracle
(https://www.virtualbox.org/ticket/19311), but that has left them scratching
their heads, and it wasn't until my scrub kicked off and I saw it would take
a month to complete that I realized it was a kernel Raid issue.

  I check the scrub regularly and have it log the completion of each array.
Checking right at the end of /dev/md2 (the array that scrubs just before this
one), all was normal; speed was fine and it completed in its normal time:

05:00 valkyrie:~/tmp> cat /proc/mdstat
Personalities : [raid1]
md4 : active raid1 sdc[0] sdd[2]
      2930135488 blocks super 1.2 [2/2] [UU]
      bitmap: 0/22 pages [0KB], 65536KB chunk

md2 : active raid1 sda7[0] sdb7[1]
      921030656 blocks super 1.2 [2/2] [UU]
      [===================>.]  check = 99.8% (919643456/921030656)
finish=0.2min speed=85166K/sec
      bitmap: 2/7 pages [8KB], 65536KB chunk

md1 : active raid1 sda6[0] sdb6[1]
      52396032 blocks super 1.2 [2/2] [UU]

md3 : active raid1 sda8[0] sdb8[1]
      2115584 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda5[0] sdb5[1]
      511680 blocks super 1.2 [2/2] [UU]

unused devices: <none>

  However, checking at the beginning of /dev/md4, the speed had plunged to
speed=2022K/sec (What??)

05:00 valkyrie:~/tmp> cat /proc/mdstat
Personalities : [raid1]
md4 : active raid1 sdc[0] sdd[2]
      2930135488 blocks super 1.2 [2/2] [UU]
      [>....................]  check =  0.0% (155712/2930135488)
finish=24141.2min speed=2022K/sec
      bitmap: 0/22 pages [0KB], 65536KB chunk

md2 : active raid1 sda7[0] sdb7[1]
      921030656 blocks super 1.2 [2/2] [UU]
      bitmap: 0/7 pages [0KB], 65536KB chunk

md1 : active raid1 sda6[0] sdb6[1]
      52396032 blocks super 1.2 [2/2] [UU]

md3 : active raid1 sda8[0] sdb8[1]
      2115584 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda5[0] sdb5[1]
      511680 blocks super 1.2 [2/2] [UU]

unused devices: <none>

  The perplexing problem is that I have rolled the Archlinux install back to
the 5.4 kernel from before this problem originally appeared, but for reasons
I cannot explain the array remains unusably slow. (I don't know if something
was written that changed the array for Linux 5.5 or what, but there is no
question that it was as if a switch were thrown with the 5.5 kernel update
that crippled this array, while leaving the other 3 arrays, which are on
partitions instead of devices, fine.)
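
  (For what it's worth, the rollback itself is just reinstalling the cached
kernel package and rebooting -- roughly:)

  # reinstall the previously cached 5.4 kernel package, then reboot
  pacman -U /var/cache/pacman/pkg/linux-5.4*.pkg.tar.xz
  reboot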

  There are no errors logged to the journal, but it is like I/O to this array
is coming through a Dixie Straw and most of the time it is like there is a
race-condition somewhere causing the thing to just sit and spin.
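
  (These are roughly the checks I ran against the journal and dmesg; neither
turns up anything for these disks or for md4:)

  # kernel-level errors since boot
  journalctl -k -b -p err | grep -iE 'sdc|sdd|md4'
  dmesg | grep -iE 'ata[0-9]+|sdc|sdd|md4'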

  Here are the mdadm -E and mdadm -D details:

[14:17 valkyrie:/home/david/tmp] # mdadm -E /dev/sdc
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 6e520607:f152d8b9:dd2a3bec:5f9dc875
           Name : valkyrie:4  (local to host valkyrie)
  Creation Time : Mon Mar 21 02:27:21 2016
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB)
     Array Size : 2930135488 (2794.39 GiB 3000.46 GB)
  Used Dev Size : 5860270976 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=48 sectors
          State : clean
    Device UUID : e15f0ea7:7e973d0c:f7ae51a1:9ee4b3a4

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Mar  1 14:18:07 2020
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 62472be - correct
         Events : 8193


   Device Role : Active device 0
   Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
[14:18 valkyrie:/home/david/tmp] # mdadm -E /dev/sdd
/dev/sdd:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 6e520607:f152d8b9:dd2a3bec:5f9dc875
           Name : valkyrie:4  (local to host valkyrie)
  Creation Time : Mon Mar 21 02:27:21 2016
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB)
     Array Size : 2930135488 (2794.39 GiB 3000.46 GB)
  Used Dev Size : 5860270976 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=48 sectors
          State : clean
    Device UUID : f745d11a:c323f477:71f8a0d9:27d8c717

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Mar  1 14:18:15 2020
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 9101220e - correct
         Events : 8194


   Device Role : Active device 1
   Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
[14:18 valkyrie:/home/david/tmp] # mdadm -D /dev/md4
/dev/md4:
           Version : 1.2
     Creation Time : Mon Mar 21 02:27:21 2016
        Raid Level : raid1
        Array Size : 2930135488 (2794.39 GiB 3000.46 GB)
     Used Dev Size : 2930135488 (2794.39 GiB 3000.46 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sun Mar  1 14:18:32 2020
             State : clean, checking
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

      Check Status : 1% complete

              Name : valkyrie:4  (local to host valkyrie)
              UUID : 6e520607:f152d8b9:dd2a3bec:5f9dc875
            Events : 8194

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       2       8       48        1      active sync   /dev/sdd

  A current mdstat (the scrub began at 05:00):

[19:39 valkyrie:/home/david/tmp] # cat /proc/mdstat
Personalities : [raid1]
md4 : active raid1 sdc[0] sdd[2]
      2930135488 blocks super 1.2 [2/2] [UU]
      [>....................]  check =  2.8% (84842176/2930135488)
finish=28990.7min speed=1635K/sec
      bitmap: 0/22 pages [0KB], 65536KB chunk

md2 : active raid1 sda7[0] sdb7[1]
      921030656 blocks super 1.2 [2/2] [UU]
      bitmap: 0/7 pages [0KB], 65536KB chunk

md1 : active raid1 sda6[0] sdb6[1]
      52396032 blocks super 1.2 [2/2] [UU]

md3 : active raid1 sda8[0] sdb8[1]
      2115584 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda5[0] sdb5[1]
      511680 blocks super 1.2 [2/2] [UU]

unused devices: <none>

  Here is the complete scrub log for the past year.

Feb  1 03:07:02 '/dev/md1' mismatch_cnt = 0
Feb  1 05:02:02 '/dev/md2' mismatch_cnt = 0
Feb  1 10:12:03 '/dev/md4' mismatch_cnt = 0
Mar  1 03:01:02 '/dev/md0' mismatch_cnt = 0
Mar  1 03:07:02 '/dev/md1' mismatch_cnt = 0
Mar  1 05:05:02 '/dev/md2' mismatch_cnt = 0
Mar  1 10:15:03 '/dev/md4' mismatch_cnt = 0
Apr  1 03:01:02 '/dev/md0' mismatch_cnt = 0
Apr  1 03:07:02 '/dev/md1' mismatch_cnt = 0
Apr  1 05:03:02 '/dev/md2' mismatch_cnt = 0
Apr  1 10:13:03 '/dev/md4' mismatch_cnt = 0
May  1 03:01:01 '/dev/md0' mismatch_cnt = 0
May  1 03:07:01 '/dev/md1' mismatch_cnt = 0
May  1 05:06:02 '/dev/md2' mismatch_cnt = 0
May  1 10:16:02 '/dev/md4' mismatch_cnt = 0
Jun  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Jun  1 03:07:01 '/dev/md1' mismatch_cnt = 0
Jun  1 05:02:02 '/dev/md2' mismatch_cnt = 0
Jun  1 10:12:02 '/dev/md4' mismatch_cnt = 0
Jul  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Jul  1 03:07:01 '/dev/md1' mismatch_cnt = 0
Jul  1 05:01:02 '/dev/md2' mismatch_cnt = 0
Jul  1 10:11:02 '/dev/md4' mismatch_cnt = 0
Aug  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Aug  1 03:07:01 '/dev/md1' mismatch_cnt = 0
Aug  1 05:01:02 '/dev/md2' mismatch_cnt = 0
Sep  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Sep  1 03:07:01 '/dev/md1' mismatch_cnt = 0
Sep  1 05:01:02 '/dev/md2' mismatch_cnt = 0
Sep  1 10:11:02 '/dev/md4' error: mismatch_cnt = 256
Oct  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Oct  1 03:06:01 '/dev/md1' mismatch_cnt = 0
Oct  1 05:00:02 '/dev/md2' mismatch_cnt = 0
Oct  1 10:10:02 '/dev/md4' error: mismatch_cnt = 128
Nov  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Nov  1 03:06:01 '/dev/md1' mismatch_cnt = 0
Nov  1 05:00:02 '/dev/md2' mismatch_cnt = 0
Nov  1 10:10:02 '/dev/md4' error: mismatch_cnt = 3584
Dec  1 03:01:02 '/dev/md0' mismatch_cnt = 0
Dec  1 03:10:02 '/dev/md1' mismatch_cnt = 0
Dec  1 07:10:03 '/dev/md2' mismatch_cnt = 0
Dec  1 12:20:03 '/dev/md4' mismatch_cnt = 0
Jan  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Jan  1 03:07:01 '/dev/md1' mismatch_cnt = 0
Jan  1 05:04:02 '/dev/md2' mismatch_cnt = 0
Jan  1 10:14:03 '/dev/md4' mismatch_cnt = 0
Feb  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Feb  1 03:07:01 '/dev/md1' mismatch_cnt = 0
Feb  1 05:01:02 '/dev/md2' mismatch_cnt = 0
Feb  1 10:11:02 '/dev/md4' mismatch_cnt = 0

  I need help; I don't know what else to check or what else to send you. I've
tried to think of the most relevant information I can provide. I do have
straces between the virtualbox host and guest on that machine if that would
help. There is nothing in the journal to send of any disk error, etc. It's
just as if the 5.5 kernel doesn't handle Raid1 on a device (instead of a
partition) the same way it did before 5.5, and that is bringing I/O to its
knees.

  Let me know if there is anything else I can send, and let me know if I
should stop the scrub or just let it run. I'm happy to run any diagnostic you
can think of that might help. Thanks.


-- 
David C. Rankin, J.D.,P.E.


* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O
  2020-03-02  1:50 Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O David C. Rankin
@ 2020-03-02  5:25 ` Roman Mamedov
  2020-03-02  6:38   ` David C. Rankin
  0 siblings, 1 reply; 14+ messages in thread
From: Roman Mamedov @ 2020-03-02  5:25 UTC (permalink / raw)
  To: David C. Rankin; +Cc: mdraid

On Sun, 1 Mar 2020 19:50:03 -0600
"David C. Rankin" <drankinatty@suddenlinkmail.com> wrote:

>   Let me know if there is anything else I can send, and let me know if I
> should stop the scrub or just let it run. I'm happy to run any diagnostic you
> can think of that might help. Thanks.

It doesn't seem convincing that the issue is raw devices vs partitions, or
even kernel version related, especially since you rolled it back and the issue
remains.

What else you could send is "smartctl -a" of all devices;

and most importantly, while the "slow" scrub is running on md4, start:

  iostat -x 2 /dev/sdc /dev/sdd

(enlarge the terminal window) and see if any of the 2 devices is pegged into
100.0 in the last "%util" column, or just showing much higher values there
than the other one.

-- 
With respect,
Roman


* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O
  2020-03-02  5:25 ` Roman Mamedov
@ 2020-03-02  6:38   ` David C. Rankin
  2020-03-02  6:46     ` David C. Rankin
  2020-03-02  6:51     ` Roman Mamedov
  0 siblings, 2 replies; 14+ messages in thread
From: David C. Rankin @ 2020-03-02  6:38 UTC (permalink / raw)
  To: mdraid

[-- Attachment #1: Type: text/plain, Size: 1386 bytes --]

On 03/01/2020 11:25 PM, Roman Mamedov wrote:
> On Sun, 1 Mar 2020 19:50:03 -0600
> "David C. Rankin" <drankinatty@suddenlinkmail.com> wrote:
> 
>>   Let me know if there is anything else I can send, and let me know if I
>> should stop the scrub or just let it run. I'm happy to run any diagnostic you
>> can think of that might help. Thanks.
> 
> It doesn't seem convincing that the issue is raw devices vs partitions, or
> even kernel version related, especially since you rolled it back and the issue
> remains.
> 
> What else you could send is "smartctl -a" of all devices;
> 
> and most importantly, while the "slow" scrub is running on md4, start:
> 
>   iostat -x 2 /dev/sdc /dev/sdd
> 
> (enlarge the terminal window) and see if any of the 2 devices is pegged into
> 100.0 in the last "%util" column, or just showing much higher values there
> than the other one.
> 

Thank you Roman, iostat and smartctl -a for sdc/sdd attached,

  sdc has a few errors from a power hit taken 3000 hours ago or so, but since
that time it has been fine. I had rolled back to several earlier kernels from
Jan 14, Jan 21, and Jan 27 with no change, I then updated to current which is
Archlinux 5.5.6-arch1-1.

  I'm not sure what to make of the iostat output, but the r_await looks
suspicious. Could this all be due to one flaky disk without it throwing any
errors?

-- 
David C. Rankin, J.D.,P.E.

[-- Attachment #2: iostat-x2_sdc_sdd.txt --]
[-- Type: text/plain, Size: 32395 bytes --]

# iostat -x 2 /dev/sdc /dev/sdd
Linux 5.5.6-arch1-1 (valkyrie)  03/02/2020      _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.02    0.01    0.36    0.28    0.00   99.34

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc              4.17    323.18     1.01  19.52  918.56    77.56    0.16      5.88     0.01   6.11  472.99    35.69    0.00      0.00     0.00   0.00    0.00     0.00    0.17  333.64    3.90   0.33
sdd              4.16    323.93     1.00  19.33    3.99    77.81    0.16      5.88     0.01   6.06   25.57    35.66    0.00      0.00     0.00   0.00    0.00     0.00    0.17   20.40    0.02   0.29


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.19    0.00    0.00   99.81

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc              1.50    992.00     0.00   0.00 1722.00   661.33    1.00      2.25     0.00   0.00   62.00     2.25    0.00      0.00     0.00   0.00    0.00     0.00    2.00   30.75    2.64   1.15
sdd             15.50    992.00     0.00   0.00    0.45    64.00    1.00      2.25     0.00   0.00   19.00     2.25    0.00      0.00     0.00   0.00    0.00     0.00    2.00    9.25    0.02   1.20


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.12    0.00    0.00   99.88

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc             10.50    672.00     0.00   0.00 1179.43    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   12.37   1.00
sdd             10.50    672.00     0.00   0.00    0.29    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.80


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.12    0.00    0.00   99.88

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc             15.50    992.00     0.00   0.00 2609.16    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   40.41   1.35
sdd              8.50    544.00     0.00   0.00    0.29    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   1.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.06    0.00    0.00   99.94

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc              9.50   1248.00    14.00  59.57 1875.53   131.37    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   17.80   1.15
sdd              1.50    992.00    14.00  90.32   11.33   661.33    0.50      2.00     0.00   0.00   24.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    1.00   12.00    0.03   0.85


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.25    0.00    0.00   99.75

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc              8.50    800.00     0.00   0.00  485.18    94.12    1.00      2.25     0.00   0.00  704.50     2.25    0.00      0.00     0.00   0.00    0.00     0.00    2.00  352.25    4.82   1.35
sdd             23.50   1504.00     0.00   0.00    0.36    64.00    0.50      0.25     0.00   0.00   40.00     0.50    0.00      0.00     0.00   0.00    0.00     0.00    1.00   20.00    0.02   1.35


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc             11.50    736.00     0.00   0.00 1675.09    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   19.25   1.00
sdd             11.00    704.00     0.00   0.00    0.32    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   1.15


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.12    0.00    0.00   99.88

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc              3.50    224.00     0.00   0.00 3240.00    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   11.33   0.50
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc             11.50    736.00    14.00  54.90 4613.87    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   53.04   1.00
sdd              1.50    992.00    14.00  90.32   11.00   661.33    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.01   0.65


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc              1.50    992.00     0.00   0.00 1215.67   661.33    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.82   0.50
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.50      2.00     0.00   0.00   20.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    1.00    9.50    0.01   0.35


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.13    0.00    0.00   98.87

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc            202.50  13920.00    15.00   6.90   52.18    68.74    1.00      2.25     0.00   0.00  890.00     2.25    0.00      0.00     0.00   0.00    0.00     0.00    2.00  445.00   11.17   8.35
sdd            218.00  14912.00    15.00   6.44    7.05    68.40    0.50      0.25     0.00   0.00   40.00     0.50    0.00      0.00     0.00   0.00    0.00     0.00    1.00   20.00    1.15   8.35


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc              4.50    288.00     0.00   0.00 1930.33    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    8.68   0.65
sdd              3.00    192.00     0.00   0.00    0.33    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.50


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.06    0.00    0.00   99.94

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc             11.50    736.00     0.00   0.00 2999.26    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   34.48   1.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.00    0.00  100.00

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc              4.00   1152.00    14.00  77.78 3092.12   288.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   12.36   0.85
sdd              1.50    992.00    14.00  90.32    3.33   661.33    0.50      2.00     0.00   0.00   20.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    1.00    9.50    0.01   0.65


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.31    0.00    0.00   99.69

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc              5.50    352.00     0.00   0.00 1011.09    64.00    1.00      2.25     0.00   0.00  265.00     2.25    0.00      0.00     0.00   0.00    0.00     0.00    2.00  132.25    5.82   1.15
sdd             21.00   1344.00     0.00   0.00    0.36    64.00    0.50      0.25     0.00   0.00   31.00     0.50    0.00      0.00     0.00   0.00    0.00     0.00    1.00   16.00    0.01   1.15


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.06    0.00    0.00   99.94

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc             15.50    992.00     0.00   0.00 2365.74    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   36.64   1.00
sdd             15.50    992.00     0.00   0.00    0.29    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   1.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.06    0.00    0.12    0.00    0.00   99.81

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc              4.50    288.00     0.00   0.00 2292.33    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   10.31   0.35
sdd              1.50     96.00     0.00   0.00    0.33    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.35


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.06    0.00    0.00   99.94

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc             11.50    736.00     0.00   0.00 3806.57    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   43.76   0.65
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.06    0.00    0.94    0.00    0.00   99.00

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc            145.50  10208.00    14.00   8.78   52.79    70.16    1.00      2.25     0.00   0.00   17.50     2.25    0.00      0.00     0.00   0.00    0.00     0.00    2.00    8.75    7.43   6.15
sdd            160.00  11136.00    14.00   8.05    0.39    69.60    1.00      2.25     0.00   0.00   32.00     2.25    0.00      0.00     0.00   0.00    0.00     0.00    2.00   16.00    0.05   6.35


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.12    0.00    0.00   99.88

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc             12.50    800.00     0.00   0.00 2143.32    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   26.76   1.35
sdd             12.50    800.00     0.00   0.00    0.28    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   1.30


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.06    0.00    0.06    0.00    0.00   99.88

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc              3.00    192.00     0.00   0.00 3173.00    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    9.51   0.50
sdd              2.50    160.00     0.00   0.00    0.40    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.70


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.06    0.00    0.00   99.94

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdc              3.50    224.00     0.00   0.00 4033.29    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   14.11   0.35
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


[00:27 valkyrie:/home/david/tmp] # smartctl -a /dev/sdc
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.5.6-arch1-1] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST3000DM001-1ER166
Serial Number:    Z50264LN
LU WWN Device Id: 5 000c50 087801e14
Firmware Version: CC26
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Mon Mar  2 00:27:28 2020 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (   80) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 316) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x1085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   087   006    Pre-fail  Always       -       145749056
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       32
  5 Reallocated_Sector_Ct   0x0033   089   089   010    Pre-fail  Always       -       13648
  7 Seek_Error_Rate         0x000f   080   060   030    Pre-fail  Always       -       103164271
  9 Power_On_Hours          0x0032   062   062   000    Old_age   Always       -       34041
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       32
183 Runtime_Bad_Block       0x0032   098   098   000    Old_age   Always       -       2
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   082   082   000    Old_age   Always       -       18
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       5 6 6
189 High_Fly_Writes         0x003a   001   001   000    Old_age   Always       -       116
190 Airflow_Temperature_Cel 0x0022   067   063   045    Old_age   Always       -       33 (Min/Max 20/34)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   090   090   000    Old_age   Always       -       21932
194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always       -       33 (0 18 0 0 0)
197 Current_Pending_Sector  0x0012   085   085   000    Old_age   Always       -       2544
198 Offline_Uncorrectable   0x0010   085   085   000    Old_age   Offline      -       2544
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       5185h+21m+36.806s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2947035570
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       457517019401

SMART Error Log Version: 1
ATA Error Count: 18 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 18 occurred at disk power-on lifetime: 31122 hours (1296 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00  40d+19:06:08.272  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  40d+19:06:08.272  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  40d+19:06:08.272  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00  40d+19:06:08.271  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00  40d+19:06:08.271  IDENTIFY DEVICE

Error 17 occurred at disk power-on lifetime: 31122 hours (1296 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: WP at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 80 ff ff ff 4f 00  40d+19:05:38.855  WRITE FPDMA QUEUED
  61 00 80 ff ff ff 4f 00  40d+19:05:38.851  WRITE FPDMA QUEUED
  61 00 80 ff ff ff 4f 00  40d+19:05:38.851  WRITE FPDMA QUEUED
  61 00 80 ff ff ff 4f 00  40d+19:05:38.851  WRITE FPDMA QUEUED
  61 00 80 ff ff ff 4f 00  40d+19:05:38.851  WRITE FPDMA QUEUED

Error 16 occurred at disk power-on lifetime: 31122 hours (1296 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: WP at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 80 ff ff ff 4f 00  40d+19:05:35.175  WRITE FPDMA QUEUED
  61 00 80 ff ff ff 4f 00  40d+19:05:35.175  WRITE FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  40d+19:05:35.175  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  40d+19:05:35.175  READ FPDMA QUEUED
  61 00 80 ff ff ff 4f 00  40d+19:05:35.175  WRITE FPDMA QUEUED

Error 15 occurred at disk power-on lifetime: 31122 hours (1296 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: WP at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 80 ff ff ff 4f 00  40d+19:05:31.509  WRITE FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  40d+19:05:31.509  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  40d+19:05:31.509  READ FPDMA QUEUED
  61 00 80 ff ff ff 4f 00  40d+19:05:31.509  WRITE FPDMA QUEUED
  60 00 80 ff ff ff 4f 00  40d+19:05:31.509  READ FPDMA QUEUED

Error 14 occurred at disk power-on lifetime: 31122 hours (1296 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00  40d+19:05:27.858  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  40d+19:05:27.857  READ FPDMA QUEUED
  61 00 80 ff ff ff 4f 00  40d+19:05:27.857  WRITE FPDMA QUEUED
  60 00 80 ff ff ff 4f 00  40d+19:05:27.857  READ FPDMA QUEUED
  60 00 80 ff ff ff 4f 00  40d+19:05:27.857  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        12         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[00:27 valkyrie:/home/david/tmp] # smartctl -a /dev/sdd
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.5.6-arch1-1] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST3000DM001-1ER166
Serial Number:    Z5025WPD
LU WWN Device Id: 5 000c50 08780900c
Firmware Version: CC26
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Mar  2 00:29:03 2020 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (   80) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 318) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x1085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   115   099   006    Pre-fail  Always       -       85209376
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       32
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   067   057   030    Pre-fail  Always       -       8601381709
  9 Power_On_Hours          0x0032   062   062   000    Old_age   Always       -       34042
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       32
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   098   098   000    Old_age   Always       -       2
190 Airflow_Temperature_Cel 0x0022   069   064   045    Old_age   Always       -       31 (Min/Max 20/33)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   092   092   000    Old_age   Always       -       17156
194 Temperature_Celsius     0x0022   031   040   000    Old_age   Always       -       31 (0 20 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       1849h+54m+59.374s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       14667824592
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       418504152720

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        12         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O
  2020-03-02  6:38   ` David C. Rankin
@ 2020-03-02  6:46     ` David C. Rankin
  2020-03-02  6:51     ` Roman Mamedov
  1 sibling, 0 replies; 14+ messages in thread
From: David C. Rankin @ 2020-03-02  6:46 UTC (permalink / raw)
  To: mdraid

On 03/02/2020 12:38 AM, David C. Rankin wrote:
> Thank you Roman, iostat and smartctl -a for sdc/sdd attached,
> 
>   sdc has a few errors from a power hit taken 3000 hours ago or so, but since
> that time it has been fine. I had rolled back to several earlier kernels from
> Jan 14, Jan 21, and Jan 27 with no change, I then updated to current which is
> Archlinux 5.5.6-arch1-1.
> 
>   I'm not sure what to make of the iostat output, but the r_await looks
> suspicious. Could this all be due to one flaky disk without it throwing any
> errors?

Actually,

  I think sdc looks like the flaky culprit... So what is recommended in the
interim: fail sdc, remove it from the array, and run on sdd until the
replacement arrives to rebuild? (That seems better than limping along with
sdc if it is the one causing all the slowdown.)
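
  Something like the following is what I have in mind, if that is the right
sequence (the new device name is obviously just a placeholder):

  # drop the suspect disk out of the mirror
  mdadm /dev/md4 --fail /dev/sdc
  mdadm /dev/md4 --remove /dev/sdc
  # later, once the replacement drive is installed (device name hypothetical)
  mdadm /dev/md4 --add /dev/sdX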

-- 
David C. Rankin, J.D.,P.E.


* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O
  2020-03-02  6:38   ` David C. Rankin
  2020-03-02  6:46     ` David C. Rankin
@ 2020-03-02  6:51     ` Roman Mamedov
  2020-03-02  6:57       ` David C. Rankin
  1 sibling, 1 reply; 14+ messages in thread
From: Roman Mamedov @ 2020-03-02  6:51 UTC (permalink / raw)
  To: David C. Rankin; +Cc: mdraid

On Mon, 2 Mar 2020 00:38:16 -0600
"David C. Rankin" <drankinatty@suddenlinkmail.com> wrote:

> On 03/01/2020 11:25 PM, Roman Mamedov wrote:
> > On Sun, 1 Mar 2020 19:50:03 -0600
> > "David C. Rankin" <drankinatty@suddenlinkmail.com> wrote:
> > 
> >>   Let me know if there is anything else I can send, and let me know if I
> >> should stop the scrub or just let it run. I'm happy to run any diagnostic you
> >> can think of that might help. Thanks.
> > 
> > It doesn't seem convincing that the issue is raw devices vs partitions, or
> > even kernel version related, especially since you rolled it back and the issue
> > remains.
> > 
> > What else you could send is "smartctl -a" of all devices;
> > 
> > and most importantly, while the "slow" scrub is running on md4, start:
> > 
> >   iostat -x 2 /dev/sdc /dev/sdd
> > 
> > (enlarge the terminal window) and see if any of the 2 devices is pegged into
> > 100.0 in the last "%util" column, or just showing much higher values there
> > than the other one.
> > 
> 
> Thank you Roman, iostat and smartctl -a for sdc/sdd attached,
> 
>   sdc has a few errors from a power hit taken 3000 hours ago or so, but since
> that time it has been fine. I had rolled back to several earlier kernels from
> Jan 14, Jan 21, and Jan 27 with no change, I then updated to current which is
> Archlinux 5.5.6-arch1-1.

These show not just a few errors, but that it is basically dying:

  5 Reallocated_Sector_Ct   0x0033   089   089   010    Pre-fail  Always  13648
197 Current_Pending_Sector  0x0012   085   085   000    Old_age   Always   2544
198 Offline_Uncorrectable   0x0010   085   085   000    Old_age   Offline  2544

>   I'm not sure what to make of the iostat output, but the r_await looks
> suspicious. Could this all be due to one flaky disk without it throwing any
> errors?

Yes, replace the drive ASAP, and see if that solves it.

-- 
With respect,
Roman


* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O
  2020-03-02  6:51     ` Roman Mamedov
@ 2020-03-02  6:57       ` David C. Rankin
  2020-03-02  7:08         ` Chris Murphy
  2020-03-04 22:53         ` David C. Rankin
  0 siblings, 2 replies; 14+ messages in thread
From: David C. Rankin @ 2020-03-02  6:57 UTC (permalink / raw)
  To: mdraid

On 03/02/2020 12:51 AM, Roman Mamedov wrote:
> Yes, replace the drive ASAP, and see if that solves it.

Will do, thank you!

-- 
David C. Rankin, J.D.,P.E.


* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O
  2020-03-02  6:57       ` David C. Rankin
@ 2020-03-02  7:08         ` Chris Murphy
  2020-03-02  9:27           ` David C. Rankin
  2020-03-04 22:53         ` David C. Rankin
  1 sibling, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2020-03-02  7:08 UTC (permalink / raw)
  To: David C. Rankin; +Cc: mdraid

SMART also reports for /dev/sdc

  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455


So I'm suspicious of timeout mismatch as well.
https://raid.wiki.kernel.org/index.php/Timeout_Mismatch


Chris Murphy


* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O
  2020-03-02  7:08         ` Chris Murphy
@ 2020-03-02  9:27           ` David C. Rankin
  2020-03-02 11:44             ` Phil Turmel
  2020-03-02 21:09             ` Chris Murphy
  0 siblings, 2 replies; 14+ messages in thread
From: David C. Rankin @ 2020-03-02  9:27 UTC (permalink / raw)
  To: mdraid

On 03/02/2020 01:08 AM, Chris Murphy wrote:
> smart also reports for /de/sdc
> 
>   40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
> 
> 
> So I'm suspicious of timeout mismatch as well.
> https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
> 
> 
> Chris Murphy
> 

The straces between the virtualbox host and guest show a number of I/O waits
that would seem to fit a timeout issue like that. But according to the page,
both drives in this array provide:

SCT capabilities:              (0x1085) SCT Status supported.

Which should be able to handle the correction without stumbling into the
timeout problem. Something is FUBAR. On an Archlinux guest running on that
array, at a text console, when you type your user name and press [Enter] the
login may time out before the password: prompt is ever displayed. So this is
really giving virtualbox fits.

On the host itself, you don't really notice much, other than a bit of slowdown
with readline and tab-completion every once in a while, but apps looking to
that array -- all bets are off.

And still not a single error in the journal or mailed from mdadm. You would
think if it was going to take 26 days to scrub a 3T array, some error should
pop up somewhere :-)

-- 
David C. Rankin, J.D.,P.E.


* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O
  2020-03-02  9:27           ` David C. Rankin
@ 2020-03-02 11:44             ` Phil Turmel
  2020-03-02 13:32               ` Wols Lists
  2020-03-02 21:09             ` Chris Murphy
  1 sibling, 1 reply; 14+ messages in thread
From: Phil Turmel @ 2020-03-02 11:44 UTC (permalink / raw)
  To: David C. Rankin, mdraid

Hi David,

On 3/2/20 4:27 AM, David C. Rankin wrote:
> On 03/02/2020 01:08 AM, Chris Murphy wrote:
>> smart also reports for /de/sdc
>>
>>    40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
>>
>>
>> So I'm suspicious of timeout mismatch as well.
>> https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
>>
>>
>> Chris Murphy
>>
> 
> The strace between the virtualbox host and guess show and number of I/O waits
> that would seem to fit some timeout issue like that. But according to the
> page, both drives in this array provide:
> 
> SCT capabilities:              (0x1085) SCT Status supported.

SCT Status itself isn't sufficient.  You must have ERC "Error Recovery 
Control", an optional part of SCT.

smartctl -a doesn't expose that.  Use smartctl -x in general, or 
smartctl -l scterc to specifically check the needed setting.

Phil


* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O
  2020-03-02 11:44             ` Phil Turmel
@ 2020-03-02 13:32               ` Wols Lists
  2020-03-02 21:21                 ` David C. Rankin
  0 siblings, 1 reply; 14+ messages in thread
From: Wols Lists @ 2020-03-02 13:32 UTC (permalink / raw)
  To: Phil Turmel, David C. Rankin, mdraid

On 02/03/20 11:44, Phil Turmel wrote:
> Hi David,
> 
> On 3/2/20 4:27 AM, David C. Rankin wrote:
>> On 03/02/2020 01:08 AM, Chris Murphy wrote:
>>> smart also reports for /de/sdc
>>>
>>>    40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
>>>
>>>
>>> So I'm suspicious of timeout mismatch as well.
>>> https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
>>>
>>>
>>> Chris Murphy
>>>
>>
>> The strace between the virtualbox host and guess show and number of
>> I/O waits
>> that would seem to fit some timeout issue like that. But according to the
>> page, both drives in this array provide:
>>
>> SCT capabilities:              (0x1085) SCT Status supported.
> 
> SCT Status itself isn't sufficient.  You must have ERC "Error Recovery
> Control", an optional part of SCT.
> 
> smartctl -a doesn't expose that.  Use smartctl -x in general, or
> smartctl -l scterc to specifically check the needed setting.
> 
It's a Seagate Barracuda ... nuff said

(For David, Barracudas don't support SCT/ERC - they are not recommended
for raid. Okay for 1 but definitely not anything else. Get an Ironwolf
to replace it.)

Cheers,
Wol


* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O
  2020-03-02  9:27           ` David C. Rankin
  2020-03-02 11:44             ` Phil Turmel
@ 2020-03-02 21:09             ` Chris Murphy
  1 sibling, 0 replies; 14+ messages in thread
From: Chris Murphy @ 2020-03-02 21:09 UTC (permalink / raw)
  To: David C. Rankin; +Cc: mdraid

On Mon, Mar 2, 2020 at 2:27 AM David C. Rankin
<drankinatty@suddenlinkmail.com> wrote:
>
> On 03/02/2020 01:08 AM, Chris Murphy wrote:
> > smart also reports for /dev/sdc
> >
> >   40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
> >
> >
> > So I'm suspicious of timeout mismatch as well.
> > https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
> >
> >
> > Chris Murphy
> >
>
> The strace between the virtualbox host and guest shows a number of I/O waits
> that would seem to fit a timeout issue like that. But according to the
> page, both drives in this array provide:
>
> SCT capabilities:              (0x1085) SCT Status supported.
>

Check the value 'smartctl -l scterc /dev/'
Change the value 'smartctl -l scterc,70,70 /dev/'

Of course no change is needed if the value is already below the sysfs
timeout value for each block device. Note that SCT ERC times are in
deciseconds.

This is on the host.
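
A minimal check/set sequence, just as a sketch (sdX is a placeholder for
each member disk; 70 deciseconds = 7 seconds is the commonly used value):

  smartctl -l scterc /dev/sdX          # current ERC limits, in deciseconds
  cat /sys/block/sdX/device/timeout    # kernel SCSI command timer, in seconds
  smartctl -l scterc,70,70 /dev/sdX    # set read/write ERC to 7 seconds

As long as the ERC time (converted to seconds) stays below the kernel
timer, the drive gives up on a bad sector and reports it before the
kernel resets the link.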

> Which should be able to handle the correction without stumbling into the
> timeout problem. Something is FUBAR. On an Archlinux guest running on that
> array, at a text console, when you type your user name and press [Enter] the
> login may time out before the password: prompt is ever displayed. So this is
> really giving virtualbox fits.

Weird, I'm not sure what's causing that kind of latency in a vbox guest.


>
> On the host itself, you don't really notice much, other than a bit of slowdown
> with readline and tab-completion every once in a while, but apps looking to
> that array -- all bets are off.
>
> And still not a single error in the journal or mailed from mdadm. You would
> think if it was going to take 26 days to scrub a 3T array, some error should
> pop up somewhere :-)

Yes. At the least, when the default SCSI command timer expires it should
spit back a hard link reset, both in the journal and to the device. I
don't think mdadm will report that.
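
If the Barracudas turn out not to do ERC at all, the usual stopgap from
the timeout-mismatch wiki is the opposite direction: raise the kernel's
command timer well above the drive's worst-case internal recovery time.
Sketch only (sdX is a placeholder, and the setting does not persist
across reboots):

  echo 180 > /sys/block/sdX/device/timeout   # value is in seconds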


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O
  2020-03-02 13:32               ` Wols Lists
@ 2020-03-02 21:21                 ` David C. Rankin
  0 siblings, 0 replies; 14+ messages in thread
From: David C. Rankin @ 2020-03-02 21:21 UTC (permalink / raw)
  To: mdraid

On 03/02/2020 07:32 AM, Wols Lists wrote:
> It's a Seagate Barracuda ... nuff said
> 
> (For David, Barracudas don't support SCT/ERC - they are not recommended
> for raid. Okay for raid 1 but definitely not anything else. Get an Ironwolf
> to replace it.)

That's a good model!

  It's a shame that drive manufacturers have stripped functionality from most
of their drives over the past 20 years. 20+ years ago, when you bought a
drive, it had all the drive features standard. Even back in the RLL/MFM days,
all features were supported. Now, with the 4 flavors of drives from every
manufacturer, it seems to be a race to put out the cheapest stripped-down
drives they can make.

  They don't make 'em like they used to. Just checking the collection of old
boxes still spinning, there is an ancient data drive hanging off one of them,
probably from around 2005:

 === START OF INFORMATION SECTION ===
Model Family:     Maxtor DiamondMax 10 (ATA/133 and SATA/150)
Device Model:     Maxtor 6L300R0
Serial Number:    L604P3MH
Firmware Version: BAH41E00
User Capacity:    300,090,728,448 bytes [300 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Mon Mar  2 15:14:20 2020 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

  It has probably been spinning continuously for 12-15 years. So long, in
fact, that the power-on hours are reported only as "17h+26m".

  I'll update the thread when the new drives come in and the raid is rebuilt
and let you know how it goes.


-- 
David C. Rankin, J.D.,P.E.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O
  2020-03-02  6:57       ` David C. Rankin
  2020-03-02  7:08         ` Chris Murphy
@ 2020-03-04 22:53         ` David C. Rankin
  2020-03-05 17:18           ` Wols Lists
  1 sibling, 1 reply; 14+ messages in thread
From: David C. Rankin @ 2020-03-04 22:53 UTC (permalink / raw)
  To: mdraid

On 03/02/2020 12:57 AM, David C. Rankin wrote:
> On 03/02/2020 12:51 AM, Roman Mamedov wrote:
>> Yes, replace the drive ASAP, and see if that solves it.
> 
> Will do, thank you!
> 

Drive replaced and rebuilding:

md4 : active raid1 sdc[3] sdd[2]
      2930135488 blocks super 1.2 [2/1] [_U]
      [>....................]  recovery =  1.5% (46390912/2930135488)
finish=276.0min speed=174102K/sec
      bitmap: 1/22 pages [4KB], 65536KB chunk

Things are looking good, speed=174102K/sec, which is a far sight better than
speed=2022K/sec. This will give a 4.5-hour rebuild (instead of a 26-day
scrub). I suspect the virtualbox problems will disappear as well once the
rebuild is done.
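
For anyone who lands here with the same symptoms, a whole-device raid1
member swap is roughly just the usual sequence below (sketch only; sdX
stands in for the member being replaced):

  mdadm /dev/md4 --fail /dev/sdX     # mark the suspect member faulty
  mdadm /dev/md4 --remove /dev/sdX   # pull it from the array
  # power down, swap the physical disk, boot, then:
  mdadm /dev/md4 --add /dev/sdX      # resync to the new disk starts automatically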

Thank you to everyone for helping get me pointed in the right direction. I'll
let you know if I have any further issues here, but I don't anticipate any
(fingers-crossed...)

-- 
David C. Rankin, J.D.,P.E.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O
  2020-03-04 22:53         ` David C. Rankin
@ 2020-03-05 17:18           ` Wols Lists
  0 siblings, 0 replies; 14+ messages in thread
From: Wols Lists @ 2020-03-05 17:18 UTC (permalink / raw)
  To: David C. Rankin, mdraid

On 04/03/20 22:53, David C. Rankin wrote:
> On 03/02/2020 12:57 AM, David C. Rankin wrote:
>> On 03/02/2020 12:51 AM, Roman Mamedov wrote:
>>> Yes, replace the drive ASAP, and see if that solves it.
>>
>> Will do, thank you!
>>
> 
> Drive replaced and rebuilding:
> 
> md4 : active raid1 sdc[3] sdd[2]
>       2930135488 blocks super 1.2 [2/1] [_U]
>       [>....................]  recovery =  1.5% (46390912/2930135488)
> finish=276.0min speed=174102K/sec
>       bitmap: 1/22 pages [4KB], 65536KB chunk
> 
> Things are looking good, speed=174102K/sec, which is a far sight better than
> speed=2022K/sec. This will give a 4.5-hour rebuild (instead of a 26-day
> scrub). I suspect the virtualbox problems will disappear as well once the
> rebuild is done.
> 
> Thank you to everyone for helping get me pointed in the right direction. I'll
> let you know if I have any further issues here, but I don't anticipate any
> (fingers-crossed...)
> 
Raid 1 - look at dm-integrity. That should make scrubbing (hopefully)
redundant :-)
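
Rough sketch of what I mean (untested here, placeholder device names, and
integritysetup format wipes the member): put dm-integrity under each leg
and build the mirror on the integrity mappings, so a checksum mismatch
comes back to md as a read error and gets repaired from the other leg:

  integritysetup format /dev/sdX
  integritysetup open /dev/sdX int-sdX
  integritysetup format /dev/sdY
  integritysetup open /dev/sdY int-sdY
  mdadm --create /dev/mdN --level=1 --raid-devices=2 \
        /dev/mapper/int-sdX /dev/mapper/int-sdY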

I might at last get my new system up and running soon (got a shop to
look at it - dud motherboard :-( ). Of course it's now out of warranty and
my supplier has gone bust, but if the shop says it was a dud from the start
I might be able to claim something ...

But that means I'll have a test system - I've acquired about 6 x 1TB
drives - so I shall be playing with some slightly more heavy-duty raid
configs :-)

Cheers,
Wol

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-03-05 17:18 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-02  1:50 Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O David C. Rankin
2020-03-02  5:25 ` Roman Mamedov
2020-03-02  6:38   ` David C. Rankin
2020-03-02  6:46     ` David C. Rankin
2020-03-02  6:51     ` Roman Mamedov
2020-03-02  6:57       ` David C. Rankin
2020-03-02  7:08         ` Chris Murphy
2020-03-02  9:27           ` David C. Rankin
2020-03-02 11:44             ` Phil Turmel
2020-03-02 13:32               ` Wols Lists
2020-03-02 21:21                 ` David C. Rankin
2020-03-02 21:09             ` Chris Murphy
2020-03-04 22:53         ` David C. Rankin
2020-03-05 17:18           ` Wols Lists
