* Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O
@ 2020-03-02  1:50 David C. Rankin
  2020-03-02  5:25 ` Roman Mamedov
  0 siblings, 1 reply; 14+ messages in thread
From: David C. Rankin @ 2020-03-02 1:50 UTC (permalink / raw)
  To: mdraid

Mayday....

OS:     Archlinux
Kernel: 5.5
mdadm:  4.1-2
RAID 1: 2-disk 3T on devices sdc, sdd (not on partitions, on the device)
        (see mdadm -E on both disks and mdadm -D on the array below the message)
array:  /dev/md4

After the update to the Linux 5.5 kernel, I/O on a 2-disk 3T Raid1 on devices
/dev/sdc and /dev/sdd has dropped from ~speed=85166K/sec during scrub to
~speed=2022K/sec, with speed as low as speed=737K/sec. There are no errors.
This array normally takes exactly 5 hours 10 minutes to scrub, and has for the
past 4 years. The scrub has now been running for over 14 hours (without error)
and is only 2.8% complete, e.g.:

cat /proc/mdstat
Personalities : [raid1]
md4 : active raid1 sdc[0] sdd[2]
      2930135488 blocks super 1.2 [2/2] [UU]
      [>....................]  check =  2.8% (82114752/2930135488) finish=38635.1min speed=1228K/sec
      bitmap: 0/22 pages [0KB], 65536KB chunk
<snip>

The last 3 months of scrub logging show the scrub completing in 5:10 every
month (the timestamp is the completion time for each scrub; subtract the
/dev/md2 time from the /dev/md4 time to get the scrub time for /dev/md4):

Dec  1 03:01:02 '/dev/md0' mismatch_cnt = 0
Dec  1 03:10:02 '/dev/md1' mismatch_cnt = 0
Dec  1 07:10:03 '/dev/md2' mismatch_cnt = 0
Dec  1 12:20:03 '/dev/md4' mismatch_cnt = 0
Jan  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Jan  1 03:07:01 '/dev/md1' mismatch_cnt = 0
Jan  1 05:04:02 '/dev/md2' mismatch_cnt = 0
Jan  1 10:14:03 '/dev/md4' mismatch_cnt = 0
Feb  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Feb  1 03:07:01 '/dev/md1' mismatch_cnt = 0
Feb  1 05:01:02 '/dev/md2' mismatch_cnt = 0
Feb  1 10:11:02 '/dev/md4' mismatch_cnt = 0

After the 5.5 kernel update I noticed apps such as the virtualbox guests on
this array becoming unusably slow and initially thought it was a problem with
Oracle virtualbox and the 5.5 kernel. Running top on the Archlinux guest,
iowait is over 99% at times, and iostat on the guest shows:

Linux 5.5.5-arch1-1 (vl1)   02/24/2020   _x86_64_   (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.14    0.00    0.18   54.55    0.00   45.14

(I have a screenshot of the top wa over 98% if you need that too)

Originally, I opened a bug with Oracle, https://www.virtualbox.org/ticket/19311,
but that has left them scratching their heads, and it wasn't until my scrub
kicked off and I saw that it would take a month to complete that I snapped to
the fact it was a kernel Raid issue.

I check the scrub regularly and have it log completion of each array. Checking
right at the end of /dev/md2 (the array that scrubs just before this one
starts), all is normal, speed is fine, and it completed in normal time:

05:00 valkyrie:~/tmp> cat /proc/mdstat
Personalities : [raid1]
md4 : active raid1 sdc[0] sdd[2]
      2930135488 blocks super 1.2 [2/2] [UU]
      bitmap: 0/22 pages [0KB], 65536KB chunk

md2 : active raid1 sda7[0] sdb7[1]
      921030656 blocks super 1.2 [2/2] [UU]
      [===================>.]  check = 99.8% (919643456/921030656) finish=0.2min speed=85166K/sec
      bitmap: 2/7 pages [8KB], 65536KB chunk

md1 : active raid1 sda6[0] sdb6[1]
      52396032 blocks super 1.2 [2/2] [UU]

md3 : active raid1 sda8[0] sdb8[1]
      2115584 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda5[0] sdb5[1]
      511680 blocks super 1.2 [2/2] [UU]

unused devices: <none>

However, checking at the beginning of /dev/md4, speed plunged to
speed=2022K/sec (What??)

05:00 valkyrie:~/tmp> cat /proc/mdstat
Personalities : [raid1]
md4 : active raid1 sdc[0] sdd[2]
      2930135488 blocks super 1.2 [2/2] [UU]
      [>....................]  check =  0.0% (155712/2930135488) finish=24141.2min speed=2022K/sec
      bitmap: 0/22 pages [0KB], 65536KB chunk

md2 : active raid1 sda7[0] sdb7[1]
      921030656 blocks super 1.2 [2/2] [UU]
      bitmap: 0/7 pages [0KB], 65536KB chunk

md1 : active raid1 sda6[0] sdb6[1]
      52396032 blocks super 1.2 [2/2] [UU]

md3 : active raid1 sda8[0] sdb8[1]
      2115584 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda5[0] sdb5[1]
      511680 blocks super 1.2 [2/2] [UU]

unused devices: <none>

The perplexing problem is that I have rolled the Archlinux install back to the
5.4 kernel from before this problem originally appeared, but for reasons I
cannot explain, the array remains unusably slow. (I don't know if something
was written that changes the array for Linux 5.5 or what, but there is no
question it was as if a switch was thrown on the 5.5 kernel update that
crippled this array while leaving the other 3 arrays, which are on partitions
instead of devices, fine.) There are no errors logged to the journal, but it
is as if I/O to this array is coming through a Dixie straw, and most of the
time it is as if there is a race condition somewhere causing the thing to just
sit and spin.

Here are the mdadm -E and mdadm -D details:

[14:17 valkyrie:/home/david/tmp] # mdadm -E /dev/sdc
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 6e520607:f152d8b9:dd2a3bec:5f9dc875
           Name : valkyrie:4  (local to host valkyrie)
  Creation Time : Mon Mar 21 02:27:21 2016
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB)
     Array Size : 2930135488 (2794.39 GiB 3000.46 GB)
  Used Dev Size : 5860270976 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=48 sectors
          State : clean
    Device UUID : e15f0ea7:7e973d0c:f7ae51a1:9ee4b3a4

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Mar  1 14:18:07 2020
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 62472be - correct
         Events : 8193

    Device Role : Active device 0
    Array State : AA ('A' == active, '.' == missing, 'R' == replacing)

[14:18 valkyrie:/home/david/tmp] # mdadm -E /dev/sdd
/dev/sdd:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 6e520607:f152d8b9:dd2a3bec:5f9dc875
           Name : valkyrie:4  (local to host valkyrie)
  Creation Time : Mon Mar 21 02:27:21 2016
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB)
     Array Size : 2930135488 (2794.39 GiB 3000.46 GB)
  Used Dev Size : 5860270976 (2794.39 GiB 3000.46 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=48 sectors
          State : clean
    Device UUID : f745d11a:c323f477:71f8a0d9:27d8c717

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Mar  1 14:18:15 2020
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 9101220e - correct
         Events : 8194

    Device Role : Active device 1
    Array State : AA ('A' == active, '.' == missing, 'R' == replacing)

[14:18 valkyrie:/home/david/tmp] # mdadm -D /dev/md4
/dev/md4:
           Version : 1.2
     Creation Time : Mon Mar 21 02:27:21 2016
        Raid Level : raid1
        Array Size : 2930135488 (2794.39 GiB 3000.46 GB)
     Used Dev Size : 2930135488 (2794.39 GiB 3000.46 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sun Mar  1 14:18:32 2020
             State : clean, checking
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

      Check Status : 1% complete

              Name : valkyrie:4  (local to host valkyrie)
              UUID : 6e520607:f152d8b9:dd2a3bec:5f9dc875
            Events : 8194

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       2       8       48        1      active sync   /dev/sdd

A current mdstat (the scrub began at 05:00):

[19:39 valkyrie:/home/david/tmp] # cat /proc/mdstat
Personalities : [raid1]
md4 : active raid1 sdc[0] sdd[2]
      2930135488 blocks super 1.2 [2/2] [UU]
      [>....................]  check =  2.8% (84842176/2930135488) finish=28990.7min speed=1635K/sec
      bitmap: 0/22 pages [0KB], 65536KB chunk

md2 : active raid1 sda7[0] sdb7[1]
      921030656 blocks super 1.2 [2/2] [UU]
      bitmap: 0/7 pages [0KB], 65536KB chunk

md1 : active raid1 sda6[0] sdb6[1]
      52396032 blocks super 1.2 [2/2] [UU]

md3 : active raid1 sda8[0] sdb8[1]
      2115584 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda5[0] sdb5[1]
      511680 blocks super 1.2 [2/2] [UU]

unused devices: <none>

Here is the complete scrub log for the past year.

Feb  1 03:07:02 '/dev/md1' mismatch_cnt = 0
Feb  1 05:02:02 '/dev/md2' mismatch_cnt = 0
Feb  1 10:12:03 '/dev/md4' mismatch_cnt = 0
Mar  1 03:01:02 '/dev/md0' mismatch_cnt = 0
Mar  1 03:07:02 '/dev/md1' mismatch_cnt = 0
Mar  1 05:05:02 '/dev/md2' mismatch_cnt = 0
Mar  1 10:15:03 '/dev/md4' mismatch_cnt = 0
Apr  1 03:01:02 '/dev/md0' mismatch_cnt = 0
Apr  1 03:07:02 '/dev/md1' mismatch_cnt = 0
Apr  1 05:03:02 '/dev/md2' mismatch_cnt = 0
Apr  1 10:13:03 '/dev/md4' mismatch_cnt = 0
May  1 03:01:01 '/dev/md0' mismatch_cnt = 0
May  1 03:07:01 '/dev/md1' mismatch_cnt = 0
May  1 05:06:02 '/dev/md2' mismatch_cnt = 0
May  1 10:16:02 '/dev/md4' mismatch_cnt = 0
Jun  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Jun  1 03:07:01 '/dev/md1' mismatch_cnt = 0
Jun  1 05:02:02 '/dev/md2' mismatch_cnt = 0
Jun  1 10:12:02 '/dev/md4' mismatch_cnt = 0
Jul  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Jul  1 03:07:01 '/dev/md1' mismatch_cnt = 0
Jul  1 05:01:02 '/dev/md2' mismatch_cnt = 0
Jul  1 10:11:02 '/dev/md4' mismatch_cnt = 0
Aug  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Aug  1 03:07:01 '/dev/md1' mismatch_cnt = 0
Aug  1 05:01:02 '/dev/md2' mismatch_cnt = 0
Sep  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Sep  1 03:07:01 '/dev/md1' mismatch_cnt = 0
Sep  1 05:01:02 '/dev/md2' mismatch_cnt = 0
Sep  1 10:11:02 '/dev/md4' error: mismatch_cnt = 256
Oct  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Oct  1 03:06:01 '/dev/md1' mismatch_cnt = 0
Oct  1 05:00:02 '/dev/md2' mismatch_cnt = 0
Oct  1 10:10:02 '/dev/md4' error: mismatch_cnt = 128
Nov  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Nov  1 03:06:01 '/dev/md1' mismatch_cnt = 0
Nov  1 05:00:02 '/dev/md2' mismatch_cnt = 0
Nov  1 10:10:02 '/dev/md4' error: mismatch_cnt = 3584
Dec  1 03:01:02 '/dev/md0' mismatch_cnt = 0
Dec  1 03:10:02 '/dev/md1' mismatch_cnt = 0
Dec  1 07:10:03 '/dev/md2' mismatch_cnt = 0
Dec  1 12:20:03 '/dev/md4' mismatch_cnt = 0
Jan  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Jan  1 03:07:01 '/dev/md1' mismatch_cnt = 0
Jan  1 05:04:02 '/dev/md2' mismatch_cnt = 0
Jan  1 10:14:03 '/dev/md4' mismatch_cnt = 0
Feb  1 03:01:01 '/dev/md0' mismatch_cnt = 0
Feb  1 03:07:01 '/dev/md1' mismatch_cnt = 0
Feb  1 05:01:02 '/dev/md2' mismatch_cnt = 0
Feb  1 10:11:02 '/dev/md4' mismatch_cnt = 0

I need help; I don't know what else to check or what else to send you. I've
tried to think of the most relevant information I can provide. I do have
straces between the virtualbox host and guest on that machine if that would
help. There is nothing in the journal to send of any disk error, etc. It's
just as if the 5.5 kernel doesn't handle Raid1 on a device (instead of a
partition) the same way it did before 5.5, and that is bringing I/O to its
knees.

Let me know if there is anything else I can send, and let me know if I should
stop the scrub or just let it run. I'm happy to run any diagnostic you can
think of that might help. Thanks.

-- 
David C. Rankin, J.D.,P.E.

^ permalink raw reply [flat|nested] 14+ messages in thread
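The monthly mismatch_cnt lines above come from a cron-driven md "check" scrub.
A minimal sketch of such a scrub-and-log job using the md sysfs interface (the
script itself and the log path are illustrative assumptions, not the poster's
actual setup):

  #!/bin/bash
  # Illustrative monthly scrub-and-log job: start a "check" on each array in
  # turn, wait for it to finish, then record mismatch_cnt in the same format
  # as the log excerpts above.
  for md in md0 md1 md2 md3 md4; do
      echo check > /sys/block/$md/md/sync_action
      while [ "$(cat /sys/block/$md/md/sync_action)" != "idle" ]; do
          sleep 60
      done
      cnt=$(cat /sys/block/$md/md/mismatch_cnt)
      if [ "$cnt" -eq 0 ]; then
          printf "%s '/dev/%s' mismatch_cnt = %s\n" "$(date '+%b %e %T')" "$md" "$cnt"
      else
          printf "%s '/dev/%s' error: mismatch_cnt = %s\n" "$(date '+%b %e %T')" "$md" "$cnt"
      fi
  done >> /var/log/raid-scrub.log   # assumed log location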
* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O 2020-03-02 1:50 Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O David C. Rankin @ 2020-03-02 5:25 ` Roman Mamedov 2020-03-02 6:38 ` David C. Rankin 0 siblings, 1 reply; 14+ messages in thread From: Roman Mamedov @ 2020-03-02 5:25 UTC (permalink / raw) To: David C. Rankin; +Cc: mdraid On Sun, 1 Mar 2020 19:50:03 -0600 "David C. Rankin" <drankinatty@suddenlinkmail.com> wrote: > Let me know if there is anything else I can send, and let me know if I > should stop the scrub or just let it run. I'm happy to run any diagnostic you > can think of that might help. Thanks. It doesn't seem convincing that the issue is raw devices vs partitions, or even kernel version related, especially since you rolled it back and the issue remains. What else you could send is "smartctl -a" of all devices; and most importantly, while the "slow" scrub is running on md4, start: iostat -x 2 /dev/sdc /dev/sdd (enlarge the terminal window) and see if any of the 2 devices is pegged into 100.0 in the last "%util" column, or just showing much higher values there than the other one. -- With respect, Roman ^ permalink raw reply [flat|nested] 14+ messages in thread
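To follow along with Roman's suggestion, a quick way to boil the iostat output
down to just the per-member read latency and utilisation (a sketch; the field
positions, $6 for r_await and the last field for %util, match the sysstat
layout shown in the next reply and may differ on other versions):

  # ten 2-second samples, printing only device, r_await and %util
  iostat -x /dev/sdc /dev/sdd 2 10 |
    awk '/^sd[cd]/ { printf "%-4s r_await=%-10s %%util=%s\n", $1, $6, $NF }'

On a healthy mirror the two members should show broadly similar numbers; one
member with r_await orders of magnitude higher than the other points at that
drive rather than at md.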
* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O 2020-03-02 5:25 ` Roman Mamedov @ 2020-03-02 6:38 ` David C. Rankin 2020-03-02 6:46 ` David C. Rankin 2020-03-02 6:51 ` Roman Mamedov 0 siblings, 2 replies; 14+ messages in thread From: David C. Rankin @ 2020-03-02 6:38 UTC (permalink / raw) To: mdraid [-- Attachment #1: Type: text/plain, Size: 1386 bytes --] On 03/01/2020 11:25 PM, Roman Mamedov wrote: > On Sun, 1 Mar 2020 19:50:03 -0600 > "David C. Rankin" <drankinatty@suddenlinkmail.com> wrote: > >> Let me know if there is anything else I can send, and let me know if I >> should stop the scrub or just let it run. I'm happy to run any diagnostic you >> can think of that might help. Thanks. > > It doesn't seem convincing that the issue is raw devices vs partitions, or > even kernel version related, especially since you rolled it back and the issue > remains. > > What else you could send is "smartctl -a" of all devices; > > and most importantly, while the "slow" scrub is running on md4, start: > > iostat -x 2 /dev/sdc /dev/sdd > > (enlarge the terminal window) and see if any of the 2 devices is pegged into > 100.0 in the last "%util" column, or just showing much higher values there > than the other one. > Thank you Roman, iostat and smartctl -a for sdc/sdd attached, sdc has a few errors from a power hit taken 3000 hours ago or so, but since that time it has been fine. I had rolled back to several earlier kernels from Jan 14, Jan 21, and Jan 27 with no change, I then updated to current which is Archlinux 5.5.6-arch1-1. I'm not sure what to make of the iostat output, but the r_await looks suspicious. Could this all be due to one flaky disk without it throwing any errors? -- David C. Rankin, J.D.,P.E. [-- Attachment #2: iostat-x2_sdc_sdd.txt --] [-- Type: text/plain, Size: 32395 bytes --] # iostat -x 2 /dev/sdc /dev/sdd Linux 5.5.6-arch1-1 (valkyrie) 03/02/2020 _x86_64_ (8 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 0.02 0.01 0.36 0.28 0.00 99.34 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 4.17 323.18 1.01 19.52 918.56 77.56 0.16 5.88 0.01 6.11 472.99 35.69 0.00 0.00 0.00 0.00 0.00 0.00 0.17 333.64 3.90 0.33 sdd 4.16 323.93 1.00 19.33 3.99 77.81 0.16 5.88 0.01 6.06 25.57 35.66 0.00 0.00 0.00 0.00 0.00 0.00 0.17 20.40 0.02 0.29 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.19 0.00 0.00 99.81 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 1.50 992.00 0.00 0.00 1722.00 661.33 1.00 2.25 0.00 0.00 62.00 2.25 0.00 0.00 0.00 0.00 0.00 0.00 2.00 30.75 2.64 1.15 sdd 15.50 992.00 0.00 0.00 0.45 64.00 1.00 2.25 0.00 0.00 19.00 2.25 0.00 0.00 0.00 0.00 0.00 0.00 2.00 9.25 0.02 1.20 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.12 0.00 0.00 99.88 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 10.50 672.00 0.00 0.00 1179.43 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12.37 1.00 sdd 10.50 672.00 0.00 0.00 0.29 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.80 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.12 0.00 0.00 99.88 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm 
d_await dareq-sz f/s f_await aqu-sz %util sdc 15.50 992.00 0.00 0.00 2609.16 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 40.41 1.35 sdd 8.50 544.00 0.00 0.00 0.29 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.06 0.00 0.00 99.94 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 9.50 1248.00 14.00 59.57 1875.53 131.37 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 17.80 1.15 sdd 1.50 992.00 14.00 90.32 11.33 661.33 0.50 2.00 0.00 0.00 24.00 4.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 12.00 0.03 0.85 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.25 0.00 0.00 99.75 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 8.50 800.00 0.00 0.00 485.18 94.12 1.00 2.25 0.00 0.00 704.50 2.25 0.00 0.00 0.00 0.00 0.00 0.00 2.00 352.25 4.82 1.35 sdd 23.50 1504.00 0.00 0.00 0.36 64.00 0.50 0.25 0.00 0.00 40.00 0.50 0.00 0.00 0.00 0.00 0.00 0.00 1.00 20.00 0.02 1.35 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.00 0.00 0.00 100.00 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 11.50 736.00 0.00 0.00 1675.09 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 19.25 1.00 sdd 11.00 704.00 0.00 0.00 0.32 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.15 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.12 0.00 0.00 99.88 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 3.50 224.00 0.00 0.00 3240.00 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 11.33 0.50 sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.00 0.00 0.00 100.00 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 11.50 736.00 14.00 54.90 4613.87 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 53.04 1.00 sdd 1.50 992.00 14.00 90.32 11.00 661.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.65 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.00 0.00 0.00 100.00 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 1.50 992.00 0.00 0.00 1215.67 661.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.82 0.50 sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.50 2.00 0.00 0.00 20.00 4.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 9.50 0.01 0.35 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 1.13 0.00 0.00 98.87 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 202.50 13920.00 15.00 6.90 52.18 68.74 1.00 2.25 0.00 0.00 890.00 2.25 0.00 0.00 0.00 0.00 0.00 0.00 2.00 445.00 11.17 8.35 sdd 218.00 14912.00 15.00 6.44 7.05 68.40 0.50 0.25 0.00 0.00 
40.00 0.50 0.00 0.00 0.00 0.00 0.00 0.00 1.00 20.00 1.15 8.35 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.00 0.00 0.00 100.00 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 4.50 288.00 0.00 0.00 1930.33 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 8.68 0.65 sdd 3.00 192.00 0.00 0.00 0.33 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.50 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.06 0.00 0.00 99.94 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 11.50 736.00 0.00 0.00 2999.26 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 34.48 1.00 sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.00 0.00 0.00 100.00 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 4.00 1152.00 14.00 77.78 3092.12 288.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12.36 0.85 sdd 1.50 992.00 14.00 90.32 3.33 661.33 0.50 2.00 0.00 0.00 20.00 4.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 9.50 0.01 0.65 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.31 0.00 0.00 99.69 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 5.50 352.00 0.00 0.00 1011.09 64.00 1.00 2.25 0.00 0.00 265.00 2.25 0.00 0.00 0.00 0.00 0.00 0.00 2.00 132.25 5.82 1.15 sdd 21.00 1344.00 0.00 0.00 0.36 64.00 0.50 0.25 0.00 0.00 31.00 0.50 0.00 0.00 0.00 0.00 0.00 0.00 1.00 16.00 0.01 1.15 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.06 0.00 0.00 99.94 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 15.50 992.00 0.00 0.00 2365.74 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 36.64 1.00 sdd 15.50 992.00 0.00 0.00 0.29 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 avg-cpu: %user %nice %system %iowait %steal %idle 0.06 0.00 0.12 0.00 0.00 99.81 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 4.50 288.00 0.00 0.00 2292.33 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 10.31 0.35 sdd 1.50 96.00 0.00 0.00 0.33 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.35 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.06 0.00 0.00 99.94 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 11.50 736.00 0.00 0.00 3806.57 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 43.76 0.65 sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 avg-cpu: %user %nice %system %iowait %steal %idle 0.06 0.00 0.94 0.00 0.00 99.00 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s 
drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 145.50 10208.00 14.00 8.78 52.79 70.16 1.00 2.25 0.00 0.00 17.50 2.25 0.00 0.00 0.00 0.00 0.00 0.00 2.00 8.75 7.43 6.15 sdd 160.00 11136.00 14.00 8.05 0.39 69.60 1.00 2.25 0.00 0.00 32.00 2.25 0.00 0.00 0.00 0.00 0.00 0.00 2.00 16.00 0.05 6.35 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.12 0.00 0.00 99.88 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 12.50 800.00 0.00 0.00 2143.32 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 26.76 1.35 sdd 12.50 800.00 0.00 0.00 0.28 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.30 avg-cpu: %user %nice %system %iowait %steal %idle 0.06 0.00 0.06 0.00 0.00 99.88 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 3.00 192.00 0.00 0.00 3173.00 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 9.51 0.50 sdd 2.50 160.00 0.00 0.00 0.40 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.70 avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.06 0.00 0.00 99.94 Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util sdc 3.50 224.00 0.00 0.00 4033.29 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 14.11 0.35 sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 [00:27 valkyrie:/home/david/tmp] # smartctl -a /dev/sdc smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.5.6-arch1-1] (local build) Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Seagate Barracuda 7200.14 (AF) Device Model: ST3000DM001-1ER166 Serial Number: Z50264LN LU WWN Device Id: 5 000c50 087801e14 Firmware Version: CC26 User Capacity: 3,000,592,982,016 bytes [3.00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate: 7200 rpm Form Factor: 3.5 inches Device is: In smartctl database [for details use: -P show] ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s) Local Time is: Mon Mar 2 00:27:28 2020 CST SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 80) seconds. Offline data collection capabilities: (0x73) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. No Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. 
Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 316) minutes. Conveyance self-test routine recommended polling time: ( 2) minutes. SCT capabilities: (0x1085) SCT Status supported. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 117 087 006 Pre-fail Always - 145749056 3 Spin_Up_Time 0x0003 094 094 000 Pre-fail Always - 0 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 32 5 Reallocated_Sector_Ct 0x0033 089 089 010 Pre-fail Always - 13648 7 Seek_Error_Rate 0x000f 080 060 030 Pre-fail Always - 103164271 9 Power_On_Hours 0x0032 062 062 000 Old_age Always - 34041 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 32 183 Runtime_Bad_Block 0x0032 098 098 000 Old_age Always - 2 184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0 187 Reported_Uncorrect 0x0032 082 082 000 Old_age Always - 18 188 Command_Timeout 0x0032 100 099 000 Old_age Always - 5 6 6 189 High_Fly_Writes 0x003a 001 001 000 Old_age Always - 116 190 Airflow_Temperature_Cel 0x0022 067 063 045 Old_age Always - 33 (Min/Max 20/34) 191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 0 193 Load_Cycle_Count 0x0032 090 090 000 Old_age Always - 21932 194 Temperature_Celsius 0x0022 033 040 000 Old_age Always - 33 (0 18 0 0 0) 197 Current_Pending_Sector 0x0012 085 085 000 Old_age Always - 2544 198 Offline_Uncorrectable 0x0010 085 085 000 Old_age Offline - 2544 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0 240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 5185h+21m+36.806s 241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 2947035570 242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 457517019401 SMART Error Log Version: 1 ATA Error Count: 18 (device log contains only the most recent five errors) CR = Command Register [HEX] FR = Features Register [HEX] SC = Sector Count Register [HEX] SN = Sector Number Register [HEX] CL = Cylinder Low Register [HEX] CH = Cylinder High Register [HEX] DH = Device/Head Register [HEX] DC = Device Command Register [HEX] ER = Error register [HEX] ST = Status register [HEX] Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days. Error 18 occurred at disk power-on lifetime: 31122 hours (1296 days + 18 hours) When the command that caused the error occurred, the device was active or idle. 
After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 53 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 60 00 00 ff ff ff 4f 00 40d+19:06:08.272 READ FPDMA QUEUED 60 00 00 ff ff ff 4f 00 40d+19:06:08.272 READ FPDMA QUEUED ef 10 02 00 00 00 a0 00 40d+19:06:08.272 SET FEATURES [Enable SATA feature] 27 00 00 00 00 00 e0 00 40d+19:06:08.271 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3] ec 00 00 00 00 00 a0 00 40d+19:06:08.271 IDENTIFY DEVICE Error 17 occurred at disk power-on lifetime: 31122 hours (1296 days + 18 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 53 00 ff ff ff 0f Error: WP at LBA = 0x0fffffff = 268435455 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 61 00 80 ff ff ff 4f 00 40d+19:05:38.855 WRITE FPDMA QUEUED 61 00 80 ff ff ff 4f 00 40d+19:05:38.851 WRITE FPDMA QUEUED 61 00 80 ff ff ff 4f 00 40d+19:05:38.851 WRITE FPDMA QUEUED 61 00 80 ff ff ff 4f 00 40d+19:05:38.851 WRITE FPDMA QUEUED 61 00 80 ff ff ff 4f 00 40d+19:05:38.851 WRITE FPDMA QUEUED Error 16 occurred at disk power-on lifetime: 31122 hours (1296 days + 18 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 53 00 ff ff ff 0f Error: WP at LBA = 0x0fffffff = 268435455 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 61 00 80 ff ff ff 4f 00 40d+19:05:35.175 WRITE FPDMA QUEUED 61 00 80 ff ff ff 4f 00 40d+19:05:35.175 WRITE FPDMA QUEUED 60 00 00 ff ff ff 4f 00 40d+19:05:35.175 READ FPDMA QUEUED 60 00 00 ff ff ff 4f 00 40d+19:05:35.175 READ FPDMA QUEUED 61 00 80 ff ff ff 4f 00 40d+19:05:35.175 WRITE FPDMA QUEUED Error 15 occurred at disk power-on lifetime: 31122 hours (1296 days + 18 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 53 00 ff ff ff 0f Error: WP at LBA = 0x0fffffff = 268435455 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 61 00 80 ff ff ff 4f 00 40d+19:05:31.509 WRITE FPDMA QUEUED 60 00 00 ff ff ff 4f 00 40d+19:05:31.509 READ FPDMA QUEUED 60 00 00 ff ff ff 4f 00 40d+19:05:31.509 READ FPDMA QUEUED 61 00 80 ff ff ff 4f 00 40d+19:05:31.509 WRITE FPDMA QUEUED 60 00 80 ff ff ff 4f 00 40d+19:05:31.509 READ FPDMA QUEUED Error 14 occurred at disk power-on lifetime: 31122 hours (1296 days + 18 hours) When the command that caused the error occurred, the device was active or idle. 
After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 53 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 60 00 00 ff ff ff 4f 00 40d+19:05:27.858 READ FPDMA QUEUED 60 00 00 ff ff ff 4f 00 40d+19:05:27.857 READ FPDMA QUEUED 61 00 80 ff ff ff 4f 00 40d+19:05:27.857 WRITE FPDMA QUEUED 60 00 80 ff ff ff 4f 00 40d+19:05:27.857 READ FPDMA QUEUED 60 00 80 ff ff ff 4f 00 40d+19:05:27.857 READ FPDMA QUEUED SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Short offline Completed without error 00% 12 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. [00:27 valkyrie:/home/david/tmp] # smartctl -a /dev/sdd smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.5.6-arch1-1] (local build) Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Seagate Barracuda 7200.14 (AF) Device Model: ST3000DM001-1ER166 Serial Number: Z5025WPD LU WWN Device Id: 5 000c50 08780900c Firmware Version: CC26 User Capacity: 3,000,592,982,016 bytes [3.00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate: 7200 rpm Form Factor: 3.5 inches Device is: In smartctl database [for details use: -P show] ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) Local Time is: Mon Mar 2 00:29:03 2020 CST SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 80) seconds. Offline data collection capabilities: (0x73) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. No Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 318) minutes. Conveyance self-test routine recommended polling time: ( 2) minutes. SCT capabilities: (0x1085) SCT Status supported. 
SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 115 099 006 Pre-fail Always - 85209376 3 Spin_Up_Time 0x0003 094 094 000 Pre-fail Always - 0 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 32 5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0 7 Seek_Error_Rate 0x000f 067 057 030 Pre-fail Always - 8601381709 9 Power_On_Hours 0x0032 062 062 000 Old_age Always - 34042 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 32 183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0 184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0 188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0 0 0 189 High_Fly_Writes 0x003a 098 098 000 Old_age Always - 2 190 Airflow_Temperature_Cel 0x0022 069 064 045 Old_age Always - 31 (Min/Max 20/33) 191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 0 193 Load_Cycle_Count 0x0032 092 092 000 Old_age Always - 17156 194 Temperature_Celsius 0x0022 031 040 000 Old_age Always - 31 (0 20 0 0 0) 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0 240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 1849h+54m+59.374s 241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 14667824592 242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 418504152720 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Short offline Completed without error 00% 12 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. ^ permalink raw reply [flat|nested] 14+ messages in thread
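A compact way to skim just the failure-predictive attributes from output like
the above across both members (attribute names as smartctl 7.x prints them):

  for d in /dev/sdc /dev/sdd; do
      echo "== $d =="
      # -A prints only the vendor attribute table
      smartctl -A "$d" |
        grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|Reported_Uncorrect'
  done

Non-zero, climbing raw values for any of these on one member of a mirror are a
strong hint that the drive, not md, is the bottleneck.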
* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O 2020-03-02 6:38 ` David C. Rankin @ 2020-03-02 6:46 ` David C. Rankin 2020-03-02 6:51 ` Roman Mamedov 1 sibling, 0 replies; 14+ messages in thread From: David C. Rankin @ 2020-03-02 6:46 UTC (permalink / raw) To: mdraid On 03/02/2020 12:38 AM, David C. Rankin wrote: > Thank you Roman, iostat and smartctl -a for sdc/sdd attached, > > sdc has a few errors from a power hit taken 3000 hours ago or so, but since > that time it has been fine. I had rolled back to several earlier kernels from > Jan 14, Jan 21, and Jan 27 with no change, I then updated to current which is > Archlinux 5.5.6-arch1-1. > > I'm not sure what to make of the iostat output, but the r_await looks > suspicious. Could this all be due to one flaky disk without it throwing any > errors? Actually, I think sdc looks like the flaky culprit... So what is recommended in the interim, fail sdc and remove from array and run on sdd until the replacement arrives to rebuild? (seems better than limping along with sdc if that is the one causing all the slowdown) -- David C. Rankin, J.D.,P.E. ^ permalink raw reply [flat|nested] 14+ messages in thread
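For reference, the usual mdadm sequence for what is proposed here (a sketch;
it assumes the replacement disk comes up as the same /dev/sdc node after the
swap, and since this array sits on the bare device no partitioning step is
needed before re-adding):

  mdadm /dev/md4 --fail /dev/sdc      # mark the flaky member failed
  mdadm /dev/md4 --remove /dev/sdc    # drop it from the array
  # power down, swap the physical drive, boot back up
  mdadm /dev/md4 --add /dev/sdc       # add the new disk; md rebuilds from sdd
  cat /proc/mdstat                    # watch recovery progress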
* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O 2020-03-02 6:38 ` David C. Rankin 2020-03-02 6:46 ` David C. Rankin @ 2020-03-02 6:51 ` Roman Mamedov 2020-03-02 6:57 ` David C. Rankin 1 sibling, 1 reply; 14+ messages in thread From: Roman Mamedov @ 2020-03-02 6:51 UTC (permalink / raw) To: David C. Rankin; +Cc: mdraid On Mon, 2 Mar 2020 00:38:16 -0600 "David C. Rankin" <drankinatty@suddenlinkmail.com> wrote: > On 03/01/2020 11:25 PM, Roman Mamedov wrote: > > On Sun, 1 Mar 2020 19:50:03 -0600 > > "David C. Rankin" <drankinatty@suddenlinkmail.com> wrote: > > > >> Let me know if there is anything else I can send, and let me know if I > >> should stop the scrub or just let it run. I'm happy to run any diagnostic you > >> can think of that might help. Thanks. > > > > It doesn't seem convincing that the issue is raw devices vs partitions, or > > even kernel version related, especially since you rolled it back and the issue > > remains. > > > > What else you could send is "smartctl -a" of all devices; > > > > and most importantly, while the "slow" scrub is running on md4, start: > > > > iostat -x 2 /dev/sdc /dev/sdd > > > > (enlarge the terminal window) and see if any of the 2 devices is pegged into > > 100.0 in the last "%util" column, or just showing much higher values there > > than the other one. > > > > Thank you Roman, iostat and smartctl -a for sdc/sdd attached, > > sdc has a few errors from a power hit taken 3000 hours ago or so, but since > that time it has been fine. I had rolled back to several earlier kernels from > Jan 14, Jan 21, and Jan 27 with no change, I then updated to current which is > Archlinux 5.5.6-arch1-1. These show not just a few errors, but that it is basically dying: 5 Reallocated_Sector_Ct 0x0033 089 089 010 Pre-fail Always 13648 197 Current_Pending_Sector 0x0012 085 085 000 Old_age Always 2544 198 Offline_Uncorrectable 0x0010 085 085 000 Old_age Offline 2544 > I'm not sure what to make of the iostat output, but the r_await looks > suspicious. Could this all be due to one flaky disk without it throwing any > errors? Yes, replace the drive ASAP, and see if that solves it. -- With respect, Roman ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O 2020-03-02 6:51 ` Roman Mamedov @ 2020-03-02 6:57 ` David C. Rankin 2020-03-02 7:08 ` Chris Murphy 2020-03-04 22:53 ` David C. Rankin 0 siblings, 2 replies; 14+ messages in thread From: David C. Rankin @ 2020-03-02 6:57 UTC (permalink / raw) To: mdraid On 03/02/2020 12:51 AM, Roman Mamedov wrote: > Yes, replace the drive ASAP, and see if that solves it. Will do, thank you! -- David C. Rankin, J.D.,P.E. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O
  2020-03-02  6:57 ` David C. Rankin
@ 2020-03-02  7:08 ` Chris Murphy
  2020-03-02  9:27 ` David C. Rankin
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2020-03-02 7:08 UTC (permalink / raw)
  To: David C. Rankin; +Cc: mdraid

SMART also reports for /dev/sdc:

  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

So I'm suspicious of timeout mismatch as well.
https://raid.wiki.kernel.org/index.php/Timeout_Mismatch

Chris Murphy

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O
  2020-03-02  7:08 ` Chris Murphy
@ 2020-03-02  9:27 ` David C. Rankin
  2020-03-02 11:44 ` Phil Turmel
  2020-03-02 21:09 ` Chris Murphy
  0 siblings, 2 replies; 14+ messages in thread
From: David C. Rankin @ 2020-03-02 9:27 UTC (permalink / raw)
  To: mdraid

On 03/02/2020 01:08 AM, Chris Murphy wrote:
> SMART also reports for /dev/sdc
>
> 40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
>
> So I'm suspicious of timeout mismatch as well.
> https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
>
> Chris Murphy

The straces between the virtualbox host and guest show a number of I/O waits
that would seem to fit a timeout issue like that. But according to that page,
both drives in this array provide:

  SCT capabilities: (0x1085) SCT Status supported.

which should be able to handle the correction without stumbling into the
timeout problem. Something is FUBAR. On an Archlinux guest running on that
array, at a text console, when you type your user name and press [Enter], the
login may time out before the password: prompt is ever displayed. So this is
really giving virtualbox fits.

On the host itself you don't really notice much, other than a bit of slowdown
with readline and tab-completion every once in a while, but for apps looking
to that array -- all bets are off.

And still not a single error in the journal or mailed from mdadm. You would
think that if it was going to take 26 days to scrub a 3T array, some error
should pop up somewhere :-)

-- 
David C. Rankin, J.D.,P.E.

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O 2020-03-02 9:27 ` David C. Rankin @ 2020-03-02 11:44 ` Phil Turmel 2020-03-02 13:32 ` Wols Lists 2020-03-02 21:09 ` Chris Murphy 1 sibling, 1 reply; 14+ messages in thread From: Phil Turmel @ 2020-03-02 11:44 UTC (permalink / raw) To: David C. Rankin, mdraid Hi David, On 3/2/20 4:27 AM, David C. Rankin wrote: > On 03/02/2020 01:08 AM, Chris Murphy wrote: >> smart also reports for /de/sdc >> >> 40 53 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455 >> >> >> So I'm suspicious of timeout mismatch as well. >> https://raid.wiki.kernel.org/index.php/Timeout_Mismatch >> >> >> Chris Murphy >> > > The strace between the virtualbox host and guess show and number of I/O waits > that would seem to fit some timeout issue like that. But according to the > page, both drives in this array provide: > > SCT capabilities: (0x1085) SCT Status supported. SCT Status itself isn't sufficient. You must have ERC "Error Recovery Control", an optional part of SCT. smartctl -a doesn't expose that. Use smartctl -x in general, or smartctl -l scterc to specifically check the needed setting. Phil ^ permalink raw reply [flat|nested] 14+ messages in thread
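A quick check of both halves of the timeout-mismatch question Phil raises (the
sysfs file is the kernel's per-device SCSI command timer, in seconds):

  for d in sdc sdd; do
      echo "== /dev/$d =="
      smartctl -l scterc /dev/$d          # drive-side error recovery (ERC) setting, if supported
      cat /sys/block/$d/device/timeout    # kernel-side command timer, default 30
  done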
* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O 2020-03-02 11:44 ` Phil Turmel @ 2020-03-02 13:32 ` Wols Lists 2020-03-02 21:21 ` David C. Rankin 0 siblings, 1 reply; 14+ messages in thread From: Wols Lists @ 2020-03-02 13:32 UTC (permalink / raw) To: Phil Turmel, David C. Rankin, mdraid On 02/03/20 11:44, Phil Turmel wrote: > Hi David, > > On 3/2/20 4:27 AM, David C. Rankin wrote: >> On 03/02/2020 01:08 AM, Chris Murphy wrote: >>> smart also reports for /de/sdc >>> >>> 40 53 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455 >>> >>> >>> So I'm suspicious of timeout mismatch as well. >>> https://raid.wiki.kernel.org/index.php/Timeout_Mismatch >>> >>> >>> Chris Murphy >>> >> >> The strace between the virtualbox host and guess show and number of >> I/O waits >> that would seem to fit some timeout issue like that. But according to the >> page, both drives in this array provide: >> >> SCT capabilities: (0x1085) SCT Status supported. > > SCT Status itself isn't sufficient. You must have ERC "Error Recovery > Control", an optional part of SCT. > > smartctl -a doesn't expose that. Use smartctl -x in general, or > smartctl -l scterc to specifically check the needed setting. > It's a Seagate Barracuda ... nuff said (For David, Barracudas don't support SCT/ERC - they are not recommended for raid. Okay for 1 but definitely not anything else. Get an Ironwolf to replace it.) Cheers, Wol ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O
  2020-03-02 13:32 ` Wols Lists
@ 2020-03-02 21:21 ` David C. Rankin
  0 siblings, 0 replies; 14+ messages in thread
From: David C. Rankin @ 2020-03-02 21:21 UTC (permalink / raw)
  To: mdraid

On 03/02/2020 07:32 AM, Wols Lists wrote:
> It's a Seagate Barracuda ... nuff said
>
> (For David, Barracudas don't support SCT/ERC - they are not recommended
> for raid. Okay for 1 but definitely not anything else. Get an Ironwolf
> to replace it.)

That's a good model! It's a shame that drive manufacturers have stripped
functionality from most of the drives over the past 20 years. 20+ years ago
when you bought a drive, it had all the drive features standard. Even back in
the RLL/MFM days, all features were supported. Now, with the 4 flavors of
drives from every manufacturer, it seems to be a race to put out the cheapest
stripped-down drives they can make. They don't make 'em like they used to.

Just checking the collection of old boxes still spinning, there is an ancient
data drive hanging off one, probably from the mid-2005 timeframe:

  === START OF INFORMATION SECTION ===
  Model Family:     Maxtor DiamondMax 10 (ATA/133 and SATA/150)
  Device Model:     Maxtor 6L300R0
  Serial Number:    L604P3MH
  Firmware Version: BAH41E00
  User Capacity:    300,090,728,448 bytes [300 GB]
  Sector Size:      512 bytes logical/physical
  Device is:        In smartctl database [for details use: -P show]
  ATA Version is:   7
  ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
  Local Time is:    Mon Mar  2 15:14:20 2020 CST
  SMART support is: Available - device has SMART capability.
  SMART support is: Enabled

It has probably been spinning continually for 12-15 years, so long that the
power-on hours are reported only as "17h+26m".

I'll update the thread when the new drives come in and the raid is rebuilt and
let you know how it goes.

-- 
David C. Rankin, J.D.,P.E.

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O 2020-03-02 9:27 ` David C. Rankin 2020-03-02 11:44 ` Phil Turmel @ 2020-03-02 21:09 ` Chris Murphy 1 sibling, 0 replies; 14+ messages in thread From: Chris Murphy @ 2020-03-02 21:09 UTC (permalink / raw) To: David C. Rankin; +Cc: mdraid On Mon, Mar 2, 2020 at 2:27 AM David C. Rankin <drankinatty@suddenlinkmail.com> wrote: > > On 03/02/2020 01:08 AM, Chris Murphy wrote: > > smart also reports for /de/sdc > > > > 40 53 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455 > > > > > > So I'm suspicious of timeout mismatch as well. > > https://raid.wiki.kernel.org/index.php/Timeout_Mismatch > > > > > > Chris Murphy > > > > The strace between the virtualbox host and guess show and number of I/O waits > that would seem to fit some timeout issue like that. But according to the > page, both drives in this array provide: > > SCT capabilities: (0x1085) SCT Status supported. > Check the value 'smartctl -l scterc /dev/' Change the value 'smartctl -l scterc,70,70 /dev/' Of course no change needed if it's a value already below sysfs timeout value for each block device. Note that SCT ERC times are deciseconds. This is on the host. > Which should be able to handle the correction without stumbling into the > timeout problem. Something is FUBAR. On a Archlinux guest running on that > array, At a text console when you type your user name and press [Enter], the > login may timeout before the password: prompt is ever displayed. So this is > really giving virtualbox fits. Weird, I'm not sure what's causing that kind of latency in a vbox guest. > > On the host itself, you don't really notice much, other than a bit of slowdown > with readline and tab-completion every once in a while, but apps looking to > that array -- all bets are off. > > And still not a single error in the journal or mailed from mdadm. You would > think if it was going to take 26 days to scrub a 3T array, some error should > pop up somewhere :-) Yes. At the least the default SCSI command timer should spit back a hard link reset, both in the journal and to the device. I don't think mdadm will report that. -- Chris Murphy ^ permalink raw reply [flat|nested] 14+ messages in thread
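The mitigation described on the Timeout_Mismatch wiki page can be scripted
roughly as below (a sketch run at boot, in the spirit of the wiki's example;
it relies on smartctl exiting non-zero when the drive refuses the ERC
command):

  for d in sdc sdd; do
      if smartctl -l scterc,70,70 /dev/$d > /dev/null; then
          echo "/dev/$d: ERC set to 7.0s read/write"
      else
          # no ERC support (e.g. desktop Barracudas): give the kernel enough
          # time to outlast the drive's own internal retries instead
          echo 180 > /sys/block/$d/device/timeout
          echo "/dev/$d: no ERC, SCSI command timer raised to 180s"
      fi
  done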
* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O
  2020-03-02  6:57 ` David C. Rankin
  2020-03-02  7:08 ` Chris Murphy
@ 2020-03-04 22:53 ` David C. Rankin
  2020-03-05 17:18 ` Wols Lists
  1 sibling, 1 reply; 14+ messages in thread
From: David C. Rankin @ 2020-03-04 22:53 UTC (permalink / raw)
  To: mdraid

On 03/02/2020 12:57 AM, David C. Rankin wrote:
> On 03/02/2020 12:51 AM, Roman Mamedov wrote:
>> Yes, replace the drive ASAP, and see if that solves it.
>
> Will do, thank you!

Drive replaced and rebuilding:

md4 : active raid1 sdc[3] sdd[2]
      2930135488 blocks super 1.2 [2/1] [_U]
      [>....................]  recovery =  1.5% (46390912/2930135488) finish=276.0min speed=174102K/sec
      bitmap: 1/22 pages [4KB], 65536KB chunk

Things are looking good, speed=174102K/sec, which is a far sight better than
speed=2022K/sec. This will give a 4.5 hour rebuild (instead of a 26 day
scrub). I suspect the virtualbox problems will disappear as well once the
rebuild is done.

Thank you to everyone for helping get me pointed in the right direction. I'll
let you know if I have any further issues here, but I don't anticipate any
(fingers-crossed...)

-- 
David C. Rankin, J.D.,P.E.

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Linux 5.5 Breaks Raid1 on Device instead of Partition, Unusable I/O 2020-03-04 22:53 ` David C. Rankin @ 2020-03-05 17:18 ` Wols Lists 0 siblings, 0 replies; 14+ messages in thread From: Wols Lists @ 2020-03-05 17:18 UTC (permalink / raw) To: David C. Rankin, mdraid On 04/03/20 22:53, David C. Rankin wrote: > On 03/02/2020 12:57 AM, David C. Rankin wrote: >> On 03/02/2020 12:51 AM, Roman Mamedov wrote: >>> Yes, replace the drive ASAP, and see if that solves it. >> >> Will do, thank you! >> > > Drive replaced and rebuilding: > > md4 : active raid1 sdc[3] sdd[2] > 2930135488 blocks super 1.2 [2/1] [_U] > [>....................] recovery = 1.5% (46390912/2930135488) > finish=276.0min speed=174102K/sec > bitmap: 1/22 pages [4KB], 65536KB chunk > > Things are looking good, speed=174102K/sec, which is a far-sight better than > speed=2022K/sec. This will give a 4.5 hour rebuild (instead of a 26 day > scrub). I suspect the virtualbox problems will disappear as well once the > rebuild is done. > > Thank you to everyone for helping get me pointed in the right direction. I'll > let you know if I have any further issues here, but I don't anticipate any > (fingers-crossed...) > Raid 1 - look at dm-integrity. That should make scrubbing (hopefully) redundant :-) I might at last soon get my new system up and running (got a shop to look at it - dud motherboard :-( Of course it's now out of warranty and my supplier has gone bust, but if the shop say it was dud from the start I might be able to claim something ... But that means I'll have a test system - I've acquired about 6 x 1TB drives - so I shall be playing with some slightly more heavy-duty raid configs :-) Cheers, Wol ^ permalink raw reply [flat|nested] 14+ messages in thread
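On the dm-integrity suggestion: the layering is each member wrapped in
dm-integrity with md RAID1 built on top, so a corrupted sector fails its
checksum and md repairs it from the other mirror instead of it surfacing only
as a mismatch_cnt during a scrub. A rough sketch (integritysetup ships with
cryptsetup; the format step destroys existing data, so this only applies when
building an array from scratch):

  integritysetup format /dev/sdc
  integritysetup open   /dev/sdc int-sdc
  integritysetup format /dev/sdd
  integritysetup open   /dev/sdd int-sdd
  mdadm --create /dev/md4 --level=1 --raid-devices=2 \
        /dev/mapper/int-sdc /dev/mapper/int-sdd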