* Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
@ 2023-04-23 19:09 Jove
From: Jove @ 2023-04-23 19:09 UTC (permalink / raw)
To: linux-raid
Hi,
I've added two drives to my raid5 array and tried to migrate
it to raid6 with the following command:
mdadm --grow /dev/md0 --raid-devices 4 --level 6 --backup-file=/root/mdadm_raid6_backup.md
This may have been my first mistake, as there are only 5
drives; it should have been --raid-devices 3, I think.
As soon as I started this grow, the filesystems became
unavailable. All processes trying to access files on them hung.
I searched the web, which suggested that a reboot during
a rebuild is not problematic if things shut down cleanly, so I
rebooted. The reboot hung too. The drive activity
continued, so I let it run overnight. I woke up to a
rebooted system in emergency mode, as it could not
mount all the partitions on the raid array.
The OS tried to reassemble the array and succeeded.
However, the udev processes that try to create the /dev
entries hang.
I went back to Google and found out how I could reboot
my system without this automatic assembly.
I tried reassembling the array with:
mdadm --verbose --assemble --backup-file mdadm_raid6_backup.md0 /dev/md0
This failed with:
No backup metadata on mdadm_raid6_backup.md0
Failed to find final backup of critical section.
Failed to restore critical section for reshape, sorry.
I tried again with:
mdadm --verbose --assemble --backup-file mdadm_raid6_backup.md0 --invalid-backup /dev/md0
This said, in addition to the lines above:
Continuing without restoring backup
This seemed to succeed in reassembling the
array, but it also hangs indefinitely.
/proc/mdstat now shows:
md0 : active (read-only) raid6 sdc1[0] sde[4](S) sdf[5] sdd1[3] sdg1[1]
7813771264 blocks super 1.2 level 6, 512k chunk, algorithm 18 [4/3] [UUU_]
bitmap: 1/30 pages [4KB], 65536KB chunk
Again, the udev processes trying to access this device hung indefinitely.
Eventually, the kernel dumps this in my journal:
Apr 23 19:17:22 atom kernel: task:systemd-udevd state:D stack: 0 pid: 8121 ppid: 706 flags:0x00000006
Apr 23 19:17:22 atom kernel: Call Trace:
Apr 23 19:17:22 atom kernel: <TASK>
Apr 23 19:17:22 atom kernel: __schedule+0x20a/0x550
Apr 23 19:17:22 atom kernel: schedule+0x5a/0xc0
Apr 23 19:17:22 atom kernel: schedule_timeout+0x11f/0x160
Apr 23 19:17:22 atom kernel: ? make_stripe_request+0x284/0x490 [raid456]
Apr 23 19:17:22 atom kernel: wait_woken+0x50/0x70
Apr 23 19:17:22 atom kernel: raid5_make_request+0x2cb/0x3e0 [raid456]
Apr 23 19:17:22 atom kernel: ? sched_show_numa+0xf0/0xf0
Apr 23 19:17:22 atom kernel: md_handle_request+0x132/0x1e0
Apr 23 19:17:22 atom kernel: ? do_mpage_readpage+0x282/0x6b0
Apr 23 19:17:22 atom kernel: __submit_bio+0x86/0x130
Apr 23 19:17:22 atom kernel: __submit_bio_noacct+0x81/0x1f0
Apr 23 19:17:22 atom kernel: mpage_readahead+0x15c/0x1d0
Apr 23 19:17:22 atom kernel: ? blkdev_write_begin+0x20/0x20
Apr 23 19:17:22 atom kernel: read_pages+0x58/0x2f0
Apr 23 19:17:22 atom kernel: page_cache_ra_unbounded+0x137/0x180
Apr 23 19:17:22 atom kernel: force_page_cache_ra+0xc5/0xf0
Apr 23 19:17:22 atom kernel: filemap_get_pages+0xe4/0x350
Apr 23 19:17:22 atom kernel: filemap_read+0xbe/0x3c0
Apr 23 19:17:22 atom kernel: ? make_kgid+0x13/0x20
Apr 23 19:17:22 atom kernel: ? deactivate_locked_super+0x90/0xa0
Apr 23 19:17:22 atom kernel: blkdev_read_iter+0xaf/0x170
Apr 23 19:17:22 atom kernel: new_sync_read+0xf9/0x180
Apr 23 19:17:22 atom kernel: vfs_read+0x13c/0x190
Apr 23 19:17:22 atom kernel: ksys_read+0x5f/0xe0
Apr 23 19:17:22 atom kernel: do_syscall_64+0x59/0x90
Apr 23 19:17:22 atom kernel: ? do_user_addr_fault+0x1dd/0x6b0
Apr 23 19:17:22 atom kernel: ? do_syscall_64+0x69/0x90
Apr 23 19:17:22 atom kernel: ? exc_page_fault+0x62/0x150
Apr 23 19:17:22 atom kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
Apr 23 19:17:22 atom kernel: RIP: 0033:0x7fb20653eaf2
Apr 23 19:17:22 atom kernel: RSP: 002b:00007ffe1e3e8d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Apr 23 19:17:22 atom kernel: RAX: ffffffffffffffda RBX: 0000555888b0e0b8 RCX: 00007fb20653eaf2
Apr 23 19:17:22 atom kernel: RDX: 0000000000000040 RSI: 0000555888b0e0c8 RDI: 000000000000000d
Apr 23 19:17:22 atom kernel: RBP: 0000555888ad64e0 R08: 0000000000000000 R09: 0000000000000000
Apr 23 19:17:22 atom kernel: R10: 0000000000000010 R11: 0000000000000246 R12: 00000746f2bf0000
Apr 23 19:17:22 atom kernel: R13: 0000000000000040 R14: 0000555888b0e0a0 R15: 0000555888ad6530
Apr 23 19:17:22 atom kernel: </TASK>
Any help to recover the data on my array would be much appreciated.
Additional system and drive information below.
Thank you for your attention,
Johan
This is the kernel stack of the hung mdadm command:
# cat /proc/8110/stack
[<0>] mddev_suspend+0x14f/0x180
[<0>] suspend_lo_store+0x60/0xb0
[<0>] md_attr_store+0x80/0xf0
[<0>] kernfs_fop_write_iter+0x121/0x1b0
[<0>] new_sync_write+0xfc/0x190
[<0>] vfs_write+0x1ef/0x280
[<0>] ksys_write+0x5f/0xe0
[<0>] do_syscall_64+0x59/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
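Since the stack above is blocked in suspend_lo_store, I suspect the md sysfs state is relevant. These are the entries I plan to check next; the paths are my assumption from the kernel's md documentation (Documentation/admin-guide/md.rst), so please correct me if these are not the right ones:
# cat /sys/block/md0/md/sync_action
# cat /sys/block/md0/md/reshape_position
# cat /sys/block/md0/md/suspend_lo /sys/block/md0/md/suspend_hi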
# cat /etc/centos-release
CentOS Stream release 9
# uname -a
Linux atom 5.14.0-299.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 13 10:08:03 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
# mdadm --version
mdadm - v4.2 - 2021-12-30 - 8
# mdadm -D /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Sat Oct 21 01:57:20 2017
Raid Level : raid6
Array Size : 7813771264 (7.28 TiB 8.00 TB)
Used Dev Size : 3906885632 (3.64 TiB 4.00 TB)
Raid Devices : 4
Total Devices : 5
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sun Apr 23 10:32:01 2023
State : clean, degraded
Active Devices : 3
Working Devices : 5
Failed Devices : 0
Spare Devices : 2
Layout : left-symmetric-6
Chunk Size : 512K
Consistency Policy : bitmap
New Layout : left-symmetric
Name : atom:0 (local to host atom)
UUID : 8c56384e:ba1a3cec:aaf34c17:d0cd9318
Events : 669453
Number Major Minor RaidDevice State
0 8 33 0 active sync /dev/sdc1
1 8 97 1 active sync /dev/sdg1
3 8 49 2 active sync /dev/sdd1
5 8 80 3 spare rebuilding /dev/sdf
4 8 64 - spare /dev/sde
# mdadm --examine /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 8c56384e:ba1a3cec:aaf34c17:d0cd9318
Name : atom:0 (local to host atom)
Creation Time : Sat Oct 21 01:57:20 2017
Raid Level : raid6
Raid Devices : 4
Avail Dev Size : 7813771264 sectors (3.64 TiB 4.00 TB)
Array Size : 7813771264 KiB (7.28 TiB 8.00 TB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : e6cbce38:ce3a1997:254cd445:65a67d5d
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 3473357824 (3.23 TiB 3.56 TB)
New Layout : left-symmetric
Update Time : Sun Apr 23 10:32:01 2023
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : f3ffb20c - correct
Events : 669453
Layout : left-symmetric-6
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
# mdadm --examine /dev/sdg1
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 8c56384e:ba1a3cec:aaf34c17:d0cd9318
Name : atom:0 (local to host atom)
Creation Time : Sat Oct 21 01:57:20 2017
Raid Level : raid6
Raid Devices : 4
Avail Dev Size : 7813771264 sectors (3.64 TiB 4.00 TB)
Array Size : 7813771264 KiB (7.28 TiB 8.00 TB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : 9c130a77:d12da8fa:ca8a2e59:4778168e
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 3473357824 (3.23 TiB 3.56 TB)
New Layout : left-symmetric
Update Time : Sun Apr 23 10:32:01 2023
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : d9bcfd4e - correct
Events : 669453
Layout : left-symmetric-6
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
# mdadm --examine /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 8c56384e:ba1a3cec:aaf34c17:d0cd9318
Name : atom:0 (local to host atom)
Creation Time : Sat Oct 21 01:57:20 2017
Raid Level : raid6
Raid Devices : 4
Avail Dev Size : 7813771264 sectors (3.64 TiB 4.00 TB)
Array Size : 7813771264 KiB (7.28 TiB 8.00 TB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : c298e079:1f616f66:3e4c5df6:cb942253
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 3473357824 (3.23 TiB 3.56 TB)
New Layout : left-symmetric
Update Time : Sun Apr 23 10:32:01 2023
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : aa9593c4 - correct
Events : 669453
Layout : left-symmetric-6
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
# mdadm --examine /dev/sdf
/dev/sdf:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x7
Array UUID : 8c56384e:ba1a3cec:aaf34c17:d0cd9318
Name : atom:0 (local to host atom)
Creation Time : Sat Oct 21 01:57:20 2017
Raid Level : raid6
Raid Devices : 4
Avail Dev Size : 7813775024 sectors (3.64 TiB 4.00 TB)
Array Size : 7813771264 KiB (7.28 TiB 8.00 TB)
Used Dev Size : 7813771264 sectors (3.64 TiB 4.00 TB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Recovery Offset : 3473357824 sectors
Unused Space : before=262064 sectors, after=3760 sectors
State : clean
Device UUID : 277110b0:d174c17a:3bac9963:405bf18e
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 3473357824 (3.23 TiB 3.56 TB)
New Layout : left-symmetric
Update Time : Sun Apr 23 10:32:01 2023
Bad Block Log : 512 entries available at offset 24 sectors
Checksum : 6d29f0ca - correct
Events : 669453
Layout : left-symmetric-6
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
# mdadm --examine /dev/sde
/dev/sde:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x5
Array UUID : 8c56384e:ba1a3cec:aaf34c17:d0cd9318
Name : atom:0 (local to host atom)
Creation Time : Sat Oct 21 01:57:20 2017
Raid Level : raid6
Raid Devices : 4
Avail Dev Size : 7813775024 sectors (3.64 TiB 4.00 TB)
Array Size : 7813771264 KiB (7.28 TiB 8.00 TB)
Used Dev Size : 7813771264 sectors (3.64 TiB 4.00 TB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262064 sectors, after=3760 sectors
State : clean
Device UUID : 000ceb71:ab7291e6:5721b832:5003c849
Internal Bitmap : 8 sectors from superblock
Reshape pos'n : 3473357824 (3.23 TiB 3.56 TB)
New Layout : left-symmetric
Update Time : Sun Apr 23 10:32:01 2023
Bad Block Log : 512 entries available at offset 24 sectors
Checksum : 55c26aa5 - correct
Events : 669453
Layout : left-symmetric-6
Chunk Size : 512K
Device Role : spare
Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
# smartctl --xall /dev/sdc1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.0-299.el9.x86_64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD40EFRX-68N32N0
Serial Number: WD-WCC7K6DNPVFP
LU WWN Device Id: 5 0014ee 20f133383
Firmware Version: 82.00A82
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Apr 23 20:48:22 2023 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test
routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (43740) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 464) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x303d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 0
3 Spin_Up_Time POS--K 192 162 021 - 5375
4 Start_Stop_Count -O--CK 100 100 000 - 83
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 200 200 000 - 0
9 Power_On_Hours -O--CK 041 041 000 - 43289
10 Spin_Retry_Count -O--CK 100 253 000 - 0
11 Calibration_Retry_Count -O--CK 100 253 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 83
192 Power-Off_Retract_Count -O--CK 200 200 000 - 65
193 Load_Cycle_Count -O--CK 200 200 000 - 193
194 Temperature_Celsius -O---K 115 096 000 - 35
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 0
198 Offline_Uncorrectable ----CK 100 253 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 34
200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x04 GPL,SL R/O 8 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb6 GPL,SL VS 1 Device vendor specific log
0xb7 GPL,SL VS 56 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 43107 -
# 2 Short offline Completed without error 00% 42939 -
# 3 Short offline Completed without error 00% 42771 -
# 4 Extended offline Completed without error 00% 42760 -
# 5 Short offline Completed without error 00% 42437 -
# 6 Short offline Completed without error 00% 42269 -
# 7 Short offline Completed without error 00% 42101 -
# 8 Extended offline Completed without error 00% 42017 -
# 9 Short offline Completed without error 00% 41933 -
#10 Short offline Completed without error 00% 41765 -
#11 Short offline Completed without error 00% 41598 -
#12 Short offline Completed without error 00% 41430 -
#13 Extended offline Completed without error 00% 41346 -
#14 Short offline Completed without error 00% 41262 -
#15 Short offline Completed without error 00% 41094 -
#16 Short offline Completed without error 00% 40926 -
#17 Short offline Completed without error 00% 40759 -
#18 Extended offline Completed without error 00% 40602 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
Device State: Active (0)
Current Temperature: 35 Celsius
Power Cycle Min/Max Temperature: 35/38 Celsius
Lifetime Min/Max Temperature: 20/54 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/65 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (233)
Index Estimated Time Temperature Celsius
234 2023-04-23 12:51 36 *****************
... ..( 20 skipped). .. *****************
255 2023-04-23 13:12 36 *****************
256 2023-04-23 13:13 35 ****************
... ..( 20 skipped). .. ****************
277 2023-04-23 13:34 35 ****************
278 2023-04-23 13:35 36 *****************
... ..( 22 skipped). .. *****************
301 2023-04-23 13:58 36 *****************
302 2023-04-23 13:59 37 ******************
... ..( 10 skipped). .. ******************
313 2023-04-23 14:10 37 ******************
314 2023-04-23 14:11 36 *****************
315 2023-04-23 14:12 37 ******************
... ..( 44 skipped). .. ******************
360 2023-04-23 14:57 37 ******************
361 2023-04-23 14:58 38 *******************
... ..( 11 skipped). .. *******************
373 2023-04-23 15:10 38 *******************
374 2023-04-23 15:11 37 ******************
... ..( 54 skipped). .. ******************
429 2023-04-23 16:06 37 ******************
430 2023-04-23 16:07 36 *****************
... ..( 38 skipped). .. *****************
469 2023-04-23 16:46 36 *****************
470 2023-04-23 16:47 35 ****************
... ..( 50 skipped). .. ****************
43 2023-04-23 17:38 35 ****************
44 2023-04-23 17:39 38 *******************
... ..( 71 skipped). .. *******************
116 2023-04-23 18:51 38 *******************
117 2023-04-23 18:52 39 ********************
... ..( 48 skipped). .. ********************
166 2023-04-23 19:41 39 ********************
167 2023-04-23 19:42 38 *******************
... ..( 4 skipped). .. *******************
172 2023-04-23 19:47 38 *******************
173 2023-04-23 19:48 37 ******************
... ..( 5 skipped). .. ******************
179 2023-04-23 19:54 37 ******************
180 2023-04-23 19:55 36 *****************
... ..( 5 skipped). .. *****************
186 2023-04-23 20:01 36 *****************
187 2023-04-23 20:02 ? -
188 2023-04-23 20:03 33 **************
... ..( 4 skipped). .. **************
193 2023-04-23 20:08 33 **************
194 2023-04-23 20:09 ? -
195 2023-04-23 20:10 34 ***************
... ..( 8 skipped). .. ***************
204 2023-04-23 20:19 34 ***************
205 2023-04-23 20:20 35 ****************
... ..( 15 skipped). .. ****************
221 2023-04-23 20:36 35 ****************
222 2023-04-23 20:37 ? -
223 2023-04-23 20:38 35 ****************
224 2023-04-23 20:39 ? -
225 2023-04-23 20:40 35 ****************
226 2023-04-23 20:41 35 ****************
227 2023-04-23 20:42 ? -
228 2023-04-23 20:43 36 *****************
... ..( 4 skipped). .. *****************
233 2023-04-23 20:48 36 *****************
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 1) ==
0x01 0x008 4 83 --- Lifetime Power-On Resets
0x01 0x010 4 43289 --- Power-on Hours
0x01 0x018 6 15614407059 --- Logical Sectors Written
0x01 0x020 6 599311580 --- Number of Write Commands
0x01 0x028 6 908162628478 --- Logical Sectors Read
0x01 0x030 6 2826279430 --- Number of Read Commands
0x01 0x038 6 1221577344 --- Date and Time TimeStamp
0x03 ===== = = === == Rotating Media Statistics (rev 1) ==
0x03 0x008 4 47885 --- Spindle Motor Power-on Hours
0x03 0x010 4 47296 --- Head Flying Hours
0x03 0x018 4 259 --- Head Load Events
0x03 0x020 4 0 --- Number of Reallocated Logical Sectors
0x03 0x028 4 0 --- Read Recovery Attempts
0x03 0x030 4 0 --- Number of Mechanical Start Failures
0x03 0x038 4 0 --- Number of Realloc. Candidate Logical Sectors
0x03 0x040 4 65 --- Number of High Priority Unload Events
0x04 ===== = = === == General Errors Statistics (rev 1) ==
0x04 0x008 4 0 --- Number of Reported Uncorrectable Errors
0x04 0x010 4 1 --- Resets Between Cmd Acceptance and Completion
0x05 ===== = = === == Temperature Statistics (rev 1) ==
0x05 0x008 1 36 --- Current Temperature
0x05 0x010 1 37 --- Average Short Term Temperature
0x05 0x018 1 34 --- Average Long Term Temperature
0x05 0x020 1 54 --- Highest Temperature
0x05 0x028 1 23 --- Lowest Temperature
0x05 0x030 1 52 --- Highest Average Short Term Temperature
0x05 0x038 1 27 --- Lowest Average Short Term Temperature
0x05 0x040 1 44 --- Highest Average Long Term Temperature
0x05 0x048 1 31 --- Lowest Average Long Term Temperature
0x05 0x050 4 0 --- Time in Over-Temperature
0x05 0x058 1 65 --- Specified Maximum Operating Temperature
0x05 0x060 4 0 --- Time in Under-Temperature
0x05 0x068 1 0 --- Specified Minimum Operating Temperature
0x06 ===== = = === == Transport Statistics (rev 1) ==
0x06 0x008 4 809 --- Number of Hardware Resets
0x06 0x010 4 374 --- Number of ASR Events
0x06 0x018 4 0 --- Number of Interface CRC Errors
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value
Pending Defects log (GP Log 0x0c) not supported
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 88 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 82 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000d 2 0 Non-CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 18432 Vendor specific
# smartctl --xall /dev/sdg
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.0-299.el9.x86_64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD40EFRX-68N32N0
Serial Number: WD-WCC7K3EXJ3S7
LU WWN Device Id: 5 0014ee 264687983
Firmware Version: 82.00A82
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sun Apr 23 20:48:54 2023 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test
routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (43440) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 462) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x303d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 0
3 Spin_Up_Time POS--K 185 156 021 - 5716
4 Start_Stop_Count -O--CK 100 100 000 - 83
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 100 253 000 - 0
9 Power_On_Hours -O--CK 043 043 000 - 42100
10 Spin_Retry_Count -O--CK 100 253 000 - 0
11 Calibration_Retry_Count -O--CK 100 253 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 83
192 Power-Off_Retract_Count -O--CK 200 200 000 - 65
193 Load_Cycle_Count -O--CK 200 200 000 - 199
194 Temperature_Celsius -O---K 117 102 000 - 33
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 0
198 Offline_Uncorrectable ----CK 100 253 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x04 GPL,SL R/O 8 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb6 GPL,SL VS 1 Device vendor specific log
0xb7 GPL,SL VS 56 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 41918 -
# 2 Short offline Completed without error 00% 41750 -
# 3 Short offline Completed without error 00% 41582 -
# 4 Extended offline Completed without error 00% 41570 -
# 5 Short offline Completed without error 00% 41248 -
# 6 Short offline Completed without error 00% 41080 -
# 7 Short offline Completed without error 00% 40912 -
# 8 Extended offline Completed without error 00% 40828 -
# 9 Short offline Completed without error 00% 40744 -
#10 Short offline Completed without error 00% 40576 -
#11 Short offline Completed without error 00% 40408 -
#12 Short offline Completed without error 00% 40241 -
#13 Extended offline Completed without error 00% 40157 -
#14 Short offline Completed without error 00% 40073 -
#15 Short offline Completed without error 00% 41098 -
#16 Short offline Completed without error 00% 40930 -
#17 Short offline Completed without error 00% 40762 -
#18 Extended offline Completed without error 00% 40606 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
Device State: Active (0)
Current Temperature: 33 Celsius
Power Cycle Min/Max Temperature: 33/36 Celsius
Lifetime Min/Max Temperature: 19/48 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/65 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (453)
Index Estimated Time Temperature Celsius
454 2023-04-23 12:51 35 ****************
... ..( 11 skipped). .. ****************
466 2023-04-23 13:03 35 ****************
467 2023-04-23 13:04 34 ***************
... ..( 32 skipped). .. ***************
22 2023-04-23 13:37 34 ***************
23 2023-04-23 13:38 35 ****************
... ..( 60 skipped). .. ****************
84 2023-04-23 14:39 35 ****************
85 2023-04-23 14:40 36 *****************
... ..( 48 skipped). .. *****************
134 2023-04-23 15:29 36 *****************
135 2023-04-23 15:30 35 ****************
... ..( 44 skipped). .. ****************
180 2023-04-23 16:15 35 ****************
181 2023-04-23 16:16 34 ***************
... ..( 69 skipped). .. ***************
251 2023-04-23 17:26 34 ***************
252 2023-04-23 17:27 33 **************
... ..( 10 skipped). .. **************
263 2023-04-23 17:38 33 **************
264 2023-04-23 17:39 36 *****************
... ..(140 skipped). .. *****************
405 2023-04-23 20:00 36 *****************
406 2023-04-23 20:01 ? -
407 2023-04-23 20:02 31 ************
... ..( 4 skipped). .. ************
412 2023-04-23 20:07 31 ************
413 2023-04-23 20:08 ? -
414 2023-04-23 20:09 32 *************
... ..( 3 skipped). .. *************
418 2023-04-23 20:13 32 *************
419 2023-04-23 20:14 33 **************
... ..( 8 skipped). .. **************
428 2023-04-23 20:23 33 **************
429 2023-04-23 20:24 34 ***************
... ..( 10 skipped). .. ***************
440 2023-04-23 20:35 34 ***************
441 2023-04-23 20:36 ? -
442 2023-04-23 20:37 34 ***************
443 2023-04-23 20:38 ? -
444 2023-04-23 20:39 34 ***************
445 2023-04-23 20:40 34 ***************
446 2023-04-23 20:41 ? -
447 2023-04-23 20:42 35 ****************
... ..( 5 skipped). .. ****************
453 2023-04-23 20:48 35 ****************
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 1) ==
0x01 0x008 4 83 --- Lifetime Power-On Resets
0x01 0x010 4 42100 --- Power-on Hours
0x01 0x018 6 15407958681 --- Logical Sectors Written
0x01 0x020 6 595027021 --- Number of Write Commands
0x01 0x028 6 908203824645 --- Logical Sectors Read
0x01 0x030 6 2834811358 --- Number of Read Commands
0x01 0x038 6 1236144640 --- Date and Time TimeStamp
0x03 ===== = = === == Rotating Media Statistics (rev 1) ==
0x03 0x008 4 47888 --- Spindle Motor Power-on Hours
0x03 0x010 4 47298 --- Head Flying Hours
0x03 0x018 4 265 --- Head Load Events
0x03 0x020 4 0 --- Number of Reallocated Logical Sectors
0x03 0x028 4 11 --- Read Recovery Attempts
0x03 0x030 4 0 --- Number of Mechanical Start Failures
0x03 0x038 4 0 --- Number of Realloc. Candidate Logical Sectors
0x03 0x040 4 65 --- Number of High Priority Unload Events
0x04 ===== = = === == General Errors Statistics (rev 1) ==
0x04 0x008 4 0 --- Number of Reported Uncorrectable Errors
0x04 0x010 4 0 --- Resets Between Cmd Acceptance and Completion
0x05 ===== = = === == Temperature Statistics (rev 1) ==
0x05 0x008 1 34 --- Current Temperature
0x05 0x010 1 35 --- Average Short Term Temperature
0x05 0x018 1 30 --- Average Long Term Temperature
0x05 0x020 1 48 --- Highest Temperature
0x05 0x028 1 22 --- Lowest Temperature
0x05 0x030 1 46 --- Highest Average Short Term Temperature
0x05 0x038 1 24 --- Lowest Average Short Term Temperature
0x05 0x040 1 39 --- Highest Average Long Term Temperature
0x05 0x048 1 26 --- Lowest Average Long Term Temperature
0x05 0x050 4 0 --- Time in Over-Temperature
0x05 0x058 1 65 --- Specified Maximum Operating Temperature
0x05 0x060 4 0 --- Time in Under-Temperature
0x05 0x068 1 0 --- Specified Minimum Operating Temperature
0x06 ===== = = === == Transport Statistics (rev 1) ==
0x06 0x008 4 793 --- Number of Hardware Resets
0x06 0x010 4 332 --- Number of ASR Events
0x06 0x018 4 0 --- Number of Interface CRC Errors
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value
Pending Defects log (GP Log 0x0c) not supported
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 88 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 82 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000d 2 0 Non-CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 18464 Vendor specific
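Side note on the SCT Error Recovery Control readout above: 70 deciseconds (7.0 s) for both reads and writes. For an md array what matters is that this drive-side limit stays below the kernel's per-device SCSI command timeout, so a failing read surfaces as a clean UNC error md can repair from parity, rather than a timed-out, reset link. A minimal sanity-check sketch (the 30 s kernel default and the `/dev/sdX` placeholder are assumptions, not values from this report):

```shell
# SCT ERC is reported in tenths of a second: 70 -> 7.0 s.
ERC_DECISECONDS=70
# Typical kernel default; the live value is in /sys/block/<dev>/device/timeout.
KERNEL_TIMEOUT_SECONDS=30
# The drive should give up first, so md sees a read error instead of a reset.
if [ $((ERC_DECISECONDS / 10)) -lt "$KERNEL_TIMEOUT_SECONDS" ]; then
    echo "ERC ok: drive reports UNC before the kernel timeout"
fi
# To re-apply ERC after a power cycle (values are deciseconds):
#   smartctl -l scterc,70,70 /dev/sdX
```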
# smartctl --xall /dev/sdd
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.0-299.el9.x86_64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD40EFRX-68WT0N0
Serial Number: WD-WCC4E0645620
LU WWN Device Id: 5 0014ee 2b438eb78
Firmware Version: 80.00A80
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Apr 23 20:49:30 2023 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test
routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (55560) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 555) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 12
3 Spin_Up_Time POS--K 190 177 021 - 7458
4 Start_Stop_Count -O--CK 097 097 000 - 3213
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 100 253 000 - 0
9 Power_On_Hours -O--CK 014 014 000 - 62899
10 Spin_Retry_Count -O--CK 100 100 000 - 0
11 Calibration_Retry_Count -O--CK 100 100 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 111
192 Power-Off_Retract_Count -O--CK 200 200 000 - 78
193 Load_Cycle_Count -O--CK 198 198 000 - 6721
194 Temperature_Celsius -O---K 117 097 000 - 35
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 0
198 Offline_Uncorrectable ----CK 100 253 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb7 GPL,SL VS 1 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 14
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 14 [13] occurred at disk power-on lifetime: 21182 hours (882 days + 14 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 b7 3e 3a b8 40 00 Error: UNC at LBA = 0x1b73e3ab8 = 7369276088
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 08 00 d8 00 01 b7 3e 3a f0 40 08 8d+09:09:02.085 READ FPDMA QUEUED
60 00 08 00 d0 00 01 b7 3e 3b 10 40 08 8d+09:09:02.085 READ FPDMA QUEUED
60 00 08 00 c8 00 01 b7 3e 3a e8 40 08 8d+09:09:02.085 READ FPDMA QUEUED
60 00 08 00 c0 00 01 b7 3e 3a e0 40 08 8d+09:09:02.085 READ FPDMA QUEUED
60 00 08 00 b8 00 01 b7 3e 3a d8 40 08 8d+09:09:02.084 READ FPDMA QUEUED
Error 13 [12] occurred at disk power-on lifetime: 21182 hours (882 days + 14 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 b7 3e 3a b8 40 00 Error: UNC at LBA = 0x1b73e3ab8 = 7369276088
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 08 00 d8 00 01 b7 3e 3b a8 40 08 8d+09:08:58.566 READ FPDMA QUEUED
60 00 08 00 d0 00 01 b7 3e 3b a0 40 08 8d+09:08:58.566 READ FPDMA QUEUED
60 00 08 00 c8 00 01 b7 3e 3b 98 40 08 8d+09:08:58.566 READ FPDMA QUEUED
60 00 08 00 c0 00 01 b7 3e 3b 90 40 08 8d+09:08:58.566 READ FPDMA QUEUED
60 00 08 00 b8 00 01 b7 3e 3b 88 40 08 8d+09:08:58.566 READ FPDMA QUEUED
Error 12 [11] occurred at disk power-on lifetime: 21182 hours (882 days + 14 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 b7 3e 3a b8 40 00 Error: UNC at LBA = 0x1b73e3ab8 = 7369276088
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 08 00 d8 00 01 b7 3e 3a f0 40 08 8d+09:08:55.047 READ FPDMA QUEUED
60 00 08 00 d0 00 01 b7 3e 3b 10 40 08 8d+09:08:55.047 READ FPDMA QUEUED
60 00 08 00 c8 00 01 b7 3e 3a e8 40 08 8d+09:08:55.047 READ FPDMA QUEUED
60 00 08 00 c0 00 01 b7 3e 3a e0 40 08 8d+09:08:55.047 READ FPDMA QUEUED
60 00 08 00 b8 00 01 b7 3e 3a d8 40 08 8d+09:08:55.047 READ FPDMA QUEUED
Error 11 [10] occurred at disk power-on lifetime: 21182 hours (882 days + 14 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 b7 3e 3a b8 40 00 Error: UNC at LBA = 0x1b73e3ab8 = 7369276088
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 08 00 d8 00 01 b7 3e 3b a8 40 08 8d+09:08:51.528 READ FPDMA QUEUED
60 00 08 00 d0 00 01 b7 3e 3b a0 40 08 8d+09:08:51.528 READ FPDMA QUEUED
60 00 08 00 c8 00 01 b7 3e 3b 98 40 08 8d+09:08:51.528 READ FPDMA QUEUED
60 00 08 00 c0 00 01 b7 3e 3b 90 40 08 8d+09:08:51.528 READ FPDMA QUEUED
60 00 08 00 b8 00 01 b7 3e 3b 88 40 08 8d+09:08:51.528 READ FPDMA QUEUED
Error 10 [9] occurred at disk power-on lifetime: 21182 hours (882 days + 14 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 b7 3e 3a b8 40 00 Error: UNC at LBA = 0x1b73e3ab8 = 7369276088
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 08 00 d8 00 01 b7 3e 3a f0 40 08 8d+09:08:48.010 READ FPDMA QUEUED
60 00 08 00 d0 00 01 b7 3e 3b 10 40 08 8d+09:08:48.010 READ FPDMA QUEUED
60 00 08 00 c8 00 01 b7 3e 3a e8 40 08 8d+09:08:48.010 READ FPDMA QUEUED
60 00 08 00 c0 00 01 b7 3e 3a e0 40 08 8d+09:08:48.010 READ FPDMA QUEUED
60 00 08 00 b8 00 01 b7 3e 3a d8 40 08 8d+09:08:48.010 READ FPDMA QUEUED
Error 9 [8] occurred at disk power-on lifetime: 21182 hours (882 days + 14 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 b7 3e 3a b8 40 00 Error: UNC at LBA = 0x1b73e3ab8 = 7369276088
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 08 00 d8 00 01 b7 3e 3b a8 40 08 8d+09:08:44.517 READ FPDMA QUEUED
60 00 08 00 d0 00 01 b7 3e 3b a0 40 08 8d+09:08:44.516 READ FPDMA QUEUED
60 00 08 00 c8 00 01 b7 3e 3b 98 40 08 8d+09:08:44.509 READ FPDMA QUEUED
60 00 08 00 c0 00 01 b7 3e 3b 90 40 08 8d+09:08:44.502 READ FPDMA QUEUED
60 00 08 00 b8 00 01 b7 3e 3b 88 40 08 8d+09:08:44.495 READ FPDMA QUEUED
Error 8 [7] occurred at disk power-on lifetime: 21182 hours (882 days + 14 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 b7 3e 3a b8 40 00 Error: UNC at LBA = 0x1b73e3ab8 = 7369276088
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 04 00 00 90 00 01 b7 3e 3e 08 40 08 8d+09:08:40.925 READ FPDMA QUEUED
60 04 00 00 88 00 01 b7 3e 42 08 40 08 8d+09:08:40.925 READ FPDMA QUEUED
60 04 00 00 80 00 01 b7 3e 46 08 40 08 8d+09:08:40.925 READ FPDMA QUEUED
60 04 00 00 78 00 01 b7 3e 62 08 40 08 8d+09:08:40.925 READ FPDMA QUEUED
60 04 00 00 70 00 01 b7 3e 66 08 40 08 8d+09:08:40.925 READ FPDMA QUEUED
Error 7 [6] occurred at disk power-on lifetime: 21182 hours (882 days + 14 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 b7 3e 3a b8 40 00 Error: UNC at LBA = 0x1b73e3ab8 = 7369276088
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 04 00 00 18 00 01 b7 3e 62 08 40 08 8d+09:08:37.429 READ FPDMA QUEUED
60 04 00 00 10 00 01 b7 3e 46 08 40 08 8d+09:08:37.429 READ FPDMA QUEUED
60 04 00 00 08 00 01 b7 3e 42 08 40 08 8d+09:08:37.429 READ FPDMA QUEUED
60 04 00 00 00 00 01 b7 3e 3e 08 40 08 8d+09:08:37.429 READ FPDMA QUEUED
60 04 00 00 f0 00 01 b7 3e 3a 08 40 08 8d+09:08:37.429 READ FPDMA QUEUED
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 62717 -
# 2 Short offline Completed without error 00% 62549 -
# 3 Short offline Completed without error 00% 62382 -
# 4 Extended offline Completed without error 00% 62373 -
# 5 Short offline Completed without error 00% 62047 -
# 6 Short offline Completed without error 00% 61879 -
# 7 Short offline Completed without error 00% 61711 -
# 8 Extended offline Completed without error 00% 61631 -
# 9 Short offline Completed without error 00% 61544 -
#10 Short offline Completed without error 00% 61376 -
#11 Short offline Completed without error 00% 61208 -
#12 Short offline Completed without error 00% 61040 -
#13 Extended offline Completed without error 00% 60959 -
#14 Short offline Completed without error 00% 60872 -
#15 Short offline Completed without error 00% 61897 -
#16 Short offline Completed without error 00% 61730 -
#17 Short offline Completed without error 00% 61562 -
#18 Extended offline Completed without error 00% 61409 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
Device State: Active (0)
Current Temperature: 35 Celsius
Power Cycle Min/Max Temperature: 35/38 Celsius
Lifetime Min/Max Temperature: 16/55 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (211)
Index Estimated Time Temperature Celsius
212 2023-04-23 12:52 36 *****************
... ..( 17 skipped). .. *****************
230 2023-04-23 13:10 36 *****************
231 2023-04-23 13:11 37 ******************
... ..( 61 skipped). .. ******************
293 2023-04-23 14:13 37 ******************
294 2023-04-23 14:14 38 *******************
... ..( 22 skipped). .. *******************
317 2023-04-23 14:37 38 *******************
318 2023-04-23 14:38 37 ******************
319 2023-04-23 14:39 37 ******************
320 2023-04-23 14:40 38 *******************
... ..( 9 skipped). .. *******************
330 2023-04-23 14:50 38 *******************
331 2023-04-23 14:51 37 ******************
... ..( 52 skipped). .. ******************
384 2023-04-23 15:44 37 ******************
385 2023-04-23 15:45 36 *****************
... ..( 57 skipped). .. *****************
443 2023-04-23 16:43 36 *****************
444 2023-04-23 16:44 35 ****************
... ..( 20 skipped). .. ****************
465 2023-04-23 17:05 35 ****************
466 2023-04-23 17:06 38 *******************
... ..(109 skipped). .. *******************
98 2023-04-23 18:56 38 *******************
99 2023-04-23 18:57 39 ********************
... ..( 28 skipped). .. ********************
128 2023-04-23 19:26 39 ********************
129 2023-04-23 19:27 38 *******************
130 2023-04-23 19:28 ? -
131 2023-04-23 19:29 33 **************
... ..( 2 skipped). .. **************
134 2023-04-23 19:32 33 **************
135 2023-04-23 19:33 ? -
136 2023-04-23 19:34 34 ***************
... ..( 3 skipped). .. ***************
140 2023-04-23 19:38 34 ***************
141 2023-04-23 19:39 35 ****************
... ..( 8 skipped). .. ****************
150 2023-04-23 19:48 35 ****************
151 2023-04-23 19:49 36 *****************
... ..( 12 skipped). .. *****************
164 2023-04-23 20:02 36 *****************
165 2023-04-23 20:03 ? -
166 2023-04-23 20:04 36 *****************
167 2023-04-23 20:05 ? -
168 2023-04-23 20:06 36 *****************
169 2023-04-23 20:07 36 *****************
170 2023-04-23 20:08 ? -
171 2023-04-23 20:09 37 ******************
172 2023-04-23 20:10 36 *****************
... ..( 29 skipped). .. *****************
202 2023-04-23 20:40 36 *****************
203 2023-04-23 20:41 35 ****************
... ..( 2 skipped). .. ****************
206 2023-04-23 20:44 35 ****************
207 2023-04-23 20:45 36 *****************
... ..( 3 skipped). .. *****************
211 2023-04-23 20:49 36 *****************
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
Device Statistics (GP/SMART Log 0x04) not supported
Pending Defects log (GP Log 0x0c) not supported
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 87 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 73 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 18495 Vendor specific
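Every error logged on /dev/sdd above is the same unreadable sector, LBA 0x1b73e3ab8 = 7369276088. With the 512-byte logical sectors this drive reports, that LBA converts to a fixed byte offset, which is handy later for mapping the sector to a file once the array is healthy. A read-only sketch; the dd probe is deliberately left commented out, since nothing should touch these disks while the reshape is stuck:

```shell
# The UNC sector smartctl reports for /dev/sdd (512-byte logical sectors).
LBA=7369276088
SECTOR_SIZE=512
OFFSET=$((LBA * SECTOR_SIZE))
echo "byte offset on disk: $OFFSET"
# Read-only probe of just that sector; an I/O error here means it is
# still unreadable (no writes are issued):
#   dd if=/dev/sdd of=/dev/null bs=512 skip="$LBA" count=1
```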
# smartctl --xall /dev/sde
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.0-299.el9.x86_64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: WDC WD40EFPX-68C6CN0
Serial Number: WD-WXK2AA2HCDY2
LU WWN Device Id: 5 0014ee 26ada4de8
Firmware Version: 81.00A81
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Apr 23 20:51:17 2023 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test
routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (42000) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 437) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x3039) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 100 253 051 - 0
3 Spin_Up_Time POS--K 207 207 021 - 2625
4 Start_Stop_Count -O--CK 100 100 000 - 7
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 100 253 000 - 0
9 Power_On_Hours -O--CK 100 100 000 - 22
10 Spin_Retry_Count -O--CK 100 253 000 - 0
11 Calibration_Retry_Count -O--CK 100 253 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 6
192 Power-Off_Retract_Count -O--CK 200 200 000 - 4
193 Load_Cycle_Count -O--CK 200 200 000 - 15
194 Temperature_Celsius -O---K 120 111 000 - 27
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 0
198 Offline_Uncorrectable ----CK 100 253 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 100 253 000 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x04 GPL R/O 256 Device Statistics log
0x04 SL R/O 255 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x0c GPL R/O 2048 Pending Defects log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x24 GPL R/O 307 Current Device Internal Status Data log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb6 GPL,SL VS 1 Device vendor specific log
0xb7 GPL,SL VS 78 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
Device State: Active (0)
Current Temperature: 27 Celsius
Power Cycle Min/Max Temperature: 27/29 Celsius
Lifetime Min/Max Temperature: 20/29 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/65 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (361)
Index Estimated Time Temperature Celsius
362 2023-04-23 12:54 26 *******
... ..(146 skipped). .. *******
31 2023-04-23 15:21 26 *******
32 2023-04-23 15:22 ? -
33 2023-04-23 15:23 25 ******
34 2023-04-23 15:24 24 *****
35 2023-04-23 15:25 24 *****
36 2023-04-23 15:26 25 ******
... ..( 3 skipped). .. ******
40 2023-04-23 15:30 25 ******
41 2023-04-23 15:31 ? -
42 2023-04-23 15:32 26 *******
... ..( 4 skipped). .. *******
47 2023-04-23 15:37 26 *******
48 2023-04-23 15:38 27 ********
... ..( 5 skipped). .. ********
54 2023-04-23 15:44 27 ********
55 2023-04-23 15:45 ? -
56 2023-04-23 15:46 27 ********
57 2023-04-23 15:47 ? -
58 2023-04-23 15:48 27 ********
... ..( 4 skipped). .. ********
63 2023-04-23 15:53 27 ********
64 2023-04-23 15:54 ? -
65 2023-04-23 15:55 28 *********
66 2023-04-23 15:56 27 ********
... ..( 2 skipped). .. ********
69 2023-04-23 15:59 27 ********
70 2023-04-23 16:00 28 *********
... ..( 13 skipped). .. *********
84 2023-04-23 16:14 28 *********
85 2023-04-23 16:15 27 ********
... ..( 19 skipped). .. ********
105 2023-04-23 16:35 27 ********
106 2023-04-23 16:36 28 *********
... ..( 3 skipped). .. *********
110 2023-04-23 16:40 28 *********
111 2023-04-23 16:41 27 ********
112 2023-04-23 16:42 28 *********
113 2023-04-23 16:43 27 ********
... ..( 14 skipped). .. ********
128 2023-04-23 16:58 27 ********
129 2023-04-23 16:59 28 *********
... ..( 3 skipped). .. *********
133 2023-04-23 17:03 28 *********
134 2023-04-23 17:04 27 ********
135 2023-04-23 17:05 27 ********
136 2023-04-23 17:06 27 ********
137 2023-04-23 17:07 28 *********
... ..( 16 skipped). .. *********
154 2023-04-23 17:24 28 *********
155 2023-04-23 17:25 27 ********
... ..( 4 skipped). .. ********
160 2023-04-23 17:30 27 ********
161 2023-04-23 17:31 28 *********
... ..( 15 skipped). .. *********
177 2023-04-23 17:47 28 *********
178 2023-04-23 17:48 29 **********
179 2023-04-23 17:49 29 **********
180 2023-04-23 17:50 28 *********
... ..( 5 skipped). .. *********
186 2023-04-23 17:56 28 *********
187 2023-04-23 17:57 29 **********
188 2023-04-23 17:58 28 *********
... ..( 4 skipped). .. *********
193 2023-04-23 18:03 28 *********
194 2023-04-23 18:04 29 **********
195 2023-04-23 18:05 29 **********
196 2023-04-23 18:06 28 *********
... ..( 3 skipped). .. *********
200 2023-04-23 18:10 28 *********
201 2023-04-23 18:11 29 **********
... ..( 6 skipped). .. **********
208 2023-04-23 18:18 29 **********
209 2023-04-23 18:19 28 *********
... ..( 9 skipped). .. *********
219 2023-04-23 18:29 28 *********
220 2023-04-23 18:30 29 **********
221 2023-04-23 18:31 29 **********
222 2023-04-23 18:32 29 **********
223 2023-04-23 18:33 28 *********
... ..( 21 skipped). .. *********
245 2023-04-23 18:55 28 *********
246 2023-04-23 18:56 29 **********
247 2023-04-23 18:57 28 *********
... ..( 31 skipped). .. *********
279 2023-04-23 19:29 28 *********
280 2023-04-23 19:30 27 ********
281 2023-04-23 19:31 28 *********
282 2023-04-23 19:32 28 *********
283 2023-04-23 19:33 27 ********
284 2023-04-23 19:34 28 *********
285 2023-04-23 19:35 27 ********
... ..( 75 skipped). .. ********
361 2023-04-23 20:51 27 ********
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 3) ==
0x01 0x008 4 6 --- Lifetime Power-On Resets
0x01 0x010 4 22 --- Power-on Hours
0x01 0x018 6 5438 --- Logical Sectors Written
0x01 0x020 6 5429 --- Number of Write Commands
0x01 0x028 6 25138 --- Logical Sectors Read
0x01 0x030 6 1710 --- Number of Read Commands
0x01 0x038 6 79200000 --- Date and Time TimeStamp
0x02 ===== = = === == Free-Fall Statistics (rev 1) ==
0x02 0x010 4 0 --- Overlimit Shock Events
0x03 ===== = = === == Rotating Media Statistics (rev 1) ==
0x03 0x008 4 21 --- Spindle Motor Power-on Hours
0x03 0x010 4 18 --- Head Flying Hours
0x03 0x018 4 20 --- Head Load Events
0x03 0x020 4 0 --- Number of Reallocated Logical Sectors
0x03 0x028 4 0 --- Read Recovery Attempts
0x03 0x030 4 0 --- Number of Mechanical Start Failures
0x03 0x038 4 0 --- Number of Realloc. Candidate Logical Sectors
0x03 0x040 4 4 --- Number of High Priority Unload Events
0x04 ===== = = === == General Errors Statistics (rev 1) ==
0x04 0x008 4 0 --- Number of Reported Uncorrectable Errors
0x04 0x010 4 0 --- Resets Between Cmd Acceptance and Completion
0x05 ===== = = === == Temperature Statistics (rev 1) ==
0x05 0x008 1 27 --- Current Temperature
0x05 0x010 1 - --- Average Short Term Temperature
0x05 0x018 1 - --- Average Long Term Temperature
0x05 0x020 1 29 --- Highest Temperature
0x05 0x028 1 24 --- Lowest Temperature
0x05 0x030 1 - --- Highest Average Short Term Temperature
0x05 0x038 1 - --- Lowest Average Short Term Temperature
0x05 0x040 1 - --- Highest Average Long Term Temperature
0x05 0x048 1 - --- Lowest Average Long Term Temperature
0x05 0x050 4 0 --- Time in Over-Temperature
0x05 0x058 1 65 --- Specified Maximum Operating Temperature
0x05 0x060 4 0 --- Time in Under-Temperature
0x05 0x068 1 0 --- Specified Minimum Operating Temperature
0x06 ===== = = === == Transport Statistics (rev 1) ==
0x06 0x008 4 130 --- Number of Hardware Resets
0x06 0x010 4 63 --- Number of ASR Events
0x06 0x018 4 0 --- Number of Interface CRC Errors
0xff ===== = = === == Vendor Specific Statistics (rev 1) ==
0xff 0x008 7 0 --- Vendor Specific
0xff 0x010 7 0 --- Vendor Specific
0xff 0x018 7 0 --- Vendor Specific
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value
Pending Defects log (GP Log 0x0c)
No Defects Logged
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 88 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 89 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000d 2 0 Non-CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 18606 Vendor specific
# smartctl --xall /dev/sdf
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.0-299.el9.x86_64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: WDC WD40EFPX-68C6CN0
Serial Number: WD-WX42A92A31RX
LU WWN Device Id: 5 0014ee 215825736
Firmware Version: 81.00A81
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Apr 23 20:51:46 2023 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test
routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (39060) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 407) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x3039) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 100 253 051 - 0
3 Spin_Up_Time POS--K 206 206 021 - 2658
4 Start_Stop_Count -O--CK 100 100 000 - 7
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 200 200 000 - 0
9 Power_On_Hours -O--CK 100 100 000 - 22
10 Spin_Retry_Count -O--CK 100 253 000 - 0
11 Calibration_Retry_Count -O--CK 100 253 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 6
192 Power-Off_Retract_Count -O--CK 200 200 000 - 4
193 Load_Cycle_Count -O--CK 200 200 000 - 19
194 Temperature_Celsius -O---K 118 113 000 - 29
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 0
198 Offline_Uncorrectable ----CK 100 253 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 100 253 000 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x04 GPL R/O 256 Device Statistics log
0x04 SL R/O 255 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x0c GPL R/O 2048 Pending Defects log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x24 GPL R/O 307 Current Device Internal Status Data log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb6 GPL,SL VS 1 Device vendor specific log
0xb7 GPL,SL VS 78 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
Device State: Active (0)
Current Temperature: 29 Celsius
Power Cycle Min/Max Temperature: 29/31 Celsius
Lifetime Min/Max Temperature: 20/31 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/65 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (376)
Index Estimated Time Temperature Celsius
377 2023-04-23 12:54 29 **********
... ..(132 skipped). .. **********
32 2023-04-23 15:07 29 **********
33 2023-04-23 15:08 ? -
34 2023-04-23 15:09 26 *******
... ..( 2 skipped). .. *******
37 2023-04-23 15:12 26 *******
38 2023-04-23 15:13 27 ********
... ..( 2 skipped). .. ********
41 2023-04-23 15:16 27 ********
42 2023-04-23 15:17 ? -
43 2023-04-23 15:18 28 *********
... ..( 4 skipped). .. *********
48 2023-04-23 15:23 28 *********
49 2023-04-23 15:24 29 **********
... ..( 7 skipped). .. **********
57 2023-04-23 15:32 29 **********
58 2023-04-23 15:33 30 ***********
... ..( 9 skipped). .. ***********
68 2023-04-23 15:43 30 ***********
69 2023-04-23 15:44 29 **********
70 2023-04-23 15:45 ? -
71 2023-04-23 15:46 29 **********
72 2023-04-23 15:47 ? -
73 2023-04-23 15:48 29 **********
... ..( 2 skipped). .. **********
76 2023-04-23 15:51 29 **********
77 2023-04-23 15:52 30 ***********
78 2023-04-23 15:53 30 ***********
79 2023-04-23 15:54 ? -
80 2023-04-23 15:55 30 ***********
... ..( 5 skipped). .. ***********
86 2023-04-23 16:01 30 ***********
87 2023-04-23 16:02 31 ************
88 2023-04-23 16:03 31 ************
89 2023-04-23 16:04 30 ***********
... ..( 15 skipped). .. ***********
105 2023-04-23 16:20 30 ***********
106 2023-04-23 16:21 29 **********
107 2023-04-23 16:22 30 ***********
108 2023-04-23 16:23 29 **********
... ..( 11 skipped). .. **********
120 2023-04-23 16:35 29 **********
121 2023-04-23 16:36 30 ***********
... ..( 6 skipped). .. ***********
128 2023-04-23 16:43 30 ***********
129 2023-04-23 16:44 29 **********
... ..( 4 skipped). .. **********
134 2023-04-23 16:49 29 **********
135 2023-04-23 16:50 30 ***********
... ..( 2 skipped). .. ***********
138 2023-04-23 16:53 30 ***********
139 2023-04-23 16:54 29 **********
140 2023-04-23 16:55 29 **********
141 2023-04-23 16:56 30 ***********
... ..( 6 skipped). .. ***********
148 2023-04-23 17:03 30 ***********
149 2023-04-23 17:04 29 **********
150 2023-04-23 17:05 29 **********
151 2023-04-23 17:06 30 ***********
... ..( 14 skipped). .. ***********
166 2023-04-23 17:21 30 ***********
167 2023-04-23 17:22 29 **********
... ..( 3 skipped). .. **********
171 2023-04-23 17:26 29 **********
172 2023-04-23 17:27 30 ***********
... ..( 19 skipped). .. ***********
192 2023-04-23 17:47 30 ***********
193 2023-04-23 17:48 31 ************
194 2023-04-23 17:49 31 ************
195 2023-04-23 17:50 31 ************
196 2023-04-23 17:51 30 ***********
... ..( 4 skipped). .. ***********
201 2023-04-23 17:56 30 ***********
202 2023-04-23 17:57 31 ************
203 2023-04-23 17:58 30 ***********
... ..( 4 skipped). .. ***********
208 2023-04-23 18:03 30 ***********
209 2023-04-23 18:04 31 ************
210 2023-04-23 18:05 31 ************
211 2023-04-23 18:06 30 ***********
... ..( 4 skipped). .. ***********
216 2023-04-23 18:11 30 ***********
217 2023-04-23 18:12 31 ************
... ..( 5 skipped). .. ************
223 2023-04-23 18:18 31 ************
224 2023-04-23 18:19 30 ***********
... ..( 9 skipped). .. ***********
234 2023-04-23 18:29 30 ***********
235 2023-04-23 18:30 31 ************
... ..( 2 skipped). .. ************
238 2023-04-23 18:33 31 ************
239 2023-04-23 18:34 30 ***********
... ..( 12 skipped). .. ***********
252 2023-04-23 18:47 30 ***********
253 2023-04-23 18:48 29 **********
254 2023-04-23 18:49 29 **********
255 2023-04-23 18:50 30 ***********
... ..( 30 skipped). .. ***********
286 2023-04-23 19:21 30 ***********
287 2023-04-23 19:22 29 **********
... ..( 25 skipped). .. **********
313 2023-04-23 19:48 29 **********
314 2023-04-23 19:49 30 ***********
... ..( 2 skipped). .. ***********
317 2023-04-23 19:52 30 ***********
318 2023-04-23 19:53 29 **********
... ..( 57 skipped). .. **********
376 2023-04-23 20:51 29 **********
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 3) ==
0x01 0x008 4 6 --- Lifetime Power-On Resets
0x01 0x010 4 22 --- Power-on Hours
0x01 0x018 6 3474015532 --- Logical Sectors Written
0x01 0x020 6 29684071 --- Number of Write Commands
0x01 0x028 6 25060 --- Logical Sectors Read
0x01 0x030 6 1639 --- Number of Read Commands
0x01 0x038 6 79200000 --- Date and Time TimeStamp
0x02 ===== = = === == Free-Fall Statistics (rev 1) ==
0x02 0x010 4 0 --- Overlimit Shock Events
0x03 ===== = = === == Rotating Media Statistics (rev 1) ==
0x03 0x008 4 22 --- Spindle Motor Power-on Hours
0x03 0x010 4 18 --- Head Flying Hours
0x03 0x018 4 24 --- Head Load Events
0x03 0x020 4 0 --- Number of Reallocated Logical Sectors
0x03 0x028 4 0 --- Read Recovery Attempts
0x03 0x030 4 0 --- Number of Mechanical Start Failures
0x03 0x038 4 0 --- Number of Realloc. Candidate
Logical Sectors
0x03 0x040 4 4 --- Number of High Priority Unload Events
0x04 ===== = = === == General Errors Statistics (rev 1) ==
0x04 0x008 4 0 --- Number of Reported Uncorrectable Errors
0x04 0x010 4 0 --- Resets Between Cmd Acceptance and
Completion
0x05 ===== = = === == Temperature Statistics (rev 1) ==
0x05 0x008 1 29 --- Current Temperature
0x05 0x010 1 - --- Average Short Term Temperature
0x05 0x018 1 - --- Average Long Term Temperature
0x05 0x020 1 31 --- Highest Temperature
0x05 0x028 1 25 --- Lowest Temperature
0x05 0x030 1 - --- Highest Average Short Term Temperature
0x05 0x038 1 - --- Lowest Average Short Term Temperature
0x05 0x040 1 - --- Highest Average Long Term Temperature
0x05 0x048 1 - --- Lowest Average Long Term Temperature
0x05 0x050 4 0 --- Time in Over-Temperature
0x05 0x058 1 65 --- Specified Maximum Operating Temperature
0x05 0x060 4 0 --- Time in Under-Temperature
0x05 0x068 1 0 --- Specified Minimum Operating Temperature
0x06 ===== = = === == Transport Statistics (rev 1) ==
0x06 0x008 4 130 --- Number of Hardware Resets
0x06 0x010 4 66 --- Number of ASR Events
0x06 0x018 4 0 --- Number of Interface CRC Errors
0xff ===== = = === == Vendor Specific Statistics (rev 1) ==
0xff 0x008 7 0 --- Vendor Specific
0xff 0x010 7 0 --- Vendor Specific
0xff 0x018 7 0 --- Vendor Specific
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value
Pending Defects log (GP Log 0x0c)
No Defects Logged
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 88 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 89 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000d 2 0 Non-CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 18634 Vendor specific
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-04-23 19:09 Raid5 to raid6 grow interrupted, mdadm hangs on assemble command Jove
@ 2023-04-23 19:19 ` Reindl Harald
2023-04-23 19:32 ` Jove
2023-04-24 7:41 ` Wols Lists
2023-05-04 11:41 ` Yu Kuai
2 siblings, 1 reply; 21+ messages in thread
From: Reindl Harald @ 2023-04-23 19:19 UTC (permalink / raw)
To: Jove, linux-raid
On 23.04.23 at 21:09, Jove wrote:
> I've added two drives to my raid5 array and tried to migrate
> it to raid6 with the following command:
>
> mdadm --grow /dev/md0 --raid-devices 4 --level 6
> --backup-file=/root/mdadm_raid6_backup.md
>
> This may have been my first mistake, as there are only 5
> drives. it should have been --raid-devices 3, I think.
How do you come to the conclusion of 3 when there are 5 drives? You tell it
how many drives there are, and I'm pretty sure that after "mdadm --add" you
can skip "--raid-devices" entirely, because it knows how many drives there are.
https://raid.wiki.kernel.org/index.php/Growing
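For the record, a hedged sketch of the invocation matching what Jove later says was intended (a 5-device raid6 using both new drives). Device names and the backup-file path are taken from the original report; everything else is an assumption, not a recommendation to run as-is:

```shell
# Sketch only -- verify member names with "mdadm --detail /dev/md0" first.
# After the two new disks have been added as spares:
#   mdadm /dev/md0 --add /dev/sde /dev/sdf
# a 3-disk RAID5 can be reshaped into a RAID6 spanning all five drives:
mdadm --grow /dev/md0 --level=6 --raid-devices=5 \
      --backup-file=/root/mdadm_raid6_backup.md
# --raid-devices counts ALL active members of the NEW layout.
# With --raid-devices=4 (as originally run) one new disk simply stays
# a spare, which is legal but probably not what was intended.
```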
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-04-23 19:19 ` Reindl Harald
@ 2023-04-23 19:32 ` Jove
2023-04-24 7:02 ` Jove
0 siblings, 1 reply; 21+ messages in thread
From: Jove @ 2023-04-23 19:32 UTC (permalink / raw)
To: Reindl Harald; +Cc: linux-raid
That comment was because I misunderstood the actual function
of the argument. It should have been 5, not 4 or 3 :).
I doubt this is the cause of my problems, though.
On Sun, Apr 23, 2023 at 9:19 PM Reindl Harald <h.reindl@thelounge.net> wrote:
>
>
>
> Am 23.04.23 um 21:09 schrieb Jove:
> > I've added two drives to my raid5 array and tried to migrate
> > it to raid6 with the following command:
> >
> > mdadm --grow /dev/md0 --raid-devices 4 --level 6
> > --backup-file=/root/mdadm_raid6_backup.md
> >
> > This may have been my first mistake, as there are only 5
> > drives. it should have been --raid-devices 3, I think.
>
> how do you come to the conclusion 3 when there are 5 drives? you tell it
> how much drives there are and pretty sure after "mdadm --add" you can
> skip "--raid-devices" entirely because it knows how many drives there are
>
> https://raid.wiki.kernel.org/index.php/Growing
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-04-23 19:32 ` Jove
@ 2023-04-24 7:02 ` Jove
2023-04-24 7:30 ` Wols Lists
0 siblings, 1 reply; 21+ messages in thread
From: Jove @ 2023-04-24 7:02 UTC (permalink / raw)
To: Reindl Harald; +Cc: linux-raid
> I do doubt this is the cause of my problems though.
Just to clarify, migrating an array from a 3 disk raid5 to a 4 disk
raid6 should be fine?
On Sun, Apr 23, 2023 at 9:32 PM Jove <jovetoo@gmail.com> wrote:
>
> That comment was because I misunderstood the actual function
> of the argument. It should have been 5, not 4 or 3 :).
>
> I do doubt this is the cause of my problems though.
>
> On Sun, Apr 23, 2023 at 9:19 PM Reindl Harald <h.reindl@thelounge.net> wrote:
> >
> >
> >
> > Am 23.04.23 um 21:09 schrieb Jove:
> > > I've added two drives to my raid5 array and tried to migrate
> > > it to raid6 with the following command:
> > >
> > > mdadm --grow /dev/md0 --raid-devices 4 --level 6
> > > --backup-file=/root/mdadm_raid6_backup.md
> > >
> > > This may have been my first mistake, as there are only 5
> > > drives. it should have been --raid-devices 3, I think.
> >
> > how do you come to the conclusion 3 when there are 5 drives? you tell it
> > how much drives there are and pretty sure after "mdadm --add" you can
> > skip "--raid-devices" entirely because it knows how many drives there are
> >
> > https://raid.wiki.kernel.org/index.php/Growing
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-04-24 7:02 ` Jove
@ 2023-04-24 7:30 ` Wols Lists
0 siblings, 0 replies; 21+ messages in thread
From: Wols Lists @ 2023-04-24 7:30 UTC (permalink / raw)
To: Jove; +Cc: linux-raid
On 24/04/2023 08:02, Jove wrote:
>> I do doubt this is the cause of my problems though.
> Just to clarify, migrating an array from a 3 disk raid5 to a 4 disk
> raid6 should be fine?
Yup. This should not have been a problem.
I notice you have WD Reds ... are the new drives new Reds? Not a wise
move...
At what percent is the conversion hung? If a status says 0% complete,
then a data recovery should be fine. Snag is, this doesn't at first
glance sound like that.
And you shouldn't have needed a backup file - again I'll have to dig
deeper ...
Give me a chance, I'll dig deeper.
Cheers,
Wol
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-04-23 19:09 Raid5 to raid6 grow interrupted, mdadm hangs on assemble command Jove
2023-04-23 19:19 ` Reindl Harald
@ 2023-04-24 7:41 ` Wols Lists
2023-04-24 13:31 ` Jove
2023-05-04 11:41 ` Yu Kuai
2 siblings, 1 reply; 21+ messages in thread
From: Wols Lists @ 2023-04-24 7:41 UTC (permalink / raw)
To: Jove, linux-raid; +Cc: Phil Turmel, NeilBrown
On 23/04/2023 20:09, Jove wrote:
> # mdadm --version
> mdadm - v4.2 - 2021-12-30 - 8
>
> # mdadm -D /dev/md0
> /dev/md0:
> Version : 1.2
> Creation Time : Sat Oct 21 01:57:20 2017
> Raid Level : raid6
> Array Size : 7813771264 (7.28 TiB 8.00 TB)
> Used Dev Size : 3906885632 (3.64 TiB 4.00 TB)
> Raid Devices : 4
> Total Devices : 5
> Persistence : Superblock is persistent
>
> Intent Bitmap : Internal
>
> Update Time : Sun Apr 23 10:32:01 2023
> State : clean, degraded
> Active Devices : 3
> Working Devices : 5
> Failed Devices : 0
> Spare Devices : 2
>
> Layout : left-symmetric-6
> Chunk Size : 512K
>
> Consistency Policy : bitmap
>
> New Layout : left-symmetric
>
> Name : atom:0 (local to host atom)
> UUID : 8c56384e:ba1a3cec:aaf34c17:d0cd9318
> Events : 669453
>
> Number Major Minor RaidDevice State
> 0 8 33 0 active sync /dev/sdc1
> 1 8 97 1 active sync /dev/sdg1
> 3 8 49 2 active sync /dev/sdd1
> 5 8 80 3 spare rebuilding /dev/sdf
>
> 4 8 64 - spare /dev/sde
This bit looks good. You have three active drives, so I'm HOPEFUL your
data hasn't actually been damaged.
I've cc'd two people more experienced than me who I hope can help.
Cheers,
Wol
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-04-24 7:41 ` Wols Lists
@ 2023-04-24 13:31 ` Jove
2023-04-24 21:29 ` Jove
0 siblings, 1 reply; 21+ messages in thread
From: Jove @ 2023-04-24 13:31 UTC (permalink / raw)
To: Wols Lists; +Cc: linux-raid, Phil Turmel, NeilBrown
Any data that can be retrieved would be a plus. There is much data on
this array that I don't mind being trashed.
The older drives are WD Red; they are pre-SMR. Since then I have made
sure to use WD Red Plus and WD Red Pro drives. From what I found
online, those should be CMR too, unless they quietly changed them as well.
No, the conversion definitely did not stop at 0%. It ran for several
hours. It stopped during the night, so I can't tell you more.
I am worried that the processes are hung, though. Is that normal?
Thank you for your time!
On Mon, Apr 24, 2023 at 9:41 AM Wols Lists <antlists@youngman.org.uk> wrote:
>
> On 23/04/2023 20:09, Jove wrote:
> > # mdadm --version
> > mdadm - v4.2 - 2021-12-30 - 8
> >
> > # mdadm -D /dev/md0
> > /dev/md0:
> > Version : 1.2
> > Creation Time : Sat Oct 21 01:57:20 2017
> > Raid Level : raid6
> > Array Size : 7813771264 (7.28 TiB 8.00 TB)
> > Used Dev Size : 3906885632 (3.64 TiB 4.00 TB)
> > Raid Devices : 4
> > Total Devices : 5
> > Persistence : Superblock is persistent
> >
> > Intent Bitmap : Internal
> >
> > Update Time : Sun Apr 23 10:32:01 2023
> > State : clean, degraded
> > Active Devices : 3
> > Working Devices : 5
> > Failed Devices : 0
> > Spare Devices : 2
> >
> > Layout : left-symmetric-6
> > Chunk Size : 512K
> >
> > Consistency Policy : bitmap
> >
> > New Layout : left-symmetric
> >
> > Name : atom:0 (local to host atom)
> > UUID : 8c56384e:ba1a3cec:aaf34c17:d0cd9318
> > Events : 669453
> >
> > Number Major Minor RaidDevice State
> > 0 8 33 0 active sync /dev/sdc1
> > 1 8 97 1 active sync /dev/sdg1
> > 3 8 49 2 active sync /dev/sdd1
> > 5 8 80 3 spare rebuilding /dev/sdf
> >
> > 4 8 64 - spare /dev/sde
>
> This bit looks good. You have three active drives, so I'm HOPEFUL your
> data hasn't actually been damaged.
>
> I've cc'd two people more experienced than me who I hope can help.
>
> Cheers,
> Wol
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-04-24 13:31 ` Jove
@ 2023-04-24 21:29 ` Jove
0 siblings, 0 replies; 21+ messages in thread
From: Jove @ 2023-04-24 21:29 UTC (permalink / raw)
To: Wols Lists; +Cc: linux-raid, Phil Turmel, NeilBrown
> There is much data on this array that I don't mind being trashed.
There is about 200GB I would very much like to have back. Email archive,
travel pictures, openhab configuration, ... It is all in a huge LVM
with different
logical volumes.
On Mon, Apr 24, 2023 at 3:31 PM Jove <jovetoo@gmail.com> wrote:
>
> Any data that can be retrieved would be a plus. There is much data on
> this array that I don't mind being trashed.
>
> The older drives are WD Red, they are pre-SHMR. I have made sure after
> that to use WD Red Plus and WD Red Pro drives. From what I found
> online, they should be CMR too. Unless they quietly changed those too.
>
> No, the conversion definitely did not stop at 0%. It ran for several
> hours. It stopped during the night, so I can't tell you more.
>
> I am worried that the processes are hung, though. Is that normal?
>
> Thank you for your time!
>
> On Mon, Apr 24, 2023 at 9:41 AM Wols Lists <antlists@youngman.org.uk> wrote:
> >
> > On 23/04/2023 20:09, Jove wrote:
> > > # mdadm --version
> > > mdadm - v4.2 - 2021-12-30 - 8
> > >
> > > # mdadm -D /dev/md0
> > > /dev/md0:
> > > Version : 1.2
> > > Creation Time : Sat Oct 21 01:57:20 2017
> > > Raid Level : raid6
> > > Array Size : 7813771264 (7.28 TiB 8.00 TB)
> > > Used Dev Size : 3906885632 (3.64 TiB 4.00 TB)
> > > Raid Devices : 4
> > > Total Devices : 5
> > > Persistence : Superblock is persistent
> > >
> > > Intent Bitmap : Internal
> > >
> > > Update Time : Sun Apr 23 10:32:01 2023
> > > State : clean, degraded
> > > Active Devices : 3
> > > Working Devices : 5
> > > Failed Devices : 0
> > > Spare Devices : 2
> > >
> > > Layout : left-symmetric-6
> > > Chunk Size : 512K
> > >
> > > Consistency Policy : bitmap
> > >
> > > New Layout : left-symmetric
> > >
> > > Name : atom:0 (local to host atom)
> > > UUID : 8c56384e:ba1a3cec:aaf34c17:d0cd9318
> > > Events : 669453
> > >
> > > Number Major Minor RaidDevice State
> > > 0 8 33 0 active sync /dev/sdc1
> > > 1 8 97 1 active sync /dev/sdg1
> > > 3 8 49 2 active sync /dev/sdd1
> > > 5 8 80 3 spare rebuilding /dev/sdf
> > >
> > > 4 8 64 - spare /dev/sde
> >
> > This bit looks good. You have three active drives, so I'm HOPEFUL your
> > data hasn't actually been damaged.
> >
> > I've cc'd two people more experienced than me who I hope can help.
> >
> > Cheers,
> > Wol
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-04-23 19:09 Raid5 to raid6 grow interrupted, mdadm hangs on assemble command Jove
2023-04-23 19:19 ` Reindl Harald
2023-04-24 7:41 ` Wols Lists
@ 2023-05-04 11:41 ` Yu Kuai
2023-05-04 18:02 ` Jove
2 siblings, 1 reply; 21+ messages in thread
From: Yu Kuai @ 2023-05-04 11:41 UTC (permalink / raw)
To: Jove, linux-raid; +Cc: yukuai (C)
Hi,
On 2023/04/24 3:09, Jove wrote:
> Hi,
>
> I've added two drives to my raid5 array and tried to migrate
> it to raid6 with the following command:
>
> mdadm --grow /dev/md0 --raid-devices 4 --level 6
> --backup-file=/root/mdadm_raid6_backup.md
>
> This may have been my first mistake, as there are only 5
> drives. it should have been --raid-devices 3, I think.
>
> As soon as I started this grow, the filesystems went
> unavailable. All processes trying to access files on it hung.
> I searched the web which said a reboot during a rebuild
> was not problematic if things shut down cleanly, so I
> rebooted. The reboot hung too. The drive activity
> continued so I let it run overnight. I did wake up to a
> rebooted system in emergency mode as it could not
> mount all the partitions on the raid array.
>
> The OS tried to reassemble the array and succeeded.
> However the udev processes that try to create the dev
> entries hang.
>
> I went back to Google and found out how i could reboot
> my system without this automatic assemble.
> I tried reassembling the array with:
>
> mdadm --verbose --assemble --backup-file mdadm_raid6_backup.md0 /dev/md0
>
> This failed with:
> No backup metadata on mdadm_raid6_backup.md0
> Failed to find final backup of critical section.
> Failed to restore critical section for reshape, sorry.
>
> I tried again with:
>
> mdadm --verbose --assemble --backup-file mdadm_raid6_backup.md0
> --invalid-backup /dev/md0
>
> This said, in addition to the lines above:
>
> continuing without restoring backup
>
> This seemed to have succeeded in reassembling the
> array but it also hangs indefinitely.
>
> /proc/mdstat now shows:
>
> md0 : active (read-only) raid6 sdc1[0] sde[4](S) sdf[5] sdd1[3] sdg1[1]
> 7813771264 blocks super 1.2 level 6, 512k chunk, algorithm 18 [4/3] [UUU_]
> bitmap: 1/30 pages [4KB], 65536KB chunk
A read-only array can't continue the reshape; see the details in
md_check_recovery(): the reshape can only start if md_is_rdwr(mddev) passes.
Do you know why this array is read-only?
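As a hedged aside (sysfs paths as in recent mainline kernels), the read-only state can be inspected, and cleared, without touching the data:

```shell
# Reads only -- safe on a live system:
cat /sys/block/md0/md/array_state   # e.g. "readonly" or "read-auto"
grep -A2 '^md0' /proc/mdstat        # shows the "(read-only)" marker

# If the array is merely (auto-)read-only, making it read-write is
# what lets md_check_recovery() pick the reshape back up:
mdadm --readwrite /dev/md0
# equivalent sysfs form:
#   echo active > /sys/block/md0/md/array_state
```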
>
> Again the udev processes trying to access this device hung indefinitely
>
> Eventually, the kernel dumps this in my journal:
>
> Apr 23 19:17:22 atom kernel: task:systemd-udevd state:D stack: 0
> pid: 8121 ppid: 706 flags:0x00000006
> Apr 23 19:17:22 atom kernel: Call Trace:
> Apr 23 19:17:22 atom kernel: <TASK>
> Apr 23 19:17:22 atom kernel: __schedule+0x20a/0x550
> Apr 23 19:17:22 atom kernel: schedule+0x5a/0xc0
> Apr 23 19:17:22 atom kernel: schedule_timeout+0x11f/0x160
> Apr 23 19:17:22 atom kernel: ? make_stripe_request+0x284/0x490 [raid456]
> Apr 23 19:17:22 atom kernel: wait_woken+0x50/0x70
It looks like this normal I/O is waiting for the reshape to finish; that's
why it hung indefinitely.
This really is a kernel bug. Perhaps it can be bypassed if the reshape can
complete, hopefully automatically once the array is made read/write. Note:
never echo "reshape" to sync_action; that will corrupt data in your case.
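In that spirit, a read-only sketch for watching whether the reshape is actually making progress (sysfs attribute names assumed from recent kernels; nothing here writes to sync_action):

```shell
# All reads -- run as often as you like:
cat /sys/block/md0/md/sync_action       # "reshape" while it runs
cat /sys/block/md0/md/reshape_position  # sector offset; should advance
cat /sys/block/md0/md/sync_completed    # "done / total" sectors
grep -A3 '^md0' /proc/mdstat            # percentage, speed, ETA
```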
Thanks,
Kuai
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-05-04 11:41 ` Yu Kuai
@ 2023-05-04 18:02 ` Jove
2023-05-05 1:34 ` Yu Kuai
0 siblings, 1 reply; 21+ messages in thread
From: Jove @ 2023-05-04 18:02 UTC (permalink / raw)
To: Yu Kuai; +Cc: linux-raid, yukuai (C)
Hi Kuai,
the mdadm --assemble command also hangs in the kernel. It never completes.
root 142 112 1 19:01 tty1 00:00:00 mdadm --assemble
/dev/md0 /dev/ubdb /dev/ubdc /dev/ubdd /dev/ubde --backup-file
mdadm_raid6_backup.md0 --invalid-backup
root 145 2 0 19:01 ? 00:00:00 [md0_raid6]
[root@LXCNAME ~]# cat /proc/142/stack
[<0>] __switch_to+0x50/0x7f
[<0>] __schedule+0x39c/0x3dd
[<0>] schedule+0x78/0xb9
[<0>] mddev_suspend+0x10b/0x1e8
[<0>] suspend_lo_store+0x72/0xbb
[<0>] md_attr_store+0x6c/0x8d
[<0>] sysfs_kf_write+0x34/0x37
[<0>] kernfs_fop_write_iter+0x167/0x1d0
[<0>] new_sync_write+0x68/0xd8
[<0>] vfs_write+0xe7/0x12b
[<0>] ksys_write+0x6d/0xa6
[<0>] sys_write+0x10/0x12
[<0>] handle_syscall+0x81/0xb1
[<0>] userspace+0x3db/0x598
[<0>] fork_handler+0x94/0x96
[root@LXCNAME ~]# cat /proc/145/stack
[<0>] __switch_to+0x50/0x7f
[<0>] __schedule+0x39c/0x3dd
[<0>] schedule+0x78/0xb9
[<0>] schedule_timeout+0xd2/0xfb
[<0>] md_thread+0x12c/0x18a
[<0>] kthread+0x11d/0x122
[<0>] new_thread_handler+0x81/0xb2
I have had one case in which mdadm didn't hang and in which the
reshape continued. Sadly, I was using sparse overlay files and the
filesystem could not handle the full 4x 4TB. I had to terminate the
reshape.
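For anyone following along, the overlay setup Johan refers to (per the linux-raid wiki) can be sketched roughly as below; the paths are hypothetical, and the filesystem holding the overlays must be able to absorb every sector the reshape rewrites, which is what ran out here:

```shell
# Hedged sketch, one member shown -- repeat per device. All writes land
# in the sparse overlay file; the real disk is never modified.
DEV=/dev/sdc1
SECTORS=$(blockdev --getsz "$DEV")                # size in 512-byte sectors
truncate -s $((SECTORS * 512)) /overlay/sdc1.ovl  # sparse backing file
LOOP=$(losetup -f --show /overlay/sdc1.ovl)
echo "0 $SECTORS snapshot $DEV $LOOP P 8" | dmsetup create sdc1-overlay
# Experiment against the overlay instead of the raw device, e.g.:
#   mdadm --assemble /dev/md0 /dev/mapper/sdc1-overlay ...
```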
Best regards,
Johan
On Thu, May 4, 2023 at 1:41 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/04/24 3:09, Jove wrote:
> > Hi,
> >
> > I've added two drives to my raid5 array and tried to migrate
> > it to raid6 with the following command:
> >
> > mdadm --grow /dev/md0 --raid-devices 4 --level 6
> > --backup-file=/root/mdadm_raid6_backup.md
> >
> > This may have been my first mistake, as there are only 5
> > drives. it should have been --raid-devices 3, I think.
> >
> > As soon as I started this grow, the filesystems went
> > unavailable. All processes trying to access files on it hung.
> > I searched the web which said a reboot during a rebuild
> > was not problematic if things shut down cleanly, so I
> > rebooted. The reboot hung too. The drive activity
> > continued so I let it run overnight. I did wake up to a
> > rebooted system in emergency mode as it could not
> > mount all the partitions on the raid array.
> >
> > The OS tried to reassemble the array and succeeded.
> > However the udev processes that try to create the dev
> > entries hang.
> >
> > I went back to Google and found out how i could reboot
> > my system without this automatic assemble.
> > I tried reassembling the array with:
> >
> > mdadm --verbose --assemble --backup-file mdadm_raid6_backup.md0 /dev/md0
> >
> > This failed with:
> > No backup metadata on mdadm_raid6_backup.md0
> > Failed to find final backup of critical section.
> > Failed to restore critical section for reshape, sorry.
> >
> > I tried again with:
> >
> > mdadm --verbose --assemble --backup-file mdadm_raid6_backup.md0
> > --invalid-backup /dev/md0
> >
> > This said, in addition to the lines above:
> >
> > continuing without restoring backup
> >
> > This seemed to have succeeded in reassembling the
> > array but it also hangs indefinitely.
> >
> > /proc/mdstat now shows:
> >
> > md0 : active (read-only) raid6 sdc1[0] sde[4](S) sdf[5] sdd1[3] sdg1[1]
> > 7813771264 blocks super 1.2 level 6, 512k chunk, algorithm 18 [4/3] [UUU_]
> > bitmap: 1/30 pages [4KB], 65536KB chunk
>
> Read only can't continue reshape progress, see details in
> md_check_recovery(), reshape can only start if md_is_rdwr(mddev) pass.
> Do you know why this array is read-only?
>
> >
> > Again the udev processes trying to access this device hung indefinitely
> >
> > Eventually, the kernel dumps this in my journal:
> >
> > Apr 23 19:17:22 atom kernel: task:systemd-udevd state:D stack: 0
> > pid: 8121 ppid: 706 flags:0x00000006
> > Apr 23 19:17:22 atom kernel: Call Trace:
> > Apr 23 19:17:22 atom kernel: <TASK>
> > Apr 23 19:17:22 atom kernel: __schedule+0x20a/0x550
> > Apr 23 19:17:22 atom kernel: schedule+0x5a/0xc0
> > Apr 23 19:17:22 atom kernel: schedule_timeout+0x11f/0x160
> > Apr 23 19:17:22 atom kernel: ? make_stripe_request+0x284/0x490 [raid456]
> > Apr 23 19:17:22 atom kernel: wait_woken+0x50/0x70
>
> Looks like this normal io is waiting for the reshape to be done; that's
> why it hung indefinitely.
>
> This really is a kernel bug; perhaps it can be bypassed if the reshape
> can be done, hopefully automatically once this array can be made
> read/write. Note: never echo reshape to sync_action, as this will
> corrupt data in your case.
>
> Thanks,
> Kuai
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-05-04 18:02 ` Jove
@ 2023-05-05 1:34 ` Yu Kuai
2023-05-05 6:58 ` Wol
0 siblings, 1 reply; 21+ messages in thread
From: Yu Kuai @ 2023-05-05 1:34 UTC (permalink / raw)
To: Jove, Yu Kuai; +Cc: linux-raid, yukuai (C)
Hi,
On 2023/05/05 2:02, Jove wrote:
> Hi Kuai,
>
> the mdadm --assemble command also hangs in the kernel. It never completes.
>
> root 142 112 1 19:01 tty1 00:00:00 mdadm --assemble
> /dev/md0 /dev/ubdb /dev/ubdc /dev/ubdd /dev/ubde --backup-file
> mdadm_raid6_backup.md0 --invalid-backup
> root 145 2 0 19:01 ? 00:00:00 [md0_raid6]
>
> [root@LXCNAME ~]# cat /proc/142/stack
> [<0>] __switch_to+0x50/0x7f
> [<0>] __schedule+0x39c/0x3dd
> [<0>] schedule+0x78/0xb9
> [<0>] mddev_suspend+0x10b/0x1e8
mddev_suspend() is waiting for the read io to be done, while the read io
is waiting for the reshape to make progress.
So whether this happens depends on whether there is a read io beyond the
reshape position while mdadm is executed.
> [<0>] suspend_lo_store+0x72/0xbb
> [<0>] md_attr_store+0x6c/0x8d
> [<0>] sysfs_kf_write+0x34/0x37
> [<0>] kernfs_fop_write_iter+0x167/0x1d0
> [<0>] new_sync_write+0x68/0xd8
> [<0>] vfs_write+0xe7/0x12b
> [<0>] ksys_write+0x6d/0xa6
> [<0>] sys_write+0x10/0x12
> [<0>] handle_syscall+0x81/0xb1
> [<0>] userspace+0x3db/0x598
> [<0>] fork_handler+0x94/0x96
>
> [root@LXCNAME ~]# cat /proc/145/stack
> [<0>] __switch_to+0x50/0x7f
> [<0>] __schedule+0x39c/0x3dd
> [<0>] schedule+0x78/0xb9
> [<0>] schedule_timeout+0xd2/0xfb
> [<0>] md_thread+0x12c/0x18a
> [<0>] kthread+0x11d/0x122
> [<0>] new_thread_handler+0x81/0xb2
>
> I have had one case in which mdadm didn't hang and in which the
> reshape continued. Sadly, I was using sparse overlay files and the
> filesystem could not handle the full 4x 4TB. I had to terminate the
> reshape.
This sounds like a dead end for now; normal io beyond the reshape
position must wait:
raid5_make_request
make_stripe_request
ahead_of_reshape
wait_woken
Thanks,
Kuai
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-05-05 1:34 ` Yu Kuai
@ 2023-05-05 6:58 ` Wol
2023-05-05 8:02 ` Yu Kuai
0 siblings, 1 reply; 21+ messages in thread
From: Wol @ 2023-05-05 6:58 UTC (permalink / raw)
To: Yu Kuai, Jove; +Cc: linux-raid, yukuai (C)
On 05/05/2023 02:34, Yu Kuai wrote:
>> I have had one case in which mdadm didn't hang and in which the
>> reshape continued. Sadly, I was using sparse overlay files and the
>> filesystem could not handle the full 4x 4TB. I had to terminate the
>> reshape.
>
> This sounds like a dead end for now, normal io beyond reshape position
> must wait:
>
> raid5_make_request
> make_stripe_request
> ahead_of_reshape
> wait_woken
Not sure if I've got the wrong end of the stick, but if I've understood
correctly, that shouldn't be the case.
Reshape takes place in a window. All io *beyond* the window is allowed
to proceed normally - that part of the array has not been reshaped so
the old parameters are used.
All io *in front* of the window is allowed to proceed normally - that
part of the array has been reshaped so the new parameters are used.
io *IN* the window is paused until the window has passed. This
interruption should be short and sweet.
Cheers,
Wol
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-05-05 6:58 ` Wol
@ 2023-05-05 8:02 ` Yu Kuai
2023-05-05 15:47 ` Jove
0 siblings, 1 reply; 21+ messages in thread
From: Yu Kuai @ 2023-05-05 8:02 UTC (permalink / raw)
To: Wol, Yu Kuai, Jove; +Cc: linux-raid, yukuai (C)
Hi,
On 2023/05/05 14:58, Wol wrote:
> On 05/05/2023 02:34, Yu Kuai wrote:
>>> I have had one case in which mdadm didn't hang and in which the
>>> reshape continued. Sadly, I was using sparse overlay files and the
>>> filesystem could not handle the full 4x 4TB. I had to terminate the
>>> reshape.
>>
>> This sounds like a dead end for now, normal io beyond reshape position
>> must wait:
>>
>> raid5_make_request
>> make_stripe_request
>> ahead_of_reshape
>> wait_woken
>
> Not sure if I've got the wrong end of the stick, but if I've understood
> correctly, that shouldn't be the case.
>
> Reshape takes place in a window. All io *beyond* the window is allowed
> to proceed normally - that part of the array has not been reshaped so
> the old parameters are used.
>
> All io *in front* of the window is allowed to proceed normally - that
> part of the array has been reshaped so the new parameters are used.
>
> io *IN* the window is paused until the window has passed. This
> interruption should be short and sweet.
Yes, that's correct, and in this case reshape_safe should be the same as
reshape_progress; I guess the io is stuck because
stripe_ahead_of_reshape() returns true.
So this deadlock happens when io is blocked because of the reshape while
mddev_suspend() is waiting for this io to be done; in the meantime, the
reshape can't start until mddev_suspend() returns.
Jove, as I understand this, if mdadm makes progress without a blocked
io and the reshape continues, it seems you can use this array without
problems.
Thanks,
Kuai
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-05-05 8:02 ` Yu Kuai
@ 2023-05-05 15:47 ` Jove
2023-05-06 1:33 ` Yu Kuai
0 siblings, 1 reply; 21+ messages in thread
From: Jove @ 2023-05-05 15:47 UTC (permalink / raw)
To: Yu Kuai; +Cc: Wol, linux-raid, yukuai (C)
Hi Kuai.
> Jove, as I understand this, if mdadm makes progress without a blocked
> io and the reshape continues, it seems you can use this array without
> problems
I've had to do some sleuthing to figure out what was doing that array
access; I was already running a minimal Fedora Core image. I've
discovered that the culprit is the systemd-udevd daemon. I do not know
why it accesses the array, but if I stop it and rename its executable
(it gets started automatically when the array is assembled), then the
reshape continues.
Now it is just a matter of time until the reshape is finished and I
can discover just how much data I still have :)
Thank you all for your help, I will send a last mail when I know more.
Best regards,
Johan
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-05-05 15:47 ` Jove
@ 2023-05-06 1:33 ` Yu Kuai
2023-05-06 13:07 ` Jove
0 siblings, 1 reply; 21+ messages in thread
From: Yu Kuai @ 2023-05-06 1:33 UTC (permalink / raw)
To: Jove, Yu Kuai; +Cc: Wol, linux-raid, yukuai (C)
Hi,
On 2023/05/05 23:47, Jove wrote:
> Hi Kuai.
>
>> Jove, As I understand this, if mdadm make progress without a blocked
>> io, and reshape continues, it seems you can use this array without
>> problem
>
> I've had to do some sleuthing to figure out who was doing that array
> access, I was already running a minimal FedoraCore image. I've
> discovered that the culprit is the systemd-udevd daemon. I do not know
> why it accesses the array but if I stop it and rename that executable
> (it gets started automatically when the array is assembled) then the
> reshape continues.
Thanks for confirming this; however, I have no idea why systemd-udevd is
accessing the array.
In the meantime, I'll try to fix this deadlock, hope you don't mind a
reported-by tag.
Thanks,
Kuai
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-05-06 1:33 ` Yu Kuai
@ 2023-05-06 13:07 ` Jove
2023-05-06 21:59 ` Wol
2023-05-09 2:10 ` Yu Kuai
0 siblings, 2 replies; 21+ messages in thread
From: Jove @ 2023-05-06 13:07 UTC (permalink / raw)
To: Yu Kuai; +Cc: Wol, linux-raid, yukuai (C)
Hi Kuai,
Just to confirm, the array seems fine after the reshape. Copying files now.
Would it be best if I scrap this array and create a new one or is this
array safe to use in the long term? I had to use the --invalid-backup
flag to get it to reshape, so there might be corruption before that
resume point?
I have to do a reshape anyway, to 5 raid devices.
> In the meantime, I'll try to fix this deadlock, hope you don't mind a
> reported-by tag.
I would not, thank you.
I still have the backup images of the drive in reshape. If you wish I
can test any fix you create.
> I have no idea why systemd-udevd is accessing the array.
My guess is that it is accessing this array because it checks it for the
LVM layout so it can automatically create the /dev/mapper entries.
With systemd-udevd disabled, these entries do not automatically
appear.
And thank you again for getting me my data back.
Best regards,
Johan
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-05-06 13:07 ` Jove
@ 2023-05-06 21:59 ` Wol
2023-05-07 11:30 ` Jove
2023-05-09 2:10 ` Yu Kuai
1 sibling, 1 reply; 21+ messages in thread
From: Wol @ 2023-05-06 21:59 UTC (permalink / raw)
To: Jove, Yu Kuai; +Cc: linux-raid, yukuai (C)
On 06/05/2023 14:07, Jove wrote:
> Hi Kuai,
>
> Just to confirm, the array seems fine after the reshape. Copying files now.
>
> Would it be best if I scrap this array and create a new one or is this
> array safe to use in the long term? It had to use the --invalid-backup
> flag to get it to reshape, so there might be corruption before that
> resume point?
>
> I have to do a reshape anyway, to 5 raid devices.
>
I wouldn't think it necessary to scrap the array, but if you've backed
it up and are happier doing so ...
AIUI it was an external program squeezing in where it shouldn't that
(quite literally) threw a spanner in the works and jammed things up. The
array itself should be perfectly okay.
As for the "invalid backup" problem, you should never have given it a
backup in the first place, and (while I don't know the code) I very much
expect it ignored the option completely. You have superblock 1.2, which
has a chunk of space "reserved for internal use", one use of which is to
provide this backup.
The only really good reason I can think of for scrapping and recreating
the array is that it will give you a clean array with ALL THE CURRENT
DEFAULTS. This is important if anything goes wrong in future: if you
have an array with a known creation date that has not been "messed
about" with since, it's easier to recover if you're really stupid,
damage it, and lose your records of the layout. Once an array goes
through reshapes, it can be a lot harder to work out the layout if you
have to rescue the array by recreating it.
Cheers,
Wol
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-05-06 21:59 ` Wol
@ 2023-05-07 11:30 ` Jove
0 siblings, 0 replies; 21+ messages in thread
From: Jove @ 2023-05-07 11:30 UTC (permalink / raw)
To: Wol; +Cc: Yu Kuai, linux-raid, yukuai (C)
Hi Wol,
> I wouldn't think it necessary to scrap the array, but if you've backed
> it up and are happier doing so ...
Not particularly. I have taken backups and I am reshaping it to 5 raid
devices and if it works, I'll keep it.
> As for the "invalid backup" problem, you should never have given it a
> backup in the first place, and (while I don't know the code) I very much
> expect it ignored the option completely.
I don't know, Wol. I added the option because the wiki recommended it.
All I know is that when I tried to resume the reshape without the
option or without the --invalid-backup option, mdadm complained it
could not restore the critical section and refused to assemble the
array.
> Once an array goes through reshapes, it can be a lot harder to work
> out the layout if you have to rescue the array by recreating it.
I am no longer going to rely on the array alone to keep my data safe.
Should this array ever fail again, there will be backups to recover
from.
Thanks,
Johan
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-05-06 13:07 ` Jove
2023-05-06 21:59 ` Wol
@ 2023-05-09 2:10 ` Yu Kuai
2023-05-09 20:18 ` Johan Verrept
1 sibling, 1 reply; 21+ messages in thread
From: Yu Kuai @ 2023-05-09 2:10 UTC (permalink / raw)
To: Jove, Yu Kuai
Cc: Wol, linux-raid, yukuai (C), songliubraving, Logan Gunthorpe
[-- Attachment #1: Type: text/plain, Size: 1278 bytes --]
Hi, Jove
On 2023/05/06 21:07, Jove wrote:
> Hi Kuai,
>
> Just to confirm, the array seems fine after the reshape. Copying files now.
>
> Would it be best if I scrap this array and create a new one or is this
> array safe to use in the long term? It had to use the --invalid-backup
> flag to get it to reshape, so there might be corruption before that
> resume point?
>
> I have to do a reshape anyway, to 5 raid devices.
>
>> In the meantime, I'll try to fix this deadlock, hope you don't mind a
>> reported-by tag.
>
> I would not, thank you.
>
> I still have the backup images of the drive in reshape. If you wish I
> can test any fix you create.
Here is the first version of the fix patch: it fails the io that is
waiting for reshape while the reshape can't make progress. I tested it
in my VM and it works as I expected. Can you give it a try to see if
mdadm can still assemble?
Thanks,
Kuai
[-- Attachment #2: 0001-md-fix-raid456-deadlock.patch --]
[-- Type: text/plain, Size: 5758 bytes --]
From 159ea7c8d591882dfbbdf30938c1c1d5bc9d4931 Mon Sep 17 00:00:00 2001
From: Yu Kuai <yukuai3@huawei.com>
Date: Tue, 9 May 2023 09:28:36 +0800
Subject: [PATCH] md: fix raid456 deadlock
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md.c | 20 ++++----------------
drivers/md/md.h | 18 ++++++++++++++++++
drivers/md/raid5.c | 32 +++++++++++++++++++++++++++++++-
3 files changed, 53 insertions(+), 17 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8e344b4b3444..462529e47f19 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -93,18 +93,6 @@ static int remove_and_add_spares(struct mddev *mddev,
struct md_rdev *this);
static void mddev_detach(struct mddev *mddev);
-enum md_ro_state {
- MD_RDWR,
- MD_RDONLY,
- MD_AUTO_READ,
- MD_MAX_STATE
-};
-
-static bool md_is_rdwr(struct mddev *mddev)
-{
- return (mddev->ro == MD_RDWR);
-}
-
/*
* Default number of read corrections we'll attempt on an rdev
* before ejecting it from the array. We divide the read error
@@ -360,10 +348,6 @@ EXPORT_SYMBOL_GPL(md_new_event);
static LIST_HEAD(all_mddevs);
static DEFINE_SPINLOCK(all_mddevs_lock);
-static bool is_md_suspended(struct mddev *mddev)
-{
- return percpu_ref_is_dying(&mddev->active_io);
-}
/* Rather than calling directly into the personality make_request function,
* IO requests come here first so that we can check if the device is
* being suspended pending a reconfiguration.
@@ -464,6 +448,10 @@ void mddev_suspend(struct mddev *mddev)
wake_up(&mddev->sb_wait);
set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
percpu_ref_kill(&mddev->active_io);
+
+ if (mddev->pers->prepare_suspend)
+ mddev->pers->prepare_suspend(mddev);
+
wait_event(mddev->sb_wait, percpu_ref_is_zero(&mddev->active_io));
mddev->pers->quiesce(mddev, 1);
clear_bit_unlock(MD_ALLOW_SB_UPDATE, &mddev->flags);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index fd8f260ed5f8..292b96a15890 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -536,6 +536,23 @@ struct mddev {
bool serialize_policy:1;
};
+enum md_ro_state {
+ MD_RDWR,
+ MD_RDONLY,
+ MD_AUTO_READ,
+ MD_MAX_STATE
+};
+
+static inline bool md_is_rdwr(struct mddev *mddev)
+{
+ return (mddev->ro == MD_RDWR);
+}
+
+static inline bool is_md_suspended(struct mddev *mddev)
+{
+ return percpu_ref_is_dying(&mddev->active_io);
+}
+
enum recovery_flags {
/*
* If neither SYNC or RESHAPE are set, then it is a recovery.
@@ -614,6 +631,7 @@ struct md_personality
int (*start_reshape) (struct mddev *mddev);
void (*finish_reshape) (struct mddev *mddev);
void (*update_reshape_pos) (struct mddev *mddev);
+ void (*prepare_suspend) (struct mddev *mddev);
/* quiesce suspends or resumes internal processing.
* 1 - stop new actions and wait for action io to complete
* 0 - return to normal behaviour
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 812a12e3e41a..5a24935c113d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -761,6 +761,7 @@ enum stripe_result {
STRIPE_RETRY,
STRIPE_SCHEDULE_AND_RETRY,
STRIPE_FAIL,
+ STRIPE_FAIL_AND_RETRY,
};
struct stripe_request_ctx {
@@ -5997,7 +5998,8 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
if (ahead_of_reshape(mddev, logical_sector,
conf->reshape_safe)) {
spin_unlock_irq(&conf->device_lock);
- return STRIPE_SCHEDULE_AND_RETRY;
+ ret = STRIPE_SCHEDULE_AND_RETRY;
+ goto out;
}
}
spin_unlock_irq(&conf->device_lock);
@@ -6076,6 +6078,18 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
out_release:
raid5_release_stripe(sh);
+out:
+ /*
+ * There is no point to wait for reshape because reshape can't make
+ * progress if the array is suspended or is not read write.
+ */
+ if (ret == STRIPE_SCHEDULE_AND_RETRY &&
+ (is_md_suspended(mddev) || !md_is_rdwr(mddev))) {
+ bi->bi_status = BLK_STS_IOERR;
+ ret = STRIPE_FAIL;
+ pr_err("md/raid456:%s: array is suspended or not read write, io across reshape position failed, please try again after reshape.\n",
+ mdname(mddev));
+ }
return ret;
}
@@ -8654,6 +8668,19 @@ static void raid5_finish_reshape(struct mddev *mddev)
}
}
+static void raid5_prepare_suspend(struct mddev *mddev)
+{
+ struct r5conf *conf = mddev->private;
+
+ /*
+ * Before waiting for active_io to be done, fail all the io that is
+ * waiting for reshape because they can never be done after suspend.
+ *
+ * Perhaps it's better to let those io wait for resume than failing.
+ */
+ wake_up(&conf->wait_for_overlap);
+}
+
static void raid5_quiesce(struct mddev *mddev, int quiesce)
{
struct r5conf *conf = mddev->private;
@@ -9020,6 +9047,7 @@ static struct md_personality raid6_personality =
.check_reshape = raid6_check_reshape,
.start_reshape = raid5_start_reshape,
.finish_reshape = raid5_finish_reshape,
+ .prepare_suspend = raid5_prepare_suspend,
.quiesce = raid5_quiesce,
.takeover = raid6_takeover,
.change_consistency_policy = raid5_change_consistency_policy,
@@ -9044,6 +9072,7 @@ static struct md_personality raid5_personality =
.check_reshape = raid5_check_reshape,
.start_reshape = raid5_start_reshape,
.finish_reshape = raid5_finish_reshape,
+ .prepare_suspend = raid5_prepare_suspend,
.quiesce = raid5_quiesce,
.takeover = raid5_takeover,
.change_consistency_policy = raid5_change_consistency_policy,
@@ -9069,6 +9098,7 @@ static struct md_personality raid4_personality =
.check_reshape = raid5_check_reshape,
.start_reshape = raid5_start_reshape,
.finish_reshape = raid5_finish_reshape,
+ .prepare_suspend = raid5_prepare_suspend,
.quiesce = raid5_quiesce,
.takeover = raid4_takeover,
.change_consistency_policy = raid5_change_consistency_policy,
--
2.39.2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-05-09 2:10 ` Yu Kuai
@ 2023-05-09 20:18 ` Johan Verrept
2023-05-10 1:13 ` Yu Kuai
0 siblings, 1 reply; 21+ messages in thread
From: Johan Verrept @ 2023-05-09 20:18 UTC (permalink / raw)
To: Yu Kuai, Jove
Cc: Wol, linux-raid, yukuai (C), songliubraving, Logan Gunthorpe
Hi Kuai,
> Here is the first verion of the fixed patch, I fail the io that is
> waiting for reshape while reshape can't make progress. I tested in my
> VM and it works as I expected. Can you give it a try to see if mdadm
> can still assemble?
Assemble seems to work fine and the reshape resumed.
I see this error appearing:
md/raid456:md0: array is suspended or not read write, io across
reshape position failed, please try again after reshape.
From what I can see in your patch, this is what is expected.
Best regards,
Johan
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
2023-05-09 20:18 ` Johan Verrept
@ 2023-05-10 1:13 ` Yu Kuai
0 siblings, 0 replies; 21+ messages in thread
From: Yu Kuai @ 2023-05-10 1:13 UTC (permalink / raw)
To: Johan Verrept, Yu Kuai, Jove
Cc: Wol, linux-raid, songliubraving, Logan Gunthorpe, David Gilmour,
yukuai (C)
Hi, Johan
On 2023/05/10 4:18, Johan Verrept wrote:
>
> Hi Kuai,
>
>> Here is the first verion of the fixed patch, I fail the io that is
>> waiting for reshape while reshape can't make progress. I tested in my
>> VM and it works as I expected. Can you give it a try to see if mdadm
>> can still assemble?
>
> Assemble seems to work fine and the reshape resumed.
>
That's great, thanks for testing.
David, you can try this patch as well; your case is different, but
I think this patch will work.
Thanks,
Kuai
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2023-05-10 1:13 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-23 19:09 Raid5 to raid6 grow interrupted, mdadm hangs on assemble command Jove
2023-04-23 19:19 ` Reindl Harald
2023-04-23 19:32 ` Jove
2023-04-24 7:02 ` Jove
2023-04-24 7:30 ` Wols Lists
2023-04-24 7:41 ` Wols Lists
2023-04-24 13:31 ` Jove
2023-04-24 21:29 ` Jove
2023-05-04 11:41 ` Yu Kuai
2023-05-04 18:02 ` Jove
2023-05-05 1:34 ` Yu Kuai
2023-05-05 6:58 ` Wol
2023-05-05 8:02 ` Yu Kuai
2023-05-05 15:47 ` Jove
2023-05-06 1:33 ` Yu Kuai
2023-05-06 13:07 ` Jove
2023-05-06 21:59 ` Wol
2023-05-07 11:30 ` Jove
2023-05-09 2:10 ` Yu Kuai
2023-05-09 20:18 ` Johan Verrept
2023-05-10 1:13 ` Yu Kuai