* Assemblin journaled array fails
From: Michal Soltys @ 2020-05-09 10:54 UTC
To: linux-raid

Looks like I now have another issue: a journaled array fails to assemble, and the command used to assemble it (whether mdadm --incremental called by udev, or mdadm run manually from the command line) ends up hung and unkillable. The array was previously running with a different journal, which was later replaced with newer, better disks. The journal device itself is healthy and assembles fine.

Assembling (the dump below is from when mdadm invoked via udev was doing it):

mdadm -D /dev/md126
/dev/md126:
           Version : 1.1
     Creation Time : Tue Mar 5 19:28:58 2019
        Raid Level : raid5
     Used Dev Size : 18446744073709551615
      Raid Devices : 4
     Total Devices : 5
       Persistence : Superblock is persistent

       Update Time : Sat May 9 09:34:58 2020
             State : active, FAILED, Not Started
    Active Devices : 4
   Working Devices : 5
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : journal

              Name : xs22:r5_big  (local to host xs22)
              UUID : d5995d76:67d7fabd:05392f87:25a91a97
            Events : 56283

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       -       0        0        1      removed
       -       0        0        2      removed
       -       0        0        3      removed

       -       8      145        3      sync   /dev/sdj1
       -       8      129        2      sync   /dev/sdi1
       -       9      127        -      spare  /dev/md/xs22:r1_journal_big
       -       8      113        1      sync   /dev/sdh1
       -       8       97        0      sync   /dev/sdg1

mdadm -Esv:

ARRAY /dev/md/r1_journal_big level=raid1 metadata=1.1 num-devices=2 UUID=c18b3996:49b21f49:3ee18e92:a8240f7f name=xs22:r1_journal_big
   devices=/dev/sdv1,/dev/sdu1
ARRAY /dev/md/r5_big level=raid5 metadata=1.1 num-devices=4 UUID=d5995d76:67d7fabd:05392f87:25a91a97 name=xs22:r5_big
   spares=1
   devices=/dev/md/xs22:r1_journal_big,/dev/sdg1,/dev/sdh1,/dev/sdj1,/dev/sdi1

When I changed the journal some time ago (successfully - everything worked fine afterwards), this is what I did:

echo [write-through] >/sys/md126/md/journal_mode
mdadm /dev/md126 --fail /dev/md/r1_journal_big --remove /dev/md/r1_journal_big
echo resync >/sys/md126/md/consistency_policy
# here I created the new journal from the new disks; the name was the same
mdadm --readonly /dev/md126
mdadm /dev/md126 --add-journal /dev/md/r1_journal_big
echo [write-back] >/sys/md126/md/journal_mode
# At this point mdadm confirmed everything went fine

After the reboot I'm in the situation outlined above. Any suggestions on how to assemble this array now?

^ permalink raw reply	[flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails
From: Michal Soltys @ 2020-05-09 11:32 UTC
To: linux-raid

On 20/05/09 12:54, Michal Soltys wrote:
> # At this point mdadm confirmed everything went fine
>
> After the reboot I'm in the situation outlined above. Any suggestions
> on how to assemble this array now?

To expand on this: "after the reboot" means a few days later - the array
was operating fine during that time.

Attempting to assemble the array manually from early initramfs (before
any udev/etc. and with manually loaded modules), e.g.:

mdadm -A /dev/md/r5_big /dev/sd[efgh]1 /dev/md/r1_journal_big

shows:

md/raid:md126: device sde1 operational as raid disk 0
md/raid:md126: device sdh1 operational as raid disk 3
md/raid:md126: device sdg1 operational as raid disk 2
md/raid:md126: device sdf1 operational as raid disk 1
md/raid:md126: raid level 5 active with 4 out of 4 devices, algorithm 2

But then mdadm is left hanging for good.
* Re: Assemblin journaled array fails
From: antlists @ 2020-05-09 13:15 UTC
To: Michal Soltys, linux-raid

On 09/05/2020 11:54, Michal Soltys wrote:
> After the reboot I'm in the situation outlined above. Any suggestions
> on how to assemble this array now?

mdadm -v ?

uname -a ?

https://raid.wiki.kernel.org/index.php/Asking_for_help

Dunno what the problem is, and I'm hesitant to recommend it, but
re-assembling the array without a journal might work.

You have made sure that bitmaps aren't enabled? You can't have both, but
for historical reasons you can end up with both enabled, and then the
resulting array won't start.

Cheers,
Wol
* Re: Assemblin journaled array fails
From: Michal Soltys @ 2020-05-09 13:20 UTC
To: antlists, linux-raid

On 20/05/09 15:15, antlists wrote:
> On 09/05/2020 11:54, Michal Soltys wrote:
>> After the reboot I'm in the situation outlined above. Any
>> suggestions on how to assemble this array now?
>
> mdadm -v ?

mdadm - v4.1 - 2018-10-01

> uname -a ?

Linux xs22 5.4.0-0.bpo.4-amd64 #1 SMP Debian 5.4.19-1~bpo10+1
(2020-03-09) x86_64 GNU/Linux

> https://raid.wiki.kernel.org/index.php/Asking_for_help
>
> Dunno what the problem is, and I'm hesitant to recommend it, but
> re-assembling the array without a journal might work.

I would - if possible - want to avoid this, as the journal is
write-back. So as a last resort only.

> You have made sure that bitmaps aren't enabled? You can't have both, but
> for historical reasons you can end up with both enabled, and then the
> resulting array won't start.

Yea, I know - the bitmap is not enabled on this array.
* Re: Assemblin journaled array fails
From: Michal Soltys @ 2020-05-09 13:32 UTC
To: antlists, linux-raid

On 20/05/09 15:20, Michal Soltys wrote:
> On 20/05/09 15:15, antlists wrote:
[...]
>> Dunno what the problem is, and I'm hesitant to recommend it, but
>> re-assembling the array without a journal might work.
>
> I would - if possible - want to avoid this, as the journal is
> write-back. So as a last resort only.

Looking at the logs from around the reboot, the machine might have
suffered a hard power loss (despite all UPS processes running - I'll be
checking that later).

The things that are really weird so far:

1) the size reported via mdadm -D:

mdadm -D /dev/md126
/dev/md126:
           Version : 1.1
     Creation Time : Tue Mar 5 19:28:58 2019
        Raid Level : raid5
     Used Dev Size : 18446744073709551615

This is clearly incorrect.

2) the mdadm --incremental process is in the R state eating a full CPU,
but otherwise cannot be terminated or killed (so it's sitting on some
lock?):

root       403     1  0  2744  3392  33 15:05 ?  S   0:01 /lib/systemd/systemd-udevd --daemon --resolve-names=never
root       571   403 98   740  1976  13 15:05 ?  R  19:07  \_ /sbin/mdadm --incremental --export /dev/md125 --offroot /dev/disk/by-id/md-name-xs22:r1_journal_big /dev/disk/by-id/md-uuid-c18b3996:49b21f49:3ee18e92:a8240f7f /dev/md/r1_journal_big

This is probably the same case as with the manual mdadm -A that I tried
from initramfs before udevd.

3) the 4 "removed" lines in the mdadm -D output:

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       -       0        0        1      removed
       -       0        0        2      removed
       -       0        0        3      removed

No device was ever removed from this array besides the old journal, as
I wrote in the first mail.

4) the journal device is listed as an actual spare instead of a journal:

       -       8      145        3      sync   /dev/sdj1
       -       8      129        2      sync   /dev/sdi1
       -       8      113        1      sync   /dev/sdh1
       -       8       97        0      sync   /dev/sdg1
       -       9      125        -      spare  /dev/md/r1_journal_big

The mdadm -Ev output of each component device looks fine though:

/dev/sdj1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x200
     Array UUID : d5995d76:67d7fabd:05392f87:25a91a97
           Name : xs22:r5_big  (local to host xs22)
  Creation Time : Tue Mar 5 19:28:58 2019
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7812237856 (3725.17 GiB 3999.87 GB)
     Array Size : 11718355968 (11175.50 GiB 11999.60 GB)
  Used Dev Size : 7812237312 (3725.17 GiB 3999.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 0 sectors
   Unused Space : before=262064 sectors, after=544 sectors
          State : clean
    Device UUID : 5a73a457:b16dd591:90eceb56:2f8e1cb3

    Update Time : Sat May 9 15:05:22 2020
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : dfa92cad - correct
         Events : 56289

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 3
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdi1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x200
     Array UUID : d5995d76:67d7fabd:05392f87:25a91a97
           Name : xs22:r5_big  (local to host xs22)
  Creation Time : Tue Mar 5 19:28:58 2019
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7812237856 (3725.17 GiB 3999.87 GB)
     Array Size : 11718355968 (11175.50 GiB 11999.60 GB)
  Used Dev Size : 7812237312 (3725.17 GiB 3999.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 0 sectors
   Unused Space : before=262064 sectors, after=544 sectors
          State : clean
    Device UUID : f933e8b6:5411d560:0467cb10:95971e4b

    Update Time : Sat May 9 15:05:22 2020
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 5fce14c7 - correct
         Events : 56289

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 2
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdh1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x200
     Array UUID : d5995d76:67d7fabd:05392f87:25a91a97
           Name : xs22:r5_big  (local to host xs22)
  Creation Time : Tue Mar 5 19:28:58 2019
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7812237856 (3725.17 GiB 3999.87 GB)
     Array Size : 11718355968 (11175.50 GiB 11999.60 GB)
  Used Dev Size : 7812237312 (3725.17 GiB 3999.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 0 sectors
   Unused Space : before=262064 sectors, after=544 sectors
          State : clean
    Device UUID : 8ea58384:43683121:d1017eed:1f565786

    Update Time : Sat May 9 15:05:22 2020
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 5b136a2 - correct
         Events : 56289

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 1
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdg1:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x200
     Array UUID : d5995d76:67d7fabd:05392f87:25a91a97
           Name : xs22:r5_big  (local to host xs22)
  Creation Time : Tue Mar 5 19:28:58 2019
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7812237856 (3725.17 GiB 3999.87 GB)
     Array Size : 11718355968 (11175.50 GiB 11999.60 GB)
  Used Dev Size : 7812237312 (3725.17 GiB 3999.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 0 sectors
   Unused Space : before=262064 sectors, after=544 sectors
          State : clean
    Device UUID : e38f2cd0:b9716e4e:b7fc0631:0a4e352d

    Update Time : Sat May 9 15:05:22 2020
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 68fe1d3c - correct
         Events : 56289

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 0
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

And finally the journal seems fine as well:

/dev/md/r1_journal_big:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x200
     Array UUID : d5995d76:67d7fabd:05392f87:25a91a97
           Name : xs22:r5_big  (local to host xs22)
  Creation Time : Tue Mar 5 19:28:58 2019
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 536344576 (255.75 GiB 274.61 GB)
     Array Size : 11718355968 (11175.50 GiB 11999.60 GB)
  Used Dev Size : 7812237312 (3725.17 GiB 3999.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 0 sectors
   Unused Space : before=261872 sectors, after=0 sectors
          State : clean
    Device UUID : c3a6f2f6:7dd26b0c:08a31ad7:cc8ed2a9

    Update Time : Sat May 9 15:05:22 2020
  Bad Block Log : 512 entries available at offset 264 sectors
       Checksum : c854904f - correct
         Events : 56289

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Journal
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
* Re: Assemblin journaled array fails
From: Michal Soltys @ 2020-05-09 23:57 UTC
To: linux-raid

Anyway, I did some tests with manually snapshotted component devices
(using the dm snapshot target so as not to touch the underlying devices).

The raid manages to force-assemble in read-only mode with the journal
device missing, so we will probably be able to recover most of the data
this way (as a last resort).

The situation I'm in now is likely from an unclean shutdown after all
(why the machine failed to react to the UPS properly is another subject).

I'd still like to find out why the journal device is - apparently -
causing issues (contrary to what I'd expect it to do ...), notably:

1) mdadm hangs (unkillable, so I presume somewhere in the kernel) and
eats 1 CPU when trying to assemble the raid with the journal device
present; once that happens I can't do anything with the array (stop,
run, etc.) and can only reboot the server to "fix" it

2) mdadm -D shows a nonsensical device size after an assembly attempt
(Used Dev Size : 18446744073709551615)

3) the journal device (which is itself an md raid1 consisting of 2 SSDs)
assembles and checks (0 mismatch_cnt) fine - and overall looks OK.

Among other interesting things, I also attempted to assemble the raid
with a snapshotted journal. From what I can see it does attempt to do
something, judging from dmsetup status:

snap_jo2: 0 536870912 snapshot 40/33554432 16
snap_sdi1: 0 7812500000 snapshot 25768/83886080 112
snap_jo1: 0 536870912 snapshot 40/33554432 16
snap_sdg1: 0 7812500000 snapshot 25456/83886080 112
snap_sdj1: 0 7812500000 snapshot 25928/83886080 112
snap_sdh1: 0 7812500000 snapshot 25352/83886080 112

But it doesn't move from those values (with mdadm doing nothing but
eating 100% CPU, as mentioned earlier).

Any suggestions on how to proceed would be very much appreciated.
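[Editorial note: the dmsetup status lines above follow the dm-snapshot status format - `start length snapshot allocated/total metadata_sectors`, all counted in 512-byte sectors. A small parser (a hypothetical helper, not part of this thread) makes the copy-on-write usage readable:

```python
def parse_snapshot_status(line):
    """Parse one `dmsetup status` line for a dm-snapshot target,
    e.g. "snap_jo2: 0 536870912 snapshot 40/33554432 16".
    Layout: name: start length target allocated/total metadata_sectors,
    with all sizes in 512-byte sectors."""
    name, rest = line.split(":", 1)
    start, length, target, usage, metadata = rest.split()
    if target != "snapshot":
        raise ValueError(f"not a snapshot target: {target}")
    allocated, total = (int(x) for x in usage.split("/"))
    return {
        "name": name.strip(),
        "cow_used_sectors": allocated,
        "cow_total_sectors": total,
        "percent_full": 100.0 * allocated / total,
    }

for line in [
    "snap_jo2: 0 536870912 snapshot 40/33554432 16",
    "snap_sdi1: 0 7812500000 snapshot 25768/83886080 112",
]:
    s = parse_snapshot_status(line)
    print(f"{s['name']}: {s['cow_used_sectors']} COW sectors used "
          f"({s['percent_full']:.4f}% of snapshot space)")
```

Read this way, the journal snapshots show only 40 of 33554432 COW sectors allocated, which suggests md barely writes anything to the journal before getting stuck.]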
* Re: Assemblin journaled array fails
From: Michal Soltys @ 2020-05-11 11:11 UTC
To: linux-raid; +Cc: song

On 5/10/20 1:57 AM, Michal Soltys wrote:
> Anyway, I did some tests with manually snapshotted component devices
> (using the dm snapshot target so as not to touch the underlying devices).
[...]
> But it doesn't move from those values (with mdadm doing nothing but
> eating 100% CPU, as mentioned earlier).
>
> Any suggestions on how to proceed would be very much appreciated.

I've added Song to the CC, in case you have any suggestions on how to
proceed/debug this (mdadm is stuck somewhere in the kernel as far as I
can see, while attempting to assemble the array).

For the record, I can assemble the raid successfully w/o the journal
(using snapshotted component devices as above), and we did recover some
stuff this way from some filesystems - but for some other ones I'd like
to keep that option as the very last resort.
* Re: Assemblin journaled array fails
From: Song Liu @ 2020-05-12 1:27 UTC
To: Michal Soltys; +Cc: linux-raid

On Mon, May 11, 2020 at 4:13 AM Michal Soltys <msoltyspl@yandex.pl> wrote:
>
> On 5/10/20 1:57 AM, Michal Soltys wrote:
[...]
> I've added Song to the CC, in case you have any suggestions on how to
> proceed/debug this (mdadm is stuck somewhere in the kernel as far as I
> can see, while attempting to assemble the array).
>
> For the record, I can assemble the raid successfully w/o the journal
> (using snapshotted component devices as above), and we did recover some
> stuff this way from some filesystems - but for some other ones I'd like
> to keep that option as the very last resort.

Sorry for the delayed response.

A few questions.

For these two outputs:

#1
              Name : xs22:r5_big  (local to host xs22)
              UUID : d5995d76:67d7fabd:05392f87:25a91a97
            Events : 56283

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       -       0        0        1      removed
       -       0        0        2      removed
       -       0        0        3      removed

       -       8      145        3      sync   /dev/sdj1
       -       8      129        2      sync   /dev/sdi1
       -       9      127        -      spare  /dev/md/xs22:r1_journal_big
       -       8      113        1      sync   /dev/sdh1
       -       8       97        0      sync   /dev/sdg1

#2
/dev/md/r1_journal_big:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x200
     Array UUID : d5995d76:67d7fabd:05392f87:25a91a97
           Name : xs22:r5_big  (local to host xs22)
  Creation Time : Tue Mar 5 19:28:58 2019
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 536344576 (255.75 GiB 274.61 GB)
     Array Size : 11718355968 (11175.50 GiB 11999.60 GB)
  Used Dev Size : 7812237312 (3725.17 GiB 3999.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 0 sectors
   Unused Space : before=261872 sectors, after=0 sectors
          State : clean
    Device UUID : c3a6f2f6:7dd26b0c:08a31ad7:cc8ed2a9

    Update Time : Sat May 9 15:05:22 2020
  Bad Block Log : 512 entries available at offset 264 sectors
       Checksum : c854904f - correct
         Events : 56289

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Journal
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

Are these captured back to back? I am asking because they show different
"Events" numbers.

Also, when mdadm -A hangs, could you please capture /proc/$(pidof mdadm)/stack ?

18446744073709551615 is 0xffffffffffffffffL, so it is not initialized
by data from the disk. I suspect we hang somewhere before this value is
initialized.

Thanks,
Song
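[Editorial note: Song's reading of the bogus size can be checked directly - 18446744073709551615 is exactly 2^64 - 1, the all-ones pattern of an unsigned 64-bit sector count (consistent with an uninitialized sentinel such as md's MaxSector, ~(sector_t)0), not a value that could come from the superblocks shown above:

```python
# The "Used Dev Size" reported by mdadm -D during the failed assembly.
bogus = 18446744073709551615

# It is exactly the all-ones 64-bit value, i.e. an uninitialized
# unsigned sector count rather than anything read from disk.
assert bogus == 2**64 - 1 == 0xFFFFFFFFFFFFFFFF
print(hex(bogus))  # → 0xffffffffffffffff
```
]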
* Re: Assemblin journaled array fails
From: Michal Soltys @ 2020-05-13 10:58 UTC
To: Song Liu; +Cc: linux-raid

On 5/12/20 3:27 AM, Song Liu wrote:
> On Mon, May 11, 2020 at 4:13 AM Michal Soltys <msoltyspl@yandex.pl> wrote:
[...]
> Are these captured back to back? I am asking because they show different
> "Events" numbers.

Nah, they were captured between reboots. Back to back, all event fields
show an identical value (at 56291 now).

> Also, when mdadm -A hangs, could you please capture /proc/$(pidof mdadm)/stack ?

The output is empty:

xs22:/☠ ps -eF fww | grep mdadm
root     10332  9362 97   740  1884  25 12:47 pts/1  R+   6:59  |  \_ mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/sdi1 /dev/sdg1 /dev/sdj1 /dev/sdh1
xs22:/☠ cd /proc/10332
xs22:/proc/10332☠ cat stack
xs22:/proc/10332☠

> 18446744073709551615 is 0xffffffffffffffffL, so it is not initialized
> by data from the disk.
> I suspect we hang somewhere before this value is initialized.
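[Editorial note: /proc/<pid>/stack is often empty for a task currently running on a CPU (R state) - the kernel can only produce a reliable trace while the task is off-CPU - so repeated sampling can catch the task during a brief sleep. A hypothetical sampler sketching that approach (pid, counts and intervals are illustrative; reading another process's stack file generally requires root):

```python
import time
from pathlib import Path

def sample_stack(pid, tries=50, interval=0.1):
    """Repeatedly read /proc/<pid>/stack; a task spinning in R state
    usually shows an empty trace, so keep sampling until the task is
    caught sleeping (e.g. in submit_bio_wait) and a trace appears.
    Returns "" if nothing was captured or the pid does not exist."""
    stack = Path(f"/proc/{pid}/stack")
    for _ in range(tries):
        try:
            txt = stack.read_text()
        except OSError:
            txt = ""
        if txt.strip():
            return txt
        time.sleep(interval)
    return ""

# Example (the pid would be that of the stuck mdadm):
# print(sample_stack(10332) or "no trace captured")
```

This is essentially what the repeated `cat` loop later in the thread does by hand.]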
* Re: Assemblin journaled array fails
From: Song Liu @ 2020-05-13 16:17 UTC
To: Michal Soltys; +Cc: linux-raid

> > Are these captured back to back? I am asking because they show different
> > "Events" numbers.
>
> Nah, they were captured between reboots. Back to back, all event fields
> show an identical value (at 56291 now).
>
> > Also, when mdadm -A hangs, could you please capture /proc/$(pidof mdadm)/stack ?
>
> The output is empty:
>
> xs22:/☠ ps -eF fww | grep mdadm
> root     10332  9362 97   740  1884  25 12:47 pts/1  R+   6:59  |  \_ mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/sdi1 /dev/sdg1 /dev/sdj1 /dev/sdh1
> xs22:/☠ cd /proc/10332
> xs22:/proc/10332☠ cat stack
> xs22:/proc/10332☠

Hmm... Could you please share the strace output of the "mdadm -A" command? Like

strace mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/xxx ...

Thanks,
Song
* Re: Assemblin journaled array fails
From: Michal Soltys @ 2020-05-14 11:07 UTC
To: Song Liu; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 2389 bytes --]

On 5/13/20 6:17 PM, Song Liu wrote:
[...]
> Hmm... Could you please share the strace output of the "mdadm -A" command? Like
>
> strace mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/xxx ...
>
> Thanks,
> Song

I did one more thing - I ran cat on that stack file in another terminal
every 0.1 seconds; this is what it caught:

cat: /proc//stack: No such file or directory
cat: /proc//stack: No such file or directory
cat: /proc//stack: No such file or directory
[<0>] submit_bio_wait+0x5b/0x80
[<0>] r5l_recovery_read_page+0x1b6/0x200 [raid456]
[<0>] r5l_recovery_verify_data_checksum+0x19/0x70 [raid456]
[<0>] r5l_start+0x99e/0xe70 [raid456]
[<0>] md_start.part.48+0x2e/0x50 [md_mod]
[<0>] do_md_run+0x64/0x100 [md_mod]
[<0>] md_ioctl+0xe7d/0x17d0 [md_mod]
[<0>] blkdev_ioctl+0x4d0/0xa00
[<0>] block_ioctl+0x39/0x40
[<0>] do_vfs_ioctl+0xa4/0x630
[<0>] ksys_ioctl+0x60/0x90
[<0>] __x64_sys_ioctl+0x16/0x20
[<0>] do_syscall_64+0x52/0x160
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
cat: /proc//stack: No such file or directory
[<0>] submit_bio_wait+0x5b/0x80
[<0>] r5l_recovery_read_page+0x1b6/0x200 [raid456]
[<0>] r5l_recovery_verify_data_checksum+0x19/0x70 [raid456]
[<0>] r5l_start+0x99e/0xe70 [raid456]
[<0>] md_start.part.48+0x2e/0x50 [md_mod]
[<0>] do_md_run+0x64/0x100 [md_mod]
[<0>] md_ioctl+0xe7d/0x17d0 [md_mod]
[<0>] blkdev_ioctl+0x4d0/0xa00
[<0>] block_ioctl+0x39/0x40
[<0>] do_vfs_ioctl+0xa4/0x630
[<0>] ksys_ioctl+0x60/0x90
[<0>] __x64_sys_ioctl+0x16/0x20
[<0>] do_syscall_64+0x52/0x160
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
cat: /proc//stack: No such file or directory
cat: /proc//stack: No such file or directory
cat: /proc//stack: No such file or directory
cat: /proc//stack: No such file or directory

Strace output attached. Thanks.

[-- Attachment #2: mdadm.strace.log --]
[-- Type: text/x-log, Size: 46654 bytes --]

3650  execve("/sbin/mdadm", ["mdadm", "-A", "/dev/md/r5_big", "/dev/md/r1_journal_big", "/dev/sdi1", "/dev/sdg1", "/dev/sdj1", "/dev/sdh1"], 0x7ffea6946570 /* 39 vars */) = 0
3650  brk(NULL) = 0x55885c8e2000
3650  access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
3650  openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
3650  fstat(3, {st_mode=S_IFREG|0644, st_size=63242, ...}) = 0
3650  mmap(NULL, 63242, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fbb17321000
3650  close(3) = 0
3650  openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
3650  read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000\21\0\0\0\0\0\0"..., 832) = 832
3650  fstat(3, {st_mode=S_IFREG|0644, st_size=14592, ...}) = 0
3650  mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbb1731f000
3650  mmap(NULL, 16656, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fbb1731a000
3650  mmap(0x7fbb1731b000, 4096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x7fbb1731b000
3650  mmap(0x7fbb1731c000, 4096, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7fbb1731c000
3650  mmap(0x7fbb1731d000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7fbb1731d000
3650  close(3) = 0
3650  openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
3650  read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260A\2\0\0\0\0\0"..., 832) = 832
3650  fstat(3, {st_mode=S_IFREG|0755, st_size=1824496, ...}) = 0
3650  mmap(NULL, 1837056, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fbb17159000
3650  mprotect(0x7fbb1717b000, 1658880, PROT_NONE) = 0
3650  mmap(0x7fbb1717b000, 1343488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x22000) = 0x7fbb1717b000
3650  mmap(0x7fbb172c3000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16a000) = 0x7fbb172c3000
3650  mmap(0x7fbb17310000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b6000) = 0x7fbb17310000
3650  mmap(0x7fbb17316000, 14336, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fbb17316000
3650  close(3) = 0
3650  mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbb17156000
3650  arch_prctl(ARCH_SET_FS, 0x7fbb17156740) = 0
3650  mprotect(0x7fbb17310000, 16384, PROT_READ) = 0
3650  mprotect(0x7fbb1731d000, 4096, PROT_READ) = 0
3650  mprotect(0x55885b2e6000, 4096, PROT_READ) = 0
3650  mprotect(0x7fbb17358000, 4096, PROT_READ) = 0
3650  munmap(0x7fbb17321000, 63242) = 0
3650  getpid() = 3650
3650  uname({sysname="Linux", nodename="xs22", ...}) = 0
3650  brk(NULL) = 0x55885c8e2000
3650  brk(0x55885c903000) = 0x55885c903000
3650  openat(AT_FDCWD, "/dev/md/r5_big", O_RDONLY) = -1 ENOENT (No such file or directory)
3650  openat(AT_FDCWD, "/etc/mdadm/mdadm.conf", O_RDONLY) = 3
3650  fstat(3, {st_mode=S_IFREG|0644, st_size=1729, ...}) = 0
3650  fstat(3, {st_mode=S_IFREG|0644, st_size=1729, ...}) = 0
3650  read(3, "# mdadm.conf\n#\n# Please refer to"..., 4096) = 1729
3650  read(3, "", 4096) = 0
3650  fstat(3, {st_mode=S_IFREG|0644, st_size=1729, ...}) = 0
3650  close(3) = 0
3650  openat(AT_FDCWD, "/etc/mdadm/mdadm.conf.d", O_RDONLY) = -1 ENOENT (No such file or directory)
3650  uname({sysname="Linux", nodename="xs22", ...}) = 0
3650  openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
3650  fstat(3, {st_mode=S_IFREG|0644, st_size=63242, ...}) = 0
3650  mmap(NULL, 63242, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fbb17321000
3650  close(3) = 0
3650  openat(AT_FDCWD, "/lib/x86_64-linux-gnu/tls/x86_64/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
3650  stat("/lib/x86_64-linux-gnu/tls/x86_64/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory)
3650  openat(AT_FDCWD, "/lib/x86_64-linux-gnu/tls/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
3650  stat("/lib/x86_64-linux-gnu/tls/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory)
3650  openat(AT_FDCWD, "/lib/x86_64-linux-gnu/tls/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
3650  stat("/lib/x86_64-linux-gnu/tls/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory)
3650  openat(AT_FDCWD, "/lib/x86_64-linux-gnu/tls/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
3650  stat("/lib/x86_64-linux-gnu/tls", 0x7ffc70457820) = -1 ENOENT (No such file or directory)
3650  openat(AT_FDCWD, "/lib/x86_64-linux-gnu/x86_64/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
3650  stat("/lib/x86_64-linux-gnu/x86_64/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory)
3650  openat(AT_FDCWD, "/lib/x86_64-linux-gnu/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
3650  stat("/lib/x86_64-linux-gnu/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory)
3650  openat(AT_FDCWD, "/lib/x86_64-linux-gnu/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
3650  stat("/lib/x86_64-linux-gnu/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory)
3650  openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
3650  stat("/lib/x86_64-linux-gnu", {st_mode=S_IFDIR|0755, st_size=6218, ...}) = 0
3650  openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/tls/x86_64/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
3650  stat("/usr/lib/x86_64-linux-gnu/tls/x86_64/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory)
3650  openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/tls/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
3650  stat("/usr/lib/x86_64-linux-gnu/tls/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory)
3650  openat(AT_FDCWD,
"/usr/lib/x86_64-linux-gnu/tls/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/usr/lib/x86_64-linux-gnu/tls/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/tls/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/usr/lib/x86_64-linux-gnu/tls", 0x7ffc70457820) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/x86_64/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/usr/lib/x86_64-linux-gnu/x86_64/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/usr/lib/x86_64-linux-gnu/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/usr/lib/x86_64-linux-gnu/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/usr/lib/x86_64-linux-gnu", {st_mode=S_IFDIR|0755, st_size=38826, ...}) = 0 3650 openat(AT_FDCWD, "/lib/tls/x86_64/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/lib/tls/x86_64/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/lib/tls/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/lib/tls/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/lib/tls/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/lib/tls/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory) 
3650 openat(AT_FDCWD, "/lib/tls/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/lib/tls", 0x7ffc70457820) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/lib/x86_64/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/lib/x86_64/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/lib/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/lib/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/lib/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/lib/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/lib/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/lib", {st_mode=S_IFDIR|0755, st_size=326, ...}) = 0 3650 openat(AT_FDCWD, "/usr/lib/tls/x86_64/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/usr/lib/tls/x86_64/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/usr/lib/tls/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/usr/lib/tls/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/usr/lib/tls/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/usr/lib/tls/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/usr/lib/tls/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/usr/lib/tls", 0x7ffc70457820) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/usr/lib/x86_64/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/usr/lib/x86_64/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory) 
3650 openat(AT_FDCWD, "/usr/lib/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/usr/lib/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/usr/lib/x86_64/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/usr/lib/x86_64", 0x7ffc70457820) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/usr/lib/libdlm_lt.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 stat("/usr/lib", {st_mode=S_IFDIR|0755, st_size=1262, ...}) = 0 3650 munmap(0x7fbb17321000, 63242) = 0 3650 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 3650 fstat(3, {st_mode=S_IFREG|0644, st_size=63242, ...}) = 0 3650 mmap(NULL, 63242, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fbb17321000 3650 close(3) = 0 3650 openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libcmap.so.4", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libcmap.so.4", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/lib/libcmap.so.4", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/usr/lib/libcmap.so.4", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) 3650 munmap(0x7fbb17321000, 63242) = 0 3650 geteuid() = 0 3650 openat(AT_FDCWD, "/dev/md/r1_journal_big", O_RDONLY|O_DIRECT) = 3 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x9, 0x7d), ...}) = 0 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x9, 0x7d), ...}) = 0 3650 readlink("/sys/dev/block/9:125", "../../devices/virtual/block/md12"..., 199) = 33 3650 stat("/sys/block/md125/md", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 3650 openat(AT_FDCWD, "/sys/block/md125/md/metadata_version", O_RDONLY) = 4 3650 read(4, "1.1\n", 4096) = 4 3650 close(4) = 0 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x9, 0x7d), ...}) = 0 3650 ioctl(3, BLKGETSIZE64, [274742640640]) = 0 3650 fstat(3, 
{st_mode=S_IFBLK|0660, st_rdev=makedev(0x9, 0x7d), ...}) = 0 3650 ioctl(3, BLKGETSIZE64, [274742640640]) = 0 3650 lseek(3, 274742575104, SEEK_SET) = 274742575104 3650 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 3650 ioctl(3, BLKSSZGET, [512]) = 0 3650 ioctl(3, BLKSSZGET, [512]) = 0 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x9, 0x7d), ...}) = 0 3650 ioctl(3, BLKGETSIZE64, [274742640640]) = 0 3650 lseek(3, 274742632448, SEEK_SET) = 274742632448 3650 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 3650 lseek(3, 0, SEEK_CUR) = 274742636544 3650 ioctl(3, BLKSSZGET, [512]) = 0 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x9, 0x7d), ...}) = 0 3650 ioctl(3, BLKGETSIZE64, [274742640640]) = 0 3650 lseek(3, 0, SEEK_SET) = 0 3650 read(3, "\374N+\251\1\0\0\0\0\2\0\0\0\0\0\0\325\231]vg\327\372\275\59/\207%\251\32\227"..., 4096) = 4096 3650 lseek(3, 0, SEEK_CUR) = 4096 3650 ioctl(3, BLKSSZGET, [512]) = 0 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x9, 0x7d), ...}) = 0 3650 ioctl(3, BLKGETSIZE64, [274742640640]) = 0 3650 lseek(3, 4096, SEEK_SET) = 4096 3650 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 3650 lseek(3, 0, SEEK_CUR) = 8192 3650 ioctl(3, BLKSSZGET, [512]) = 0 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x9, 0x7d), ...}) = 0 3650 ioctl(3, BLKGETSIZE64, [274742640640]) = 0 3650 lseek(3, 0, SEEK_SET) = 0 3650 read(3, "\374N+\251\1\0\0\0\0\2\0\0\0\0\0\0\325\231]vg\327\372\275\59/\207%\251\32\227"..., 4096) = 4096 3650 lseek(3, 0, SEEK_CUR) = 4096 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x9, 0x7d), ...}) = 0 3650 ioctl(3, BLKGETSIZE64, [274742640640]) = 0 3650 ioctl(3, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=1073741824, devname="", volname=""}}) = -1 ENXIO (No such device or address) 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x9, 
0x7d), ...}) = 0 3650 ioctl(3, BLKGETSIZE64, [274742640640]) = 0 3650 lseek(3, 274742640128, SEEK_SET) = 274742640128 3650 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512 3650 ioctl(3, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=1073741824, devname="", volname=""}}) = -1 ENXIO (No such device or address) 3650 ioctl(3, BLKSSZGET, [512]) = 0 3650 ioctl(3, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=1073741824, devname="", volname=""}}) = -1 ENXIO (No such device or address) 3650 openat(AT_FDCWD, "/sys/bus/pci/drivers/isci", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/sys/bus/pci/drivers/ahci", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4 3650 fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 3650 getdents64(4, /* 9 entries */, 32768) = 264 3650 readlink("/sys/bus/pci/drivers/ahci/./subsystem", 0x7ffc70456530, 4096) = -1 ENOENT (No such file or directory) 3650 readlink("/sys/bus/pci/drivers/ahci/../subsystem", 0x7ffc70456530, 4096) = -1 ENOENT (No such file or directory) 3650 readlink("/sys/bus/pci/drivers/ahci/uevent/subsystem", 0x7ffc70456530, 4096) = -1 ENOTDIR (Not a directory) 3650 readlink("/sys/bus/pci/drivers/ahci/0000:00:1f.2/subsystem", "../../../bus/pci", 4096) = 16 3650 openat(AT_FDCWD, "/sys/bus/pci/drivers/ahci/0000:00:1f.2/vendor", O_RDONLY) = 5 3650 read(5, "0x8086\n", 7) = 7 3650 close(5) = 0 3650 openat(AT_FDCWD, "/sys/bus/pci/drivers/ahci/0000:00:1f.2/device", O_RDONLY) = 5 3650 lseek(5, 0, SEEK_SET) = 0 3650 read(5, "0x1d02\n", 50) = 7 3650 close(5) = 0 3650 openat(AT_FDCWD, "/sys/bus/pci/drivers/ahci/0000:00:1f.2/class", O_RDONLY) = 5 3650 lseek(5, 0, SEEK_SET) = 0 3650 read(5, "0x010601\n", 50) = 9 3650 close(5) = 0 3650 lstat("/sys", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0 3650 lstat("/sys/bus", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 3650 
lstat("/sys/bus/pci", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 3650 lstat("/sys/bus/pci/drivers", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 3650 lstat("/sys/bus/pci/drivers/ahci", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 3650 lstat("/sys/bus/pci/drivers/ahci/0000:00:1f.2", {st_mode=S_IFLNK|0777, st_size=0, ...}) = 0 3650 readlink("/sys/bus/pci/drivers/ahci/0000:00:1f.2", "../../../../devices/pci0000:00/0"..., 4095) = 43 3650 lstat("/sys/devices", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 3650 lstat("/sys/devices/pci0000:00", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 3650 lstat("/sys/devices/pci0000:00/0000:00:1f.2", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 3650 readlink("/sys/bus/pci/drivers/ahci/bind/subsystem", 0x7ffc70456530, 4096) = -1 ENOTDIR (Not a directory) 3650 readlink("/sys/bus/pci/drivers/ahci/new_id/subsystem", 0x7ffc70456530, 4096) = -1 ENOTDIR (Not a directory) 3650 readlink("/sys/bus/pci/drivers/ahci/remove_id/subsystem", 0x7ffc70456530, 4096) = -1 ENOTDIR (Not a directory) 3650 readlink("/sys/bus/pci/drivers/ahci/unbind/subsystem", 0x7ffc70456530, 4096) = -1 ENOTDIR (Not a directory) 3650 readlink("/sys/bus/pci/drivers/ahci/module/subsystem", 0x7ffc70456530, 4096) = -1 ENOENT (No such file or directory) 3650 getdents64(4, /* 0 entries */, 32768) = 0 3650 close(4) = 0 3650 openat(AT_FDCWD, "/sys/bus/pci/drivers/vmd", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/sys/bus/pci/drivers/nvme", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory) 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x9, 0x7d), ...}) = 0 3650 lstat("/sys", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0 3650 lstat("/sys/dev", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 3650 lstat("/sys/dev/block", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 3650 lstat("/sys/dev/block/9:125", {st_mode=S_IFLNK|0777, st_size=0, ...}) = 0 3650 readlink("/sys/dev/block/9:125", 
"../../devices/virtual/block/md12"..., 4095) = 33 3650 lstat("/sys/devices", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 3650 lstat("/sys/devices/virtual", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 3650 lstat("/sys/devices/virtual/block", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 3650 lstat("/sys/devices/virtual/block/md125", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 3650 lstat("/sys/devices/virtual/block/md125/device", 0x7ffc704573c0) = -1 ENOENT (No such file or directory) 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x9, 0x7d), ...}) = 0 3650 ioctl(3, BLKGETSIZE64, [274742640640]) = 0 3650 lseek(3, 274742639616, SEEK_SET) = 274742639616 3650 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512 3650 lseek(3, 0, SEEK_SET) = 0 3650 read(3, "\374N+\251\1\0\0\0\0\2\0\0\0\0\0\0\325\231]vg\327\372\275\59/\207%\251\32\227"..., 512) = 512 3650 ioctl(3, BLKSSZGET, [512]) = 0 3650 lseek(3, 0, SEEK_SET) = 0 3650 read(3, "\374N+\251\1\0\0\0\0\2\0\0\0\0\0\0\325\231]vg\327\372\275\59/\207%\251\32\227"..., 512) = 512 3650 ioctl(3, BLKSSZGET, [512]) = 0 3650 ioctl(3, BLKSSZGET, [512]) = 0 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x9, 0x7d), ...}) = 0 3650 ioctl(3, BLKGETSIZE64, [274742640640]) = 0 3650 lseek(3, 274742632448, SEEK_SET) = 274742632448 3650 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 3650 lseek(3, 0, SEEK_CUR) = 274742636544 3650 ioctl(3, BLKSSZGET, [512]) = 0 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x9, 0x7d), ...}) = 0 3650 ioctl(3, BLKGETSIZE64, [274742640640]) = 0 3650 lseek(3, 0, SEEK_SET) = 0 3650 read(3, "\374N+\251\1\0\0\0\0\2\0\0\0\0\0\0\325\231]vg\327\372\275\59/\207%\251\32\227"..., 4096) = 4096 3650 lseek(3, 0, SEEK_CUR) = 4096 3650 ioctl(3, BLKSSZGET, [512]) = 0 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x9, 0x7d), ...}) = 0 3650 ioctl(3, BLKGETSIZE64, [274742640640]) = 0 3650 lseek(3, 4096, SEEK_SET) = 4096 3650 
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 3650 lseek(3, 0, SEEK_CUR) = 8192 3650 ioctl(3, BLKSSZGET, [512]) = 0 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x9, 0x7d), ...}) = 0 3650 ioctl(3, BLKGETSIZE64, [274742640640]) = 0 3650 lseek(3, 0, SEEK_SET) = 0 3650 read(3, "\374N+\251\1\0\0\0\0\2\0\0\0\0\0\0\325\231]vg\327\372\275\59/\207%\251\32\227"..., 4096) = 4096 3650 lseek(3, 0, SEEK_CUR) = 4096 3650 ioctl(3, BLKSSZGET, [512]) = 0 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x9, 0x7d), ...}) = 0 3650 ioctl(3, BLKGETSIZE64, [274742640640]) = 0 3650 lseek(3, 0, SEEK_SET) = 0 3650 read(3, "\374N+\251\1\0\0\0\0\2\0\0\0\0\0\0\325\231]vg\327\372\275\59/\207%\251\32\227"..., 4096) = 4096 3650 lseek(3, 0, SEEK_CUR) = 4096 3650 close(3) = 0 3650 openat(AT_FDCWD, "/dev/md/r1_journal_big", O_RDONLY|O_EXCL|O_DIRECT) = 3 3650 close(3) = 0 3650 stat("/sys/dev/block/9:125/partition", 0x7ffc704575c0) = -1 ENOENT (No such file or directory) 3650 openat(AT_FDCWD, "/dev/sdi1", O_RDONLY|O_DIRECT) = 3 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x81), ...}) = 0 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x81), ...}) = 0 3650 readlink("/sys/dev/block/8:129", "../../devices/pci0000:00/0000:00"..., 199) = 141 3650 stat("/sys/block/sdi/md", 0x7ffc704556d0) = -1 ENOENT (No such file or directory) 3650 ioctl(3, BLKSSZGET, [512]) = 0 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x81), ...}) = 0 3650 ioctl(3, BLKGETSIZE64, [4000000000000]) = 0 3650 lseek(3, 0, SEEK_SET) = 0 3650 read(3, "\374N+\251\1\0\0\0\0\2\0\0\0\0\0\0\325\231]vg\327\372\275\59/\207%\251\32\227"..., 4096) = 4096 3650 lseek(3, 0, SEEK_CUR) = 4096 3650 close(3) = 0 3650 openat(AT_FDCWD, "/dev/sdi1", O_RDONLY|O_EXCL|O_DIRECT) = 3 3650 close(3) = 0 3650 stat("/sys/dev/block/8:129/partition", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 3650 openat(AT_FDCWD, "/dev/sdg1", O_RDONLY|O_DIRECT) = 3 3650 fstat(3, 
{st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x61), ...}) = 0 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x61), ...}) = 0 3650 readlink("/sys/dev/block/8:97", "../../devices/pci0000:00/0000:00"..., 199) = 141 3650 stat("/sys/block/sdg/md", 0x7ffc704556d0) = -1 ENOENT (No such file or directory) 3650 ioctl(3, BLKSSZGET, [512]) = 0 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x61), ...}) = 0 3650 ioctl(3, BLKGETSIZE64, [4000000000000]) = 0 3650 lseek(3, 0, SEEK_SET) = 0 3650 read(3, "\374N+\251\1\0\0\0\0\2\0\0\0\0\0\0\325\231]vg\327\372\275\59/\207%\251\32\227"..., 4096) = 4096 3650 lseek(3, 0, SEEK_CUR) = 4096 3650 close(3) = 0 3650 openat(AT_FDCWD, "/dev/sdg1", O_RDONLY|O_EXCL|O_DIRECT) = 3 3650 close(3) = 0 3650 stat("/sys/dev/block/8:97/partition", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 3650 openat(AT_FDCWD, "/dev/sdj1", O_RDONLY|O_DIRECT) = 3 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x91), ...}) = 0 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x91), ...}) = 0 3650 readlink("/sys/dev/block/8:145", "../../devices/pci0000:00/0000:00"..., 199) = 141 3650 stat("/sys/block/sdj/md", 0x7ffc704556d0) = -1 ENOENT (No such file or directory) 3650 ioctl(3, BLKSSZGET, [512]) = 0 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x91), ...}) = 0 3650 ioctl(3, BLKGETSIZE64, [4000000000000]) = 0 3650 lseek(3, 0, SEEK_SET) = 0 3650 read(3, "\374N+\251\1\0\0\0\0\2\0\0\0\0\0\0\325\231]vg\327\372\275\59/\207%\251\32\227"..., 4096) = 4096 3650 lseek(3, 0, SEEK_CUR) = 4096 3650 close(3) = 0 3650 openat(AT_FDCWD, "/dev/sdj1", O_RDONLY|O_EXCL|O_DIRECT) = 3 3650 close(3) = 0 3650 stat("/sys/dev/block/8:145/partition", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 3650 openat(AT_FDCWD, "/dev/sdh1", O_RDONLY|O_DIRECT) = 3 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x71), ...}) = 0 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x71), ...}) = 0 3650 readlink("/sys/dev/block/8:113", 
"../../devices/pci0000:00/0000:00"..., 199) = 141 3650 stat("/sys/block/sdh/md", 0x7ffc704556d0) = -1 ENOENT (No such file or directory) 3650 ioctl(3, BLKSSZGET, [512]) = 0 3650 fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x71), ...}) = 0 3650 ioctl(3, BLKGETSIZE64, [4000000000000]) = 0 3650 lseek(3, 0, SEEK_SET) = 0 3650 read(3, "\374N+\251\1\0\0\0\0\2\0\0\0\0\0\0\325\231]vg\327\372\275\59/\207%\251\32\227"..., 4096) = 4096 3650 lseek(3, 0, SEEK_CUR) = 4096 3650 close(3) = 0 3650 openat(AT_FDCWD, "/dev/sdh1", O_RDONLY|O_EXCL|O_DIRECT) = 3 3650 close(3) = 0 3650 stat("/sys/dev/block/8:113/partition", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 3650 mkdir("/run/mdadm", 0755) = -1 EEXIST (File exists) 3650 openat(AT_FDCWD, "/run/mdadm/map.lock", O_RDWR|O_CREAT|O_TRUNC, 0600) = 3 3650 fcntl(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE) 3650 flock(3, LOCK_EX) = 0 3650 fstat(3, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0 3650 openat(AT_FDCWD, "/run/mdadm/map", O_RDONLY) = 4 3650 fcntl(4, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE) 3650 fstat(4, {st_mode=S_IFREG|0600, st_size=191, ...}) = 0 3650 read(4, "md125 1.1 96398bc1:491fb249:928e"..., 4096) = 191 3650 read(4, "", 4096) = 0 3650 close(4) = 0 3650 stat("/dev/.udev", 0x7ffc704574d0) = -1 ENOENT (No such file or directory) 3650 stat("/run/udev", {st_mode=S_IFDIR|0755, st_size=160, ...}) = 0 3650 openat(AT_FDCWD, "/proc/mdstat", O_RDONLY) = 4 3650 fcntl(4, F_SETFD, FD_CLOEXEC) = 0 3650 fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 3650 read(4, "Personalities : [linear] [multip"..., 1024) = 654 3650 read(4, "", 1024) = 0 3650 close(4) = 0 3650 openat(AT_FDCWD, "/proc/mdstat", O_RDONLY) = 4 3650 fcntl(4, F_SETFD, FD_CLOEXEC) = 0 3650 fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 3650 read(4, "Personalities : [linear] [multip"..., 1024) = 654 3650 read(4, "", 1024) = 0 3650 close(4) = 0 3650 openat(AT_FDCWD, "/proc/mdstat", O_RDONLY) = 4 3650 fcntl(4, F_SETFD, FD_CLOEXEC) = 0 3650 fstat(4, 
{st_mode=S_IFREG|0444, st_size=0, ...}) = 0 3650 read(4, "Personalities : [linear] [multip"..., 1024) = 654 3650 read(4, "", 1024) = 0 3650 close(4) = 0 3650 openat(AT_FDCWD, "/proc/mdstat", O_RDONLY) = 4 3650 fcntl(4, F_SETFD, FD_CLOEXEC) = 0 3650 fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 3650 read(4, "Personalities : [linear] [multip"..., 1024) = 654 3650 read(4, "", 1024) = 0 3650 close(4) = 0 3650 openat(AT_FDCWD, "/sys/block/md124/dev", O_RDONLY) = -1 ENOENT (No such file or directory) 3650 getpid() = 3650 3650 mknod("/dev/.tmp.md.3650:9:124", S_IFBLK|0600, makedev(0x9, 0x7c)) = 0 3650 openat(AT_FDCWD, "/dev/.tmp.md.3650:9:124", O_RDWR|O_EXCL|O_DIRECT) = 4 3650 unlink("/dev/.tmp.md.3650:9:124") = 0 3650 fstat(4, {st_mode=S_IFBLK|0600, st_rdev=makedev(0x9, 0x7c), ...}) = 0 3650 readlink("/sys/dev/block/9:124", "../../devices/virtual/block/md12"..., 199) = 33 3650 openat(AT_FDCWD, "/proc/mdstat", O_RDONLY) = 5 3650 fcntl(5, F_SETFD, FD_CLOEXEC) = 0 3650 fstat(5, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 3650 read(5, "Personalities : [linear] [multip"..., 1024) = 654 3650 read(5, "", 1024) = 0 3650 close(5) = 0 3650 ioctl(4, STOP_ARRAY, 0) = 0 3650 openat(AT_FDCWD, "/dev/md/r1_journal_big", O_RDWR|O_EXCL|O_DIRECT) = 5 3650 ioctl(5, BLKSSZGET, [512]) = 0 3650 fstat(5, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x9, 0x7d), ...}) = 0 3650 ioctl(5, BLKGETSIZE64, [274742640640]) = 0 3650 lseek(5, 0, SEEK_SET) = 0 3650 read(5, "\374N+\251\1\0\0\0\0\2\0\0\0\0\0\0\325\231]vg\327\372\275\59/\207%\251\32\227"..., 4096) = 4096 3650 lseek(5, 0, SEEK_CUR) = 4096 3650 fstat(5, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x9, 0x7d), ...}) = 0 3650 close(5) = 0 3650 openat(AT_FDCWD, "/dev/sdi1", O_RDWR|O_EXCL|O_DIRECT) = 5 3650 ioctl(5, BLKSSZGET, [512]) = 0 3650 fstat(5, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x81), ...}) = 0 3650 ioctl(5, BLKGETSIZE64, [4000000000000]) = 0 3650 lseek(5, 0, SEEK_SET) = 0 3650 read(5, 
"\374N+\251\1\0\0\0\0\2\0\0\0\0\0\0\325\231]vg\327\372\275\59/\207%\251\32\227"..., 4096) = 4096 3650 lseek(5, 0, SEEK_CUR) = 4096 3650 fstat(5, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x81), ...}) = 0 3650 close(5) = 0 3650 openat(AT_FDCWD, "/dev/sdg1", O_RDWR|O_EXCL|O_DIRECT) = 5 3650 ioctl(5, BLKSSZGET, [512]) = 0 3650 fstat(5, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x61), ...}) = 0 3650 ioctl(5, BLKGETSIZE64, [4000000000000]) = 0 3650 lseek(5, 0, SEEK_SET) = 0 3650 read(5, "\374N+\251\1\0\0\0\0\2\0\0\0\0\0\0\325\231]vg\327\372\275\59/\207%\251\32\227"..., 4096) = 4096 3650 lseek(5, 0, SEEK_CUR) = 4096 3650 fstat(5, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x61), ...}) = 0 3650 close(5) = 0 3650 openat(AT_FDCWD, "/dev/sdj1", O_RDWR|O_EXCL|O_DIRECT) = 5 3650 ioctl(5, BLKSSZGET, [512]) = 0 3650 fstat(5, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x91), ...}) = 0 3650 ioctl(5, BLKGETSIZE64, [4000000000000]) = 0 3650 lseek(5, 0, SEEK_SET) = 0 3650 read(5, "\374N+\251\1\0\0\0\0\2\0\0\0\0\0\0\325\231]vg\327\372\275\59/\207%\251\32\227"..., 4096) = 4096 3650 lseek(5, 0, SEEK_CUR) = 4096 3650 fstat(5, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x91), ...}) = 0 3650 close(5) = 0 3650 openat(AT_FDCWD, "/dev/sdh1", O_RDWR|O_EXCL|O_DIRECT) = 5 3650 ioctl(5, BLKSSZGET, [512]) = 0 3650 fstat(5, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x71), ...}) = 0 3650 ioctl(5, BLKGETSIZE64, [4000000000000]) = 0 3650 lseek(5, 0, SEEK_SET) = 0 3650 read(5, "\374N+\251\1\0\0\0\0\2\0\0\0\0\0\0\325\231]vg\327\372\275\59/\207%\251\32\227"..., 4096) = 4096 3650 lseek(5, 0, SEEK_CUR) = 4096 3650 fstat(5, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x71), ...}) = 0 3650 close(5) = 0 3650 openat(AT_FDCWD, "/dev/sdg1", O_RDONLY|O_EXCL|O_DIRECT) = 5 3650 ioctl(5, BLKSSZGET, [512]) = 0 3650 fstat(5, {st_mode=S_IFBLK|0660, st_rdev=makedev(0x8, 0x61), ...}) = 0 3650 ioctl(5, BLKGETSIZE64, [4000000000000]) = 0 3650 lseek(5, 0, SEEK_SET) = 0 3650 read(5, 
"\374N+\251\1\0\0\0\0\2\0\0\0\0\0\0\325\231]vg\327\372\275\59/\207%\251\32\227"..., 4096) = 4096 3650 lseek(5, 0, SEEK_CUR) = 4096 3650 close(5) = 0 3650 fstat(4, {st_mode=S_IFBLK|0600, st_rdev=makedev(0x9, 0x7c), ...}) = 0 3650 readlink("/sys/dev/block/9:124", "../../devices/virtual/block/md12"..., 199) = 33 3650 stat("/sys/block/md124/md", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 3650 fstat(4, {st_mode=S_IFBLK|0600, st_rdev=makedev(0x9, 0x7c), ...}) = 0 3650 readlink("/sys/dev/block/9:124", "../../devices/virtual/block/md12"..., 199) = 33 3650 mkdir("/run/mdadm", 0755) = -1 EEXIST (File exists) 3650 openat(AT_FDCWD, "/run/mdadm/map.new", O_RDWR|O_CREAT, 0600) = 5 3650 fcntl(5, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE) 3650 fstat(5, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0 3650 write(5, "md124 1.1 765d99d5:bdfad767:872f"..., 252) = 252 3650 close(5) = 0 3650 rename("/run/mdadm/map.new", "/run/mdadm/map") = 0 3650 ioctl(4, SET_ARRAY_INFO, 0x7ffc70457950) = 0 3650 openat(AT_FDCWD, "/dev/sdh1", O_RDWR|O_EXCL|O_DIRECT) = 5 3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=0, devname="", volname=""}}) = -1 EINVAL (Invalid argument) 3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=1, devname="", volname=""}}) = -1 EINVAL (Invalid argument) 3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=2, devname="", volname=""}}) = -1 EINVAL (Invalid argument) 3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=3, devname="", volname=""}}) = -1 EINVAL (Invalid argument) 3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=4, devname="", volname=""}}) = -1 EINVAL (Invalid argument) 3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=5, devname="", volname=""}}) = -1 EINVAL (Invalid argument) 3650 
ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=6, devname="", volname=""}}) = -1 EINVAL (Invalid argument) 3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=7, devname="", volname=""}}) = -1 EINVAL (Invalid argument) 3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=8, devname="", volname=""}}) = -1 EINVAL (Invalid argument) 3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=9, devname="", volname=""}}) = -1 EINVAL (Invalid argument) 3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=10, devname="", volname=""}}) = -1 EINVAL (Invalid argument) 3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=11, devname="", volname=""}}) = -1 EINVAL (Invalid argument) 3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=12, devname="", volname=""}}) = -1 EINVAL (Invalid argument) 3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=13, devname="", volname=""}}) = -1 EINVAL (Invalid argument) 3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=14, devname="", volname=""}}) = -1 EINVAL (Invalid argument) 3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=15, devname="", volname=""}}) = -1 EINVAL (Invalid argument) 3650 close(5) = 0 3650 ioctl(4, ADD_NEW_DISK, 0x55885c8e7cc8) = 0 3650 openat(AT_FDCWD, "/dev/sdi1", O_RDWR|O_EXCL|O_DIRECT) = 5 3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=0, devname="", volname=""}}) = -1 EINVAL (Invalid argument) 3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=1, devname="", volname=""}}) = -1 EINVAL 
(Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=2, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=3, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=4, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=5, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=6, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=7, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=8, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=9, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=10, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=11, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=12, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=13, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=14, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=15, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 close(5) = 0
3650 ioctl(4, ADD_NEW_DISK, 0x55885c8e7710) = 0
3650 openat(AT_FDCWD, "/dev/sdj1", O_RDWR|O_EXCL|O_DIRECT) = 5
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=0, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=1, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=2, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=3, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=4, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=5, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=6, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=7, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=8, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=9, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=10, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=11, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=12, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=13, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=14, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=15, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 close(5) = 0
3650 ioctl(4, ADD_NEW_DISK, 0x55885c8e7ae0) = 0
3650 openat(AT_FDCWD, "/dev/md/r1_journal_big", O_RDWR|O_EXCL|O_DIRECT) = 5
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=0, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=1, devname="", volname=""}}) = -1 ENXIO (No such device or address)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=2, devname="", volname=""}}) = -1 ENXIO (No such device or address)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=3, devname="", volname=""}}) = -1 ENXIO (No such device or address)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=4, devname="", volname=""}}) = -1 ENXIO (No such device or address)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=5, devname="", volname=""}}) = -1 ENXIO (No such device or address)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=6, devname="", volname=""}}) = -1 ENXIO (No such device or address)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=7, devname="", volname=""}}) = -1 ENXIO (No such device or address)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=8, devname="", volname=""}}) = -1 ENXIO (No such device or address)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=9, devname="", volname=""}}) = -1 ENXIO (No such device or address)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=10, devname="", volname=""}}) = -1 ENXIO (No such device or address)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=11, devname="", volname=""}}) = -1 ENXIO (No such device or address)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=12, devname="", volname=""}}) = -1 ENXIO (No such device or address)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=13, devname="", volname=""}}) = -1 ENXIO (No such device or address)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=14, devname="", volname=""}}) = -1 ENXIO (No such device or address)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=15, devname="", volname=""}}) = -1 ENXIO (No such device or address)
3650 close(5) = 0
3650 ioctl(4, ADD_NEW_DISK, 0x55885c8e7528) = 0
3650 openat(AT_FDCWD, "/dev/sdg1", O_RDWR|O_EXCL|O_DIRECT) = 5
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=0, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=1, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=2, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=3, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=4, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=5, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=6, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=7, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=8, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=9, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=10, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=11, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=12, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=13, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=14, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 ioctl(5, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=15, devname="", volname=""}}) = -1 EINVAL (Invalid argument)
3650 close(5) = 0
3650 ioctl(4, ADD_NEW_DISK, 0x55885c8e78f8) = 0
3650 ioctl(4, RUN_ARRAY
^ permalink raw reply	[flat|nested] 46+ messages in thread
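The capture ends mid-line at the RUN_ARRAY ioctl with no return value printed, which is how a hang shows up in strace: every completed syscall line ends with ") = <ret>", while the call the process is stuck in never gets one. A small hedged sketch of pulling the hang point out of such a log (the sample file below is a trimmed stand-in for the real capture, not its actual contents):

```shell
# Trimmed stand-in for the strace capture above; with a real log,
# point awk at that file instead.
cat > /tmp/sample.strace <<'EOF'
3650 close(5) = 0
3650 ioctl(4, ADD_NEW_DISK, 0x55885c8e78f8) = 0
3650 ioctl(4, RUN_ARRAY
EOF

# Completed syscalls carry ") = <ret>"; print the last line without it,
# i.e. the syscall the process is hung inside.
awk '!/\) = / { last = $0 } END { print last }' /tmp/sample.strace
```

On the sample this prints the unfinished `3650 ioctl(4, RUN_ARRAY` line; on a full capture it should likewise land on the syscall that never returned.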
* Re: Assemblin journaled array fails 2020-05-14 11:07 ` Michal Soltys @ 2020-05-16 0:15 ` Song Liu [not found] ` <d0340d7b-6b3a-4fd3-e446-5f0967132ef6@yandex.pl> 2020-06-26 8:28 ` Guoqing Jiang 1 sibling, 1 reply; 46+ messages in thread From: Song Liu @ 2020-05-16 0:15 UTC (permalink / raw) To: Michal Soltys; +Cc: linux-raid On Thu, May 14, 2020 at 4:07 AM Michal Soltys <msoltyspl@yandex.pl> wrote: > > On 5/13/20 6:17 PM, Song Liu wrote: > >>> > >>> Are these captured back to back? I am asking because they showed different > >>> "Events" number. > >> > >> Nah, they were captured between reboots. Back to back all event fields show identical value (at 56291 now). > >> > >>> > >>> Also, when mdadm -A hangs, could you please capture /proc/$(pidof mdadm)/stack ? > >>> > >> > >> The output is empty: > >> > >> xs22:/☠ ps -eF fww | grep mdadm > >> root 10332 9362 97 740 1884 25 12:47 pts/1 R+ 6:59 | \_ mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/sdi1 /dev/sdg1 /dev/sdj1 /dev/sdh1 > >> xs22:/☠ cd /proc/10332 > >> xs22:/proc/10332☠ cat stack > >> xs22:/proc/10332☠ > > > > Hmm... Could you please share the strace output of "mdadm -A" command? Like > > > > strace mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/xxx ... 
> >
> > Thanks,
> > Song
>
> I did one more thing - ran cat on that stack in another terminal every
> 0.1 seconds, this is what was found:
>
> cat: /proc//stack: No such file or directory
> cat: /proc//stack: No such file or directory
> cat: /proc//stack: No such file or directory
> [<0>] submit_bio_wait+0x5b/0x80
> [<0>] r5l_recovery_read_page+0x1b6/0x200 [raid456]
> [<0>] r5l_recovery_verify_data_checksum+0x19/0x70 [raid456]
> [<0>] r5l_start+0x99e/0xe70 [raid456]
> [<0>] md_start.part.48+0x2e/0x50 [md_mod]
> [<0>] do_md_run+0x64/0x100 [md_mod]
> [<0>] md_ioctl+0xe7d/0x17d0 [md_mod]
> [<0>] blkdev_ioctl+0x4d0/0xa00
> [<0>] block_ioctl+0x39/0x40
> [<0>] do_vfs_ioctl+0xa4/0x630
> [<0>] ksys_ioctl+0x60/0x90
> [<0>] __x64_sys_ioctl+0x16/0x20
> [<0>] do_syscall_64+0x52/0x160
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> cat: /proc//stack: No such file or directory
> [<0>] submit_bio_wait+0x5b/0x80
> [<0>] r5l_recovery_read_page+0x1b6/0x200 [raid456]
> [<0>] r5l_recovery_verify_data_checksum+0x19/0x70 [raid456]
> [<0>] r5l_start+0x99e/0xe70 [raid456]
> [<0>] md_start.part.48+0x2e/0x50 [md_mod]
> [<0>] do_md_run+0x64/0x100 [md_mod]
> [<0>] md_ioctl+0xe7d/0x17d0 [md_mod]
> [<0>] blkdev_ioctl+0x4d0/0xa00
> [<0>] block_ioctl+0x39/0x40
> [<0>] do_vfs_ioctl+0xa4/0x630
> [<0>] ksys_ioctl+0x60/0x90
> [<0>] __x64_sys_ioctl+0x16/0x20
> [<0>] do_syscall_64+0x52/0x160
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> cat: /proc//stack: No such file or directory
> cat: /proc//stack: No such file or directory
> cat: /proc//stack: No such file or directory
> cat: /proc//stack: No such file or directory
>
> Strace output attached.

So mdadm hangs at "ioctl(4, RUN_ARRAY", right?

Does /proc/<pid of hanging mdadm>/stack always show the same output above?

Thanks,
Song

^ permalink raw reply	[flat|nested] 46+ messages in thread
[parent not found: <d0340d7b-6b3a-4fd3-e446-5f0967132ef6@yandex.pl>]
* Re: Assemblin journaled array fails [not found] ` <d0340d7b-6b3a-4fd3-e446-5f0967132ef6@yandex.pl> @ 2020-05-18 23:55 ` Song Liu 2020-05-20 12:08 ` Michal Soltys 2020-05-25 16:23 ` Michal Soltys 0 siblings, 2 replies; 46+ messages in thread From: Song Liu @ 2020-05-18 23:55 UTC (permalink / raw) To: Michal Soltys; +Cc: linux-raid On Mon, May 18, 2020 at 4:43 AM Michal Soltys <msoltyspl@yandex.pl> wrote: > > On 5/16/20 2:15 AM, Song Liu wrote: > > On Thu, May 14, 2020 at 4:07 AM Michal Soltys <msoltyspl@yandex.pl> wrote: > >> > >> On 5/13/20 6:17 PM, Song Liu wrote: > >>>>> > >>>>> Are these captured back to back? I am asking because they showed different > >>>>> "Events" number. > >>>> > >>>> Nah, they were captured between reboots. Back to back all event fields show identical value (at 56291 now). > >>>> > >>>>> > >>>>> Also, when mdadm -A hangs, could you please capture /proc/$(pidof mdadm)/stack ? > >>>>> > >>>> > >>>> The output is empty: > >>>> > >>>> xs22:/☠ ps -eF fww | grep mdadm > >>>> root 10332 9362 97 740 1884 25 12:47 pts/1 R+ 6:59 | \_ mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/sdi1 /dev/sdg1 /dev/sdj1 /dev/sdh1 > >>>> xs22:/☠ cd /proc/10332 > >>>> xs22:/proc/10332☠ cat stack > >>>> xs22:/proc/10332☠ > >>> > >>> Hmm... Could you please share the strace output of "mdadm -A" command? Like > >>> > >>> strace mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/xxx ... 
> >>> > >>> Thanks, > >>> Song > >> > >> I did one more thing - ran cat on that stack in another terminal every > >> 0.1 seconds, this is what was found: > >> > >> cat: /proc//stack: No such file or directory > >> cat: /proc//stack: No such file or directory > >> cat: /proc//stack: No such file or directory > >> [<0>] submit_bio_wait+0x5b/0x80 > >> [<0>] r5l_recovery_read_page+0x1b6/0x200 [raid456] > >> [<0>] r5l_recovery_verify_data_checksum+0x19/0x70 [raid456] > >> [<0>] r5l_start+0x99e/0xe70 [raid456] > >> [<0>] md_start.part.48+0x2e/0x50 [md_mod] > >> [<0>] do_md_run+0x64/0x100 [md_mod] > >> [<0>] md_ioctl+0xe7d/0x17d0 [md_mod] > >> [<0>] blkdev_ioctl+0x4d0/0xa00 > >> [<0>] block_ioctl+0x39/0x40 > >> [<0>] do_vfs_ioctl+0xa4/0x630 > >> [<0>] ksys_ioctl+0x60/0x90 > >> [<0>] __x64_sys_ioctl+0x16/0x20 > >> [<0>] do_syscall_64+0x52/0x160 > >> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >> cat: /proc//stack: No such file or directory > >> [<0>] submit_bio_wait+0x5b/0x80 > >> [<0>] r5l_recovery_read_page+0x1b6/0x200 [raid456] > >> [<0>] r5l_recovery_verify_data_checksum+0x19/0x70 [raid456] > >> [<0>] r5l_start+0x99e/0xe70 [raid456] > >> [<0>] md_start.part.48+0x2e/0x50 [md_mod] > >> [<0>] do_md_run+0x64/0x100 [md_mod] > >> [<0>] md_ioctl+0xe7d/0x17d0 [md_mod] > >> [<0>] blkdev_ioctl+0x4d0/0xa00 > >> [<0>] block_ioctl+0x39/0x40 > >> [<0>] do_vfs_ioctl+0xa4/0x630 > >> [<0>] ksys_ioctl+0x60/0x90 > >> [<0>] __x64_sys_ioctl+0x16/0x20 > >> [<0>] do_syscall_64+0x52/0x160 > >> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >> cat: /proc//stack: No such file or directory > >> cat: /proc//stack: No such file or directory > >> cat: /proc//stack: No such file or directory > >> cat: /proc//stack: No such file or directory > >> > >> > >> Strace output attached. > > > > So mdadm hangs at "ioctl(4, RUN_ARRAY", right? > > > > Yes, it's consistent across attempts. > > > Does /proc/<pid of hanging mdadm>/stack always shows the same output above? 
>
> Having peeked closer - it actually tries to do something somehow -
> attached the stack file dumped every ~0.01 seconds.
>
> After a while I can hit non-empty cat /proc/<pid>/stack every 5-15
> seconds this way.
>
> At one point "hanged" mdadm has been running this way for 24hr+ without
> any practical effect in the end.
>

Thanks for this information.

It looks like the kernel had some issue in recovery. Could you please
try the following:

1. Check dmesg; we should see one of the following at some point:

   pr_info("md/raid:%s: starting from clean shutdown\n",
   or
   pr_info("md/raid:%s: recovering %d data-only stripes and %d data-parity stripes\n",

2. Try using bcc/bpftrace to trace r5l_recovery_read_page(),
   specifically the 4th argument. With bcc, it is something like:

   trace.py -M 100 'r5l_recovery_read_page() "%llx", arg4'

   -M above limits the number of outputs to 100 lines. We may need to
   increase the limit or remove the constraint. If the system doesn't
   have bcc/bpftrace, you can also try with kprobe.

Thanks,
Song

^ permalink raw reply	[flat|nested] 46+ messages in thread
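For the kprobe route mentioned at the end, one hedged way to do it without bcc is the ftrace kprobe_events interface. Everything here beyond the traced symbol is an assumption: the group/event name `mdprobes/r5l_rp` is made up, `$arg4` fetchargs need kernel >= 4.20, and the whole thing needs root plus CONFIG_KPROBE_EVENTS:

```shell
# Sketch only: log the 4th argument of r5l_recovery_read_page() via ftrace
# kprobes instead of bcc. The event name "r5l_rp" is an arbitrary choice.
cd /sys/kernel/debug/tracing
echo 'p:mdprobes/r5l_rp r5l_recovery_read_page pos=$arg4:x64' >> kprobe_events
echo 1 > events/mdprobes/r5l_rp/enable
cat trace_pipe            # one line per call while the recovery is running
echo 0 > events/mdprobes/r5l_rp/enable
echo '-:mdprobes/r5l_rp' >> kprobe_events   # remove the probe again
```

With bpftrace the equivalent one-liner would be `bpftrace -e 'kprobe:r5l_recovery_read_page { printf("%llx\n", arg3); }'` - note bpftrace numbers arguments from arg0, so the 4th argument is arg3.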
* Re: Assemblin journaled array fails 2020-05-18 23:55 ` Song Liu @ 2020-05-20 12:08 ` Michal Soltys 2020-05-25 16:23 ` Michal Soltys 1 sibling, 0 replies; 46+ messages in thread From: Michal Soltys @ 2020-05-20 12:08 UTC (permalink / raw) To: Song Liu; +Cc: linux-raid On 5/19/20 1:55 AM, Song Liu wrote: > > Thanks for these information. > > It looks like the kernel had some issue in recovery. Could you please > try the following: > > 1. check dmesg, we should see one of the following at some point: > pr_info("md/raid:%s: starting from clean shutdown\n", > or > pr_info("md/raid:%s: recovering %d data-only stripes and %d > data-parity stripes\n", > > 2. try use bcc/bpftrace to trace r5l_recovery_read_page(), > specifically, the 4th argument. > With bcc, it is something like: > > trace.py -M 100 'r5l_recovery_read_page() "%llx", arg4' > > -M above limits the number of outputs to 100 lines. We may need to > increase the limit or > remove the constraint. If the system doesn't have bcc/bpftrace. You > can also try with > kprobe. > > Thanks, > Song > Will do the above tests in coming days. As for dmesg, it was always clean after mdadm's invocation (that's why I never noticed it), but today I found on the machine that the following popped around *19* hours after mdadm assembly was started: Note - since the last dmesg entry yesterday, nothing new showed up in dmesg, and the mdadm is still running. ps a -o command,lstart | grep mdadm strace -f -o mdadm.strace.l Mon May 18 13:25:45 2020 mdadm -A /dev/md/r5_big /de Mon May 18 13:25:45 2020 [May19 09:53] md/raid:md124: recovering 24667 data-only stripes and 50212 data-parity stripes [May19 09:57] INFO: task mdadm:4698 blocked for more than 120 seconds. [ +0.007114] Not tainted 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1 [ +0.007881] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.008754] mdadm D 0 4698 4695 0x00004003 [ +0.006128] Call Trace: [ +0.002599] ? 
__schedule+0x2e6/0x6f0 [ +0.004222] schedule+0x2f/0xa0 [ +0.003519] r5l_start.cold.43+0x544/0x549 [raid456] [ +0.005560] ? finish_wait+0x80/0x80 [ +0.003901] md_start.part.48+0x2e/0x50 [md_mod] [ +0.005175] do_md_run+0x64/0x100 [md_mod] [ +0.004582] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.004675] blkdev_ioctl+0x4d0/0xa00 [ +0.004104] block_ioctl+0x39/0x40 [ +0.003806] do_vfs_ioctl+0xa4/0x630 [ +0.004004] ksys_ioctl+0x60/0x90 [ +0.003697] __x64_sys_ioctl+0x16/0x20 [ +0.003966] do_syscall_64+0x52/0x160 [ +0.004100] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.005658] RIP: 0033:0x7efcb9969427 [ +0.004118] Code: Bad RIP value. [ +0.003611] RSP: 002b:00007ffd902f2508 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.008461] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007efcb9969427 [ +0.007964] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.007965] RBP: 00007ffd902f2680 R08: 0000561686dc6f40 R09: 00005616858b9b60 [ +0.007968] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.008034] R13: 0000000000000000 R14: 00007ffd902f2e30 R15: 0000000000000004 [May19 09:59] INFO: task mdadm:4698 blocked for more than 241 seconds. [ +0.007113] Not tainted 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1 [ +0.031011] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.032417] mdadm D 0 4698 4695 0x00004003 [ +0.030883] Call Trace: [ +0.027037] ? __schedule+0x2e6/0x6f0 [ +0.027109] schedule+0x2f/0xa0 [ +0.026651] r5l_start.cold.43+0x544/0x549 [raid456] [ +0.028842] ? 
finish_wait+0x80/0x80 [ +0.026646] md_start.part.48+0x2e/0x50 [md_mod] [ +0.027846] do_md_run+0x64/0x100 [md_mod] [ +0.027423] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.027029] blkdev_ioctl+0x4d0/0xa00 [ +0.027716] block_ioctl+0x39/0x40 [ +0.028195] do_vfs_ioctl+0xa4/0x630 [ +0.028868] ksys_ioctl+0x60/0x90 [ +0.028845] __x64_sys_ioctl+0x16/0x20 [ +0.030159] do_syscall_64+0x52/0x160 [ +0.029944] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.029988] RIP: 0033:0x7efcb9969427 [ +0.027728] Code: Bad RIP value. [ +0.025732] RSP: 002b:00007ffd902f2508 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.028386] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007efcb9969427 [ +0.028462] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.028854] RBP: 00007ffd902f2680 R08: 0000561686dc6f40 R09: 00005616858b9b60 [ +0.027692] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.028838] R13: 0000000000000000 R14: 00007ffd902f2e30 R15: 0000000000000004 [May19 10:01] INFO: task mdadm:4698 blocked for more than 362 seconds. [ +0.027842] Not tainted 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1 [ +0.028293] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.029933] mdadm D 0 4698 4695 0x00004003 [ +0.027836] Call Trace: [ +0.025804] ? __schedule+0x2e6/0x6f0 [ +0.025857] schedule+0x2f/0xa0 [ +0.025504] r5l_start.cold.43+0x544/0x549 [raid456] [ +0.026469] ? finish_wait+0x80/0x80 [ +0.026253] md_start.part.48+0x2e/0x50 [md_mod] [ +0.026693] do_md_run+0x64/0x100 [md_mod] [ +0.027043] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.027063] blkdev_ioctl+0x4d0/0xa00 [ +0.025390] block_ioctl+0x39/0x40 [ +0.026334] do_vfs_ioctl+0xa4/0x630 [ +0.025508] ksys_ioctl+0x60/0x90 [ +0.024969] __x64_sys_ioctl+0x16/0x20 [ +0.024461] do_syscall_64+0x52/0x160 [ +0.025266] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.025898] RIP: 0033:0x7efcb9969427 [ +0.025336] Code: Bad RIP value. 
[ +0.023623] RSP: 002b:00007ffd902f2508 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.029437] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007efcb9969427 [ +0.028688] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.028993] RBP: 00007ffd902f2680 R08: 0000561686dc6f40 R09: 00005616858b9b60 [ +0.028810] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.028715] R13: 0000000000000000 R14: 00007ffd902f2e30 R15: 0000000000000004 [May19 10:03] INFO: task mdadm:4698 blocked for more than 483 seconds. [ +0.028312] Not tainted 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1 [ +0.029922] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.029811] mdadm D 0 4698 4695 0x00004003 [ +0.028094] Call Trace: [ +0.024963] ? __schedule+0x2e6/0x6f0 [ +0.025823] schedule+0x2f/0xa0 [ +0.029291] r5l_start.cold.43+0x544/0x549 [raid456] [ +0.031421] ? finish_wait+0x80/0x80 [ +0.028282] md_start.part.48+0x2e/0x50 [md_mod] [ +0.026902] do_md_run+0x64/0x100 [md_mod] [ +0.026903] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.026249] blkdev_ioctl+0x4d0/0xa00 [ +0.026441] block_ioctl+0x39/0x40 [ +0.026720] do_vfs_ioctl+0xa4/0x630 [ +0.025734] ksys_ioctl+0x60/0x90 [ +0.025040] __x64_sys_ioctl+0x16/0x20 [ +0.025255] do_syscall_64+0x52/0x160 [ +0.024974] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.026676] RIP: 0033:0x7efcb9969427 [ +0.025491] Code: Bad RIP value. [ +0.023661] RSP: 002b:00007ffd902f2508 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.029253] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007efcb9969427 [ +0.027865] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.029299] RBP: 00007ffd902f2680 R08: 0000561686dc6f40 R09: 00005616858b9b60 [ +0.028662] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.028661] R13: 0000000000000000 R14: 00007ffd902f2e30 R15: 0000000000000004 [May19 10:05] INFO: task mdadm:4698 blocked for more than 604 seconds. 
[ +0.028120] Not tainted 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1 [ +0.028921] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.030050] mdadm D 0 4698 4695 0x00004003 [ +0.028201] Call Trace: [ +0.025739] ? __schedule+0x2e6/0x6f0 [ +0.025130] schedule+0x2f/0xa0 [ +0.025862] r5l_start.cold.43+0x544/0x549 [raid456] [ +0.026863] ? finish_wait+0x80/0x80 [ +0.026362] md_start.part.48+0x2e/0x50 [md_mod] [ +0.028033] do_md_run+0x64/0x100 [md_mod] [ +0.026302] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.027130] blkdev_ioctl+0x4d0/0xa00 [ +0.025276] block_ioctl+0x39/0x40 [ +0.025804] do_vfs_ioctl+0xa4/0x630 [ +0.025543] ksys_ioctl+0x60/0x90 [ +0.025133] __x64_sys_ioctl+0x16/0x20 [ +0.025289] do_syscall_64+0x52/0x160 [ +0.025019] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.027423] RIP: 0033:0x7efcb9969427 [ +0.025241] Code: Bad RIP value. [ +0.024188] RSP: 002b:00007ffd902f2508 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.029330] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007efcb9969427 [ +0.028777] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.029321] RBP: 00007ffd902f2680 R08: 0000561686dc6f40 R09: 00005616858b9b60 [ +0.030312] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.028756] R13: 0000000000000000 R14: 00007ffd902f2e30 R15: 0000000000000004 [May19 10:07] INFO: task mdadm:4698 blocked for more than 724 seconds. [ +0.028161] Not tainted 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1 [ +0.029025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.029438] mdadm D 0 4698 4695 0x00004003 [ +0.027906] Call Trace: [ +0.024749] ? __schedule+0x2e6/0x6f0 [ +0.025400] schedule+0x2f/0xa0 [ +0.027335] r5l_start.cold.43+0x544/0x549 [raid456] [ +0.030223] ? 
finish_wait+0x80/0x80 [ +0.028959] md_start.part.48+0x2e/0x50 [md_mod] [ +0.029618] do_md_run+0x64/0x100 [md_mod] [ +0.026927] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.026187] blkdev_ioctl+0x4d0/0xa00 [ +0.026722] block_ioctl+0x39/0x40 [ +0.026070] do_vfs_ioctl+0xa4/0x630 [ +0.026268] ksys_ioctl+0x60/0x90 [ +0.025397] __x64_sys_ioctl+0x16/0x20 [ +0.025492] do_syscall_64+0x52/0x160 [ +0.025293] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.026295] RIP: 0033:0x7efcb9969427 [ +0.025152] Code: Bad RIP value. [ +0.023763] RSP: 002b:00007ffd902f2508 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.029591] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007efcb9969427 [ +0.027850] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.028603] RBP: 00007ffd902f2680 R08: 0000561686dc6f40 R09: 00005616858b9b60 [ +0.028714] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.028870] R13: 0000000000000000 R14: 00007ffd902f2e30 R15: 0000000000000004 [May19 10:09] INFO: task mdadm:4698 blocked for more than 845 seconds. [ +0.028077] Not tainted 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1 [ +0.029226] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.029321] mdadm D 0 4698 4695 0x00004003 [May19 10:10] Call Trace: [ +0.025422] ? __schedule+0x2e6/0x6f0 [ +0.025490] schedule+0x2f/0xa0 [ +0.026722] r5l_start.cold.43+0x544/0x549 [raid456] [ +0.026858] ? finish_wait+0x80/0x80 [ +0.026399] md_start.part.48+0x2e/0x50 [md_mod] [ +0.027444] do_md_run+0x64/0x100 [md_mod] [ +0.026228] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.027008] blkdev_ioctl+0x4d0/0xa00 [ +0.025656] block_ioctl+0x39/0x40 [ +0.026068] do_vfs_ioctl+0xa4/0x630 [ +0.025899] ksys_ioctl+0x60/0x90 [ +0.025539] __x64_sys_ioctl+0x16/0x20 [ +0.025885] do_syscall_64+0x52/0x160 [ +0.025075] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.026798] RIP: 0033:0x7efcb9969427 [ +0.024981] Code: Bad RIP value. 
[ +0.024604] RSP: 002b:00007ffd902f2508 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.028628] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007efcb9969427 [ +0.028698] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.028732] RBP: 00007ffd902f2680 R08: 0000561686dc6f40 R09: 00005616858b9b60 [ +0.027649] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.029800] R13: 0000000000000000 R14: 00007ffd902f2e30 R15: 0000000000000004 [May19 10:12] INFO: task mdadm:4698 blocked for more than 966 seconds. [ +0.030285] Not tainted 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1 [ +0.032412] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.033870] mdadm D 0 4698 4695 0x00004003 [ +0.031620] Call Trace: [ +0.028343] ? __schedule+0x2e6/0x6f0 [ +0.028456] schedule+0x2f/0xa0 [ +0.025514] r5l_start.cold.43+0x544/0x549 [raid456] [ +0.028027] ? finish_wait+0x80/0x80 [ +0.025278] md_start.part.48+0x2e/0x50 [md_mod] [ +0.028173] do_md_run+0x64/0x100 [md_mod] [ +0.027243] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.026750] blkdev_ioctl+0x4d0/0xa00 [ +0.026450] block_ioctl+0x39/0x40 [ +0.025187] do_vfs_ioctl+0xa4/0x630 [ +0.025689] ksys_ioctl+0x60/0x90 [ +0.024076] __x64_sys_ioctl+0x16/0x20 [ +0.025253] do_syscall_64+0x52/0x160 [ +0.024310] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.027527] RIP: 0033:0x7efcb9969427 [ +0.024193] Code: Bad RIP value. [ +0.024483] RSP: 002b:00007ffd902f2508 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.028585] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007efcb9969427 [ +0.029009] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.028403] RBP: 00007ffd902f2680 R08: 0000561686dc6f40 R09: 00005616858b9b60 [ +0.027510] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.028629] R13: 0000000000000000 R14: 00007ffd902f2e30 R15: 0000000000000004 [May19 10:14] INFO: task mdadm:4698 blocked for more than 1087 seconds. 
[ +0.028117] Not tainted 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1 [ +0.029008] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.030891] mdadm D 0 4698 4695 0x00004003 [ +0.027018] Call Trace: [ +0.024698] ? __schedule+0x2e6/0x6f0 [ +0.025217] schedule+0x2f/0xa0 [ +0.025613] r5l_start.cold.43+0x544/0x549 [raid456] [ +0.027065] ? finish_wait+0x80/0x80 [ +0.026013] md_start.part.48+0x2e/0x50 [md_mod] [ +0.027664] do_md_run+0x64/0x100 [md_mod] [ +0.026160] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.027902] blkdev_ioctl+0x4d0/0xa00 [ +0.026085] block_ioctl+0x39/0x40 [ +0.026057] do_vfs_ioctl+0xa4/0x630 [ +0.025886] ksys_ioctl+0x60/0x90 [ +0.024539] __x64_sys_ioctl+0x16/0x20 [ +0.025231] do_syscall_64+0x52/0x160 [ +0.025571] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.026934] RIP: 0033:0x7efcb9969427 [ +0.024585] Code: Bad RIP value. [ +0.024635] RSP: 002b:00007ffd902f2508 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.029237] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007efcb9969427 [ +0.031792] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.031758] RBP: 00007ffd902f2680 R08: 0000561686dc6f40 R09: 00005616858b9b60 [ +0.031567] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.028862] R13: 0000000000000000 R14: 00007ffd902f2e30 R15: 0000000000000004 [May19 10:16] INFO: task mdadm:4698 blocked for more than 1208 seconds. [ +0.028273] Not tainted 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1 [ +0.029463] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.031789] mdadm D 0 4698 4695 0x00004003 [ +0.028022] Call Trace: [ +0.024606] ? __schedule+0x2e6/0x6f0 [ +0.026123] schedule+0x2f/0xa0 [ +0.024892] r5l_start.cold.43+0x544/0x549 [raid456] [ +0.027620] ? 
finish_wait+0x80/0x80 [ +0.025492] md_start.part.48+0x2e/0x50 [md_mod] [ +0.027706] do_md_run+0x64/0x100 [md_mod] [ +0.026976] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.026705] blkdev_ioctl+0x4d0/0xa00 [ +0.027531] block_ioctl+0x39/0x40 [ +0.024963] do_vfs_ioctl+0xa4/0x630 [ +0.025568] ksys_ioctl+0x60/0x90 [ +0.024137] __x64_sys_ioctl+0x16/0x20 [ +0.025538] do_syscall_64+0x52/0x160 [ +0.024200] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.026633] RIP: 0033:0x7efcb9969427 [ +0.024619] Code: Bad RIP value. [ +0.024698] RSP: 002b:00007ffd902f2508 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.029381] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007efcb9969427 [ +0.028794] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.028724] RBP: 00007ffd902f2680 R08: 0000561686dc6f40 R09: 00005616858b9b60 [ +0.028828] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.028030] R13: 0000000000000000 R14: 00007ffd902f2e30 R15: 0000000000000004 ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-05-18 23:55 ` Song Liu 2020-05-20 12:08 ` Michal Soltys @ 2020-05-25 16:23 ` Michal Soltys 2020-05-26 23:37 ` Song Liu 1 sibling, 1 reply; 46+ messages in thread
From: Michal Soltys @ 2020-05-25 16:23 UTC (permalink / raw)
To: Song Liu; +Cc: linux-raid

On 5/19/20 1:55 AM, Song Liu wrote:
>
> 2. try use bcc/bpftrace to trace r5l_recovery_read_page(),
> specifically, the 4th argument.
> With bcc, it is something like:
>
> trace.py -M 100 'r5l_recovery_read_page() "%llx", arg4'
>
> -M above limits the number of outputs to 100 lines. We may need to
> increase the limit or
> remove the constraint. If the system doesn't have bcc/bpftrace. You
> can also try with
> kprobe.
>

Trace keeps outputting the following data (with steadily growing 4th
argument):

PID     TID     COMM            FUNC                    -
3456    3456    mdadm           r5l_recovery_read_page  98f65b8
3456    3456    mdadm           r5l_recovery_read_page  98f65c0
3456    3456    mdadm           r5l_recovery_read_page  98f65c8
3456    3456    mdadm           r5l_recovery_read_page  98f65d0
3456    3456    mdadm           r5l_recovery_read_page  98f65d8
3456    3456    mdadm           r5l_recovery_read_page  98f65e0
3456    3456    mdadm           r5l_recovery_read_page  98f65e8
3456    3456    mdadm           r5l_recovery_read_page  98f65f0
3456    3456    mdadm           r5l_recovery_read_page  98f65f8
3456    3456    mdadm           r5l_recovery_read_page  98f6600
3456    3456    mdadm           r5l_recovery_read_page  98f65c0
3456    3456    mdadm           r5l_recovery_read_page  98f65c8
3456    3456    mdadm           r5l_recovery_read_page  98f65d0
3456    3456    mdadm           r5l_recovery_read_page  98f65d8
3456    3456    mdadm           r5l_recovery_read_page  98f65e0
3456    3456    mdadm           r5l_recovery_read_page  98f65e8
3456    3456    mdadm           r5l_recovery_read_page  98f65f0
3456    3456    mdadm           r5l_recovery_read_page  98f65f8
3456    3456    mdadm           r5l_recovery_read_page  98f6600
3456    3456    mdadm           r5l_recovery_read_page  98f6608
3456    3456    mdadm           r5l_recovery_read_page  98f6610

... a few minutes later ...

PID     TID     COMM            FUNC                    -
3456    3456    mdadm           r5l_recovery_read_page  9b69b60
3456    3456    mdadm           r5l_recovery_read_page  9b69b68
3456    3456    mdadm           r5l_recovery_read_page  9b69b70
3456    3456    mdadm           r5l_recovery_read_page  9b69b78
3456    3456    mdadm           r5l_recovery_read_page  9b69b80
3456    3456    mdadm           r5l_recovery_read_page  9b69b88
3456    3456    mdadm           r5l_recovery_read_page  9b69b90
3456    3456    mdadm           r5l_recovery_read_page  9b69b98
3456    3456    mdadm           r5l_recovery_read_page  9b69ba0
3456    3456    mdadm           r5l_recovery_read_page  9b69ba8
3456    3456    mdadm           r5l_recovery_read_page  9b69bb0
3456    3456    mdadm           r5l_recovery_read_page  9b69bb8
3456    3456    mdadm           r5l_recovery_read_page  9b69bc0
3456    3456    mdadm           r5l_recovery_read_page  9b69bc8
3456    3456    mdadm           r5l_recovery_read_page  9b69b90
3456    3456    mdadm           r5l_recovery_read_page  9b69b98
3456    3456    mdadm           r5l_recovery_read_page  9b69ba0
3456    3456    mdadm           r5l_recovery_read_page  9b69ba8
3456    3456    mdadm           r5l_recovery_read_page  9b69bb0
3456    3456    mdadm           r5l_recovery_read_page  9b69bb8

... and so on

^ permalink raw reply	[flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails
  2020-05-25 16:23             ` Michal Soltys
@ 2020-05-26 23:37               ` Song Liu
  2020-05-27 13:36                 ` Michal Soltys
  0 siblings, 1 reply; 46+ messages in thread
From: Song Liu @ 2020-05-26 23:37 UTC (permalink / raw)
  To: Michal Soltys; +Cc: linux-raid

On Mon, May 25, 2020 at 9:23 AM Michal Soltys <msoltyspl@yandex.pl> wrote:
>
> On 5/19/20 1:55 AM, Song Liu wrote:
> >
> > 2. try use bcc/bpftrace to trace r5l_recovery_read_page(),
> > specifically, the 4th argument.
> > [...]
>
> Trace keeps outputting the following data (with steadily growing 4th
> argument):
>
> [...]

Looks like the kernel has processed about 1.2GB of journal (9b69bb8 -
98f65b8 sectors).
And the limit is min(1/4 disk size, 10GB).

I just checked the code, it should stop once it hits checksum
mismatch. Does it keep going
after half hour or so?

Thanks,
Song

^ permalink raw reply	[flat|nested] 46+ messages in thread
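Song's "about 1.2GB" figure can be reproduced directly from the two trace offsets he names, since the 4th argument of r5l_recovery_read_page() is a 512-byte sector offset into the journal device. A quick back-of-envelope check (the 256GB device size used for the limit is taken from later in the thread):

```python
# Sanity-check of the "about 1.2GB" estimate from the two traced offsets.
first = 0x98f65b8  # first offset in the trace
later = 0x9b69bb8  # offset a few minutes later
processed = (later - first) * 512          # bytes of journal scanned so far
print(f"{processed / 2**30:.2f} GiB scanned")

# The recovery limit Song mentions, min(1/4 of the device, 10GB),
# for a 256GB journal device:
limit = min(256 * 10**9 // 4, 10 * 2**30)
print(f"limit = {limit / 2**30:.2f} GiB")
```

So the scan was already about an eighth of the way to the 10GB cap at that point, which makes the later reports of 120GB+ scanned all the more surprising.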
* Re: Assemblin journaled array fails 2020-05-26 23:37 ` Song Liu @ 2020-05-27 13:36 ` Michal Soltys 2020-05-28 10:18 ` Michal Soltys 0 siblings, 1 reply; 46+ messages in thread From: Michal Soltys @ 2020-05-27 13:36 UTC (permalink / raw) To: Song Liu; +Cc: linux-raid On 5/27/20 1:37 AM, Song Liu wrote: > On Mon, May 25, 2020 at 9:23 AM Michal Soltys <msoltyspl@yandex.pl> wrote: >> >> On 5/19/20 1:55 AM, Song Liu wrote: >>> >>> 2. try use bcc/bpftrace to trace r5l_recovery_read_page(), >>> specifically, the 4th argument. >>> With bcc, it is something like: >>> >>> trace.py -M 100 'r5l_recovery_read_page() "%llx", arg4' >>> >>> -M above limits the number of outputs to 100 lines. We may need to >>> increase the limit or >>> remove the constraint. If the system doesn't have bcc/bpftrace. You >>> can also try with >>> kprobe. >>> >> >> > > Looks like the kernel has processed about 1.2GB of journal (9b69bb8 - > 98f65b8 sectors). > And the limit is min(1/4 disk size, 10GB). > > I just checked the code, it should stop once it hits checksum > mismatch. Does it keep going > after half hour or so? It keeps going so far (mdadm started few hours ago): xs22:/home/msl☠ head -n 4 trace.out TIME PID TID COMM FUNC - 12:58:21 40739 40739 mdadm r5l_recovery_read_page 98f65b8 12:58:21 40739 40739 mdadm r5l_recovery_read_page 98f65c0 12:58:21 40739 40739 mdadm r5l_recovery_read_page 98f65c8 xs22:/home/msl☠ tail -n 4 trace.out 15:34:50 40739 40739 mdadm r5l_recovery_read_page bc04e88 15:34:50 40739 40739 mdadm r5l_recovery_read_page bc04e90 15:34:50 40739 40739 mdadm r5l_recovery_read_page bc04e98 ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-05-27 13:36 ` Michal Soltys @ 2020-05-28 10:18 ` Michal Soltys 2020-05-28 15:06 ` Song Liu 0 siblings, 1 reply; 46+ messages in thread From: Michal Soltys @ 2020-05-28 10:18 UTC (permalink / raw) To: Song Liu; +Cc: linux-raid On 5/27/20 3:36 PM, Michal Soltys wrote: > On 5/27/20 1:37 AM, Song Liu wrote: >> >> Looks like the kernel has processed about 1.2GB of journal (9b69bb8 - >> 98f65b8 sectors). >> And the limit is min(1/4 disk size, 10GB). >> >> I just checked the code, it should stop once it hits checksum >> mismatch. Does it keep going >> after half hour or so? > > It keeps going so far (mdadm started few hours ago): > > xs22:/home/msl☠ head -n 4 trace.out > TIME PID TID COMM FUNC - > 12:58:21 40739 40739 mdadm r5l_recovery_read_page 98f65b8 > 12:58:21 40739 40739 mdadm r5l_recovery_read_page 98f65c0 > 12:58:21 40739 40739 mdadm r5l_recovery_read_page 98f65c8 > xs22:/home/msl☠ tail -n 4 trace.out > 15:34:50 40739 40739 mdadm r5l_recovery_read_page bc04e88 > 15:34:50 40739 40739 mdadm r5l_recovery_read_page bc04e90 > 15:34:50 40739 40739 mdadm r5l_recovery_read_page bc04e98 > To expand on the above (mdadm was started yesterday): - it went on for ~19+ hours - like previously - the last piece in trace output: xs22:/home/msl☠ tail -n 4 trace.out ; echo 09:08:12 40739 40739 mdadm r5l_recovery_read_page 1852d8b0 09:08:12 40739 40739 mdadm r5l_recovery_read_page 1852d8b8 09:08:12 40739 40739 mdadm r5l_recovery_read_page 1852d8c0 09:08:12 40739 40739 mdadm r5l_recovery_read_page 1852d8c8 So it went through ~120gb of journal (the journal device is 256gb). Non-assembling raid issue aside (assuming it's unrelated) - why does it take such enormous amount of time ? That's average of like 1.7mb/s Around 09:08 today, the recovery line popped in the dmesg as previously: [May27 12:58] md: md124 stopped. 
[ +0.006779] md/raid:md124: device sdf1 operational as raid disk 0 [ +0.006929] md/raid:md124: device sdj1 operational as raid disk 3 [ +0.006903] md/raid:md124: device sdi1 operational as raid disk 2 [ +0.006914] md/raid:md124: device sdh1 operational as raid disk 1 [ +0.007426] md/raid:md124: raid level 5 active with 4 out of 4 devices, algorithm 2 [May28 09:08] md/raid:md124: recovering 24667 data-only stripes and 50212 data-parity stripes [May28 09:11] INFO: task mdadm:40739 blocked for more than 120 seconds. [ +0.007304] Not tainted 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1 [ +0.008008] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.008858] mdadm D 0 40739 40073 0x00004002 [ +0.006222] Call Trace: [ +0.002870] ? __schedule+0x2e6/0x6f0 [ +0.004194] schedule+0x2f/0xa0 [ +0.003631] r5l_start.cold.43+0x544/0x549 [raid456] [ +0.005669] ? finish_wait+0x80/0x80 [ +0.004108] md_start.part.48+0x2e/0x50 [md_mod] [ +0.005271] do_md_run+0x64/0x100 [md_mod] [ +0.004696] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.004783] blkdev_ioctl+0x4d0/0xa00 [ +0.004211] block_ioctl+0x39/0x40 [ +0.003896] do_vfs_ioctl+0xa4/0x630 [ +0.004119] ksys_ioctl+0x60/0x90 [ +0.003824] __x64_sys_ioctl+0x16/0x20 [ +0.004288] do_syscall_64+0x52/0x160 [ +0.004202] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.005765] RIP: 0033:0x7f5e2e777427 [ +0.004111] Code: Bad RIP value. [ +0.003708] RSP: 002b:00007ffd39152d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.008562] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f5e2e777427 [ +0.008083] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.008081] RBP: 00007ffd39152ef0 R08: 000055641fb6ff40 R09: 000055641de6ab60 [ +0.008301] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000004 [ +0.008085] R13: 0000000000000000 R14: 00007ffd391536a0 R15: 0000000000000004 [May28 09:13] INFO: task mdadm:40739 blocked for more than 241 seconds. 
The last dmesg call trace is from few hours ago: [May28 09:29] INFO: task mdadm:40739 blocked for more than 1208 seconds. [ +0.030944] Not tainted 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1 [ +0.031885] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.031088] mdadm D 0 40739 40073 0x00004002 [ +0.028702] Call Trace: [ +0.024943] ? __schedule+0x2e6/0x6f0 [ +0.025448] schedule+0x2f/0xa0 [ +0.025868] r5l_start.cold.43+0x544/0x549 [raid456] [ +0.027650] ? finish_wait+0x80/0x80 [ +0.026630] md_start.part.48+0x2e/0x50 [md_mod] [ +0.027959] do_md_run+0x64/0x100 [md_mod] [ +0.026500] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.027326] blkdev_ioctl+0x4d0/0xa00 [ +0.025951] block_ioctl+0x39/0x40 [ +0.026311] do_vfs_ioctl+0xa4/0x630 [ +0.026721] ksys_ioctl+0x60/0x90 [ +0.025582] __x64_sys_ioctl+0x16/0x20 [ +0.025665] do_syscall_64+0x52/0x160 [ +0.024993] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.027166] RIP: 0033:0x7f5e2e777427 [ +0.024500] Code: Bad RIP value. [ +0.024675] RSP: 002b:00007ffd39152d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.028617] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f5e2e777427 [ +0.029175] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.028890] RBP: 00007ffd39152ef0 R08: 000055641fb6ff40 R09: 000055641de6ab60 [ +0.027863] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000004 [ +0.028962] R13: 0000000000000000 R14: 00007ffd391536a0 R15: 0000000000000004 ^ permalink raw reply [flat|nested] 46+ messages in thread
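The "~120gb" and "~1.7mb/s" figures above follow from the first and last trace entries, again treating the traced argument as a 512-byte sector offset (timestamps taken from the trace output and the May 27 12:58 dmesg line):

```python
from datetime import datetime

# Average journal-scan throughput between the first trace entry
# (May 27 12:58:21, offset 0x98f65b8) and the last one before the
# "recovering ... stripes" line (May 28 09:08:12, offset 0x1852d8c8).
start = datetime(2020, 5, 27, 12, 58, 21)
end = datetime(2020, 5, 28, 9, 8, 12)
nbytes = (0x1852d8c8 - 0x98f65b8) * 512
rate = nbytes / (end - start).total_seconds()
print(f"{nbytes / 1e9:.1f} GB scanned at {rate / 1e6:.2f} MB/s")
```

That confirms roughly 127 GB read from the 256GB journal device at about 1.75 MB/s averaged over the ~20-hour run.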
* Re: Assemblin journaled array fails 2020-05-28 10:18 ` Michal Soltys @ 2020-05-28 15:06 ` Song Liu 2020-05-28 15:36 ` Michal Soltys 2020-05-29 12:09 ` Michal Soltys 0 siblings, 2 replies; 46+ messages in thread From: Song Liu @ 2020-05-28 15:06 UTC (permalink / raw) To: Michal Soltys; +Cc: linux-raid On Thu, May 28, 2020 at 3:18 AM Michal Soltys <msoltyspl@yandex.pl> wrote: > > On 5/27/20 3:36 PM, Michal Soltys wrote: > > On 5/27/20 1:37 AM, Song Liu wrote: > >> [...] > To expand on the above (mdadm was started yesterday): > > - it went on for ~19+ hours - like previously > - the last piece in trace output: > > xs22:/home/msl☠ tail -n 4 trace.out ; echo > 09:08:12 40739 40739 mdadm r5l_recovery_read_page 1852d8b0 > 09:08:12 40739 40739 mdadm r5l_recovery_read_page 1852d8b8 > 09:08:12 40739 40739 mdadm r5l_recovery_read_page 1852d8c0 > 09:08:12 40739 40739 mdadm r5l_recovery_read_page 1852d8c8 > > So it went through ~120gb of journal (the journal device is 256gb). This (processing more than 10GB) should not happen. So there must be something wrong... > Non-assembling raid issue aside (assuming it's unrelated) - why does it > take such enormous amount of time ? That's average of like 1.7mb/s This is slow because we are processing 4kB at a time. But it is slower than my expectation. What are the speeds measured at boot? Something like the following in dmesg: [ 3.851064] raid6: avx2x4 gen() 17328 MB/s [ 3.877404] raid6: avx2x4 xor() 11033 MB/s [ 3.903410] raid6: avx2x2 gen() 14710 MB/s [ 3.929410] raid6: avx2x2 xor() 8921 MB/s [ 3.955408] raid6: avx2x1 gen() 12687 MB/s [ 3.981405] raid6: avx2x1 xor() 8414 MB/s [ 4.007409] raid6: sse2x4 gen() 9105 MB/s [ 4.033403] raid6: sse2x4 xor() 5724 MB/s [ 4.059408] raid6: sse2x2 gen() 7570 MB/s [ 4.085410] raid6: sse2x2 xor() 4880 MB/s [ 4.111409] raid6: sse2x1 gen() 6484 MB/s [ 4.137402] raid6: sse2x1 xor() 4412 MB/s For the next step, could you please try the following the beginning of --assemble run? 
trace.py -M 10 'r::r5l_recovery_verify_data_checksum()(retval != 0)' Thanks, Song PS: Some key functions here are inlined and thus not easily traceable. We will probably need to patch the kernel for further debugging. > > Around 09:08 today, the recovery line popped in the dmesg as previously: > > [May27 12:58] md: md124 stopped. > [ +0.006779] md/raid:md124: device sdf1 operational as raid disk 0 > [ +0.006929] md/raid:md124: device sdj1 operational as raid disk 3 > [ +0.006903] md/raid:md124: device sdi1 operational as raid disk 2 > [ +0.006914] md/raid:md124: device sdh1 operational as raid disk 1 > [ +0.007426] md/raid:md124: raid level 5 active with 4 out of 4 > devices, algorithm 2 > [May28 09:08] md/raid:md124: recovering 24667 data-only stripes and > 50212 data-parity stripes ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-05-28 15:06 ` Song Liu @ 2020-05-28 15:36 ` Michal Soltys 2020-05-29 12:09 ` Michal Soltys 1 sibling, 0 replies; 46+ messages in thread From: Michal Soltys @ 2020-05-28 15:36 UTC (permalink / raw) To: Song Liu; +Cc: linux-raid On 5/28/20 5:06 PM, Song Liu wrote: > On Thu, May 28, 2020 at 3:18 AM Michal Soltys <msoltyspl@yandex.pl> wrote: > > This is slow because we are processing 4kB at a time. But it is slower than > my expectation. What are the speeds measured at boot? Something like the > following in dmesg: > > [ 3.851064] raid6: avx2x4 gen() 17328 MB/s > [ 3.877404] raid6: avx2x4 xor() 11033 MB/s > [ 3.903410] raid6: avx2x2 gen() 14710 MB/s > [ 3.929410] raid6: avx2x2 xor() 8921 MB/s > [ 3.955408] raid6: avx2x1 gen() 12687 MB/s > [ 3.981405] raid6: avx2x1 xor() 8414 MB/s > [ 4.007409] raid6: sse2x4 gen() 9105 MB/s > [ 4.033403] raid6: sse2x4 xor() 5724 MB/s > [ 4.059408] raid6: sse2x2 gen() 7570 MB/s > [ 4.085410] raid6: sse2x2 xor() 4880 MB/s > [ 4.111409] raid6: sse2x1 gen() 6484 MB/s > [ 4.137402] raid6: sse2x1 xor() 4412 MB/s > These are results for this machine: [ 17.707948] raid6: sse2x4 gen() 12196 MB/s [ 17.871946] raid6: sse2x4 xor() 8169 MB/s [ 18.039947] raid6: sse2x2 gen() 10035 MB/s [ 18.195950] raid6: sse2x2 xor() 6954 MB/s [ 18.343951] raid6: sse2x1 gen() 7983 MB/s [ 18.495948] raid6: sse2x1 xor() 6089 MB/s [ 18.577606] raid6: using algorithm sse2x4 gen() 12196 MB/s [ 18.675268] raid6: .... xor() 8169 MB/s, rmw enabled [ 18.774428] raid6: using ssse3x2 recovery algorithm > For the next step, could you please try the following the beginning of > --assemble > run? > > trace.py -M 10 'r::r5l_recovery_verify_data_checksum()(retval != 0)' > > Thanks, > Song Will do and report. > > PS: Some key functions here are inlined and thus not easily traceable. We will > probably need to patch the kernel for further debugging. > Won't be a problem. ^ permalink raw reply [flat|nested] 46+ messages in thread
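Song attributes the slowness to the recovery path processing the journal 4kB at a time. Converting the ~1.7 MB/s average reported earlier in the thread into per-page numbers gives a rough idea of the per-I/O cost (this assumes strictly serial, synchronous single-page reads, which is a simplification):

```python
# Back-of-envelope: what does ~1.75 MB/s mean if the journal is read
# one 4 KiB page at a time, strictly serially?
page_bytes = 4096
rate_bytes_per_s = 1.75e6  # average measured earlier in the thread
iops = rate_bytes_per_s / page_bytes
print(f"{iops:.0f} page reads/s, ~{1000 / iops:.2f} ms per 4 KiB read")
```

A couple of milliseconds per 4 KiB read is far above what the underlying RAID1 SSD journal should need, which suggests per-request overhead (or re-reads, as the repeated offsets in the traces hint) rather than raw device speed is dominating.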
* Re: Assemblin journaled array fails
  2020-05-28 15:06                 ` Song Liu
  2020-05-28 15:36                   ` Michal Soltys
@ 2020-05-29 12:09                   ` Michal Soltys
  2020-05-29 21:03                     ` Song Liu
  1 sibling, 1 reply; 46+ messages in thread
From: Michal Soltys @ 2020-05-29 12:09 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid

On 5/28/20 5:06 PM, Song Liu wrote:
>
> For the next step, could you please try the following the beginning of
> --assemble
> run?
>
> trace.py -M 10 'r::r5l_recovery_verify_data_checksum()(retval != 0)'
>
> Thanks,
> Song
>

How long should I keep this one running? So far - after ~1 hr - nothing
has been reported.

^ permalink raw reply	[flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-05-29 12:09 ` Michal Soltys @ 2020-05-29 21:03 ` Song Liu 2020-05-29 21:57 ` Song Liu 0 siblings, 1 reply; 46+ messages in thread From: Song Liu @ 2020-05-29 21:03 UTC (permalink / raw) To: Michal Soltys; +Cc: linux-raid On Fri, May 29, 2020 at 5:09 AM Michal Soltys <msoltyspl@yandex.pl> wrote: > > On 5/28/20 5:06 PM, Song Liu wrote: > > > > For the next step, could you please try the following the beginning of > > --assemble > > run? > > > > trace.py -M 10 'r::r5l_recovery_verify_data_checksum()(retval != 0)' > > > > Thanks, > > Song > > > > How long should I keep this one running ? So far - after ~1hr - nothing > got reported. We can stop it. It didn't really hit any data checksum error in early phase of the recovery. So it did found a long long journal to recover. Thanks, Song ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-05-29 21:03 ` Song Liu @ 2020-05-29 21:57 ` Song Liu 2020-06-03 10:12 ` Michal Soltys 0 siblings, 1 reply; 46+ messages in thread From: Song Liu @ 2020-05-29 21:57 UTC (permalink / raw) To: Michal Soltys; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 747 bytes --] On Fri, May 29, 2020 at 2:03 PM Song Liu <song@kernel.org> wrote: > > On Fri, May 29, 2020 at 5:09 AM Michal Soltys <msoltyspl@yandex.pl> wrote: > > > > On 5/28/20 5:06 PM, Song Liu wrote: > > > > > > For the next step, could you please try the following the beginning of > > > --assemble > > > run? > > > > > > trace.py -M 10 'r::r5l_recovery_verify_data_checksum()(retval != 0)' > > > > > > Thanks, > > > Song > > > > > > > How long should I keep this one running ? So far - after ~1hr - nothing > > got reported. > > We can stop it. It didn't really hit any data checksum error in early phase > of the recovery. So it did found a long long journal to recover. Could you please try the assemble again with the attached patch? 
Thanks,
Song

[-- Attachment #2: 0001-debug-long-running-journal-recovery.patch --]
[-- Type: application/octet-stream, Size: 3266 bytes --]

From 3160568a69f36abadf43be73ecc79d37c717956b Mon Sep 17 00:00:00 2001
From: Song Liu <songliubraving@fb.com>
Date: Fri, 29 May 2020 14:53:28 -0700
Subject: [PATCH] debug long running journal recovery

---
 drivers/md/raid5-cache.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 9b6da759dca2..bf49c4bdc4c8 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -2253,6 +2253,9 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
 		ret = r5c_recovery_analyze_meta_block(log, ctx,
 						      &ctx->cached_list);
+		if (ret)
+			pr_info("r5c_recovery_analyze_meta_block() returns %d at seq %llu\n",
+				ret, ctx->seq);
 		/*
 		 * -EAGAIN means mismatch in data block, in this case, we still
 		 * try scan the next metablock
 		 */
@@ -2260,6 +2263,9 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
 		if (ret && ret != -EAGAIN)
 			break;	/* ret == -EINVAL or -ENOMEM */
 		ctx->seq++;
+
+		if (ctx->seq % 100000 == 0)
+			pr_info("%s processing ctx->seq %llu\n", __func__, ctx->seq);
 		ctx->pos = r5l_ring_add(log, ctx->pos, ctx->meta_total_blocks);
 	}
@@ -2358,6 +2364,7 @@ r5c_recovery_rewrite_data_only_stripes(struct r5l_log *log,
 	struct mddev *mddev = log->rdev->mddev;
 	struct page *page;
 	sector_t next_checkpoint = MaxSector;
+	int total_sh_written = 0;
 
 	page = alloc_page(GFP_KERNEL);
 	if (!page) {
@@ -2418,9 +2425,13 @@ r5c_recovery_rewrite_data_only_stripes(struct r5l_log *log,
 		ctx->pos = write_pos;
 		ctx->seq += 1;
 		next_checkpoint = sh->log_start;
+		if (total_sh_written++ % 1000 == 0)
+			pr_info("%s rewritten %d stripes to the journal, current ctx->pos %llu ctx->seq %llu\n",
+				__func__, total_sh_written, ctx->pos, ctx->seq);
 	}
 	log->next_checkpoint = next_checkpoint;
 	__free_page(page);
+	pr_info("%s done\n", __func__);
 	return 0;
 }
 
@@ -2431,6 +2442,7 @@ static void r5c_recovery_flush_data_only_stripes(struct r5l_log *log,
 	struct r5conf *conf = mddev->private;
 	struct stripe_head *sh, *next;
 
+	pr_info("%s enter\n", __func__);
 	if (ctx->data_only_stripes == 0)
 		return;
@@ -2443,9 +2455,11 @@ static void r5c_recovery_flush_data_only_stripes(struct r5l_log *log,
 		raid5_release_stripe(sh);
 	}
 
+	pr_info("%s before wait_event\n", __func__);
 	/* reuse conf->wait_for_quiescent in recovery */
 	wait_event(conf->wait_for_quiescent,
 		   atomic_read(&conf->active_stripes) == 0);
+	pr_info("%s after wait_event\n", __func__);
 
 	log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_THROUGH;
 }
@@ -2463,6 +2477,8 @@ static int r5l_recovery_log(struct r5l_log *log)
 	ctx->pos = log->last_checkpoint;
 	ctx->seq = log->last_cp_seq;
+	pr_info("%s starting from ctx->pos %llu ctx->seq %llu",
+		__func__, ctx->pos, ctx->seq);
 	INIT_LIST_HEAD(&ctx->cached_list);
 	ctx->meta_page = alloc_page(GFP_KERNEL);
@@ -2484,6 +2500,9 @@ static int r5l_recovery_log(struct r5l_log *log)
 	pos = ctx->pos;
 	ctx->seq += 10000;
 
+	pr_info("%s finished scanning with ctx->pos %llu ctx->seq %llu",
+		__func__, ctx->pos, ctx->seq);
+
 	if ((ctx->data_only_stripes == 0) &&
 	    (ctx->data_parity_stripes == 0))
 		pr_info("md/raid:%s: starting from clean shutdown\n",
 			mdname(mddev));
-- 
2.24.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread
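The patch prints a "processing ctx->seq N" line every 100000 sequence numbers, so the dmesg timestamps can be turned into a scan-rate estimate while the recovery runs. A hypothetical helper sketch for doing that (the log format assumed here is exactly what the patch's pr_info() emits, with dmesg "[MonDD HH:MM]" prefixes; the sample lines are illustrative):

```python
import re
from datetime import datetime

# Hypothetical helper: estimate the flush-log scan rate (metablocks/s)
# from the "[Mon D HH:MM] r5c_recovery_flush_log processing ctx->seq N"
# lines printed by the debug patch.
PAT = re.compile(r"\[(\w{3})\s*(\d+)\s+(\d+):(\d+)\].*processing ctx->seq (\d+)")

def scan_rate(lines, year=2020):
    points = []
    for line in lines:
        m = PAT.search(line)
        if m:
            mon, day, hh, mm, seq = m.groups()
            t = datetime.strptime(f"{year} {mon} {day} {hh} {mm}",
                                  "%Y %b %d %H %M")
            points.append((t, int(seq)))
    # rate between the first and last matching line
    (t0, s0), (t1, s1) = points[0], points[-1]
    return (s1 - s0) / (t1 - t0).total_seconds()

log = [
    "[Jun 2 16:31] r5c_recovery_flush_log processing ctx->seq 860800000",
    "[Jun 2 16:57] r5c_recovery_flush_log processing ctx->seq 860900000",
]
print(f"{scan_rate(log):.0f} metablocks/s")
```

At the rates this works out to (tens of metablocks per second), scanning millions of sequence numbers takes many hours, which matches the ~17-hour run reported in the follow-up.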
* Re: Assemblin journaled array fails 2020-05-29 21:57 ` Song Liu @ 2020-06-03 10:12 ` Michal Soltys 2020-06-03 22:07 ` Song Liu 0 siblings, 1 reply; 46+ messages in thread From: Michal Soltys @ 2020-06-03 10:12 UTC (permalink / raw) To: Song Liu; +Cc: linux-raid On 5/29/20 11:57 PM, Song Liu wrote: >> >> We can stop it. It didn't really hit any data checksum error in early phase >> of the recovery. So it did found a long long journal to recover. > > Could you please try the assemble again with the attached patch? > > Thanks, > Song > [Jun 2 16:08] md: md124 stopped. [ +0.008974] md/raid:md124: device sdf1 operational as raid disk 0 [ +0.006875] md/raid:md124: device sdj1 operational as raid disk 3 [ +0.006826] md/raid:md124: device sdh1 operational as raid disk 2 [ +0.006814] md/raid:md124: device sdg1 operational as raid disk 1 [ +0.007398] md/raid:md124: raid level 5 active with 4 out of 4 devices, algorithm 2 [ +0.008562] r5l_recovery_log starting from ctx->pos 160392632 ctx->seq 860687132 [ +11.215777] r5c_recovery_flush_log processing ctx->seq 860700000 [Jun 2 16:31] r5c_recovery_flush_log processing ctx->seq 860800000 [Jun 2 16:57] r5c_recovery_flush_log processing ctx->seq 860900000 [Jun 2 17:11] r5c_recovery_flush_log processing ctx->seq 861000000 [Jun 2 17:24] r5c_recovery_flush_log processing ctx->seq 861100000 [Jun 2 17:39] r5c_recovery_flush_log processing ctx->seq 861200000 [Jun 2 17:51] r5c_recovery_flush_log processing ctx->seq 861300000 [Jun 2 18:02] r5c_recovery_flush_log processing ctx->seq 861400000 [Jun 2 18:12] r5c_recovery_flush_log processing ctx->seq 861500000 [Jun 2 18:25] r5c_recovery_flush_log processing ctx->seq 861600000 [Jun 2 18:36] r5c_recovery_flush_log processing ctx->seq 861700000 [Jun 2 18:49] r5c_recovery_flush_log processing ctx->seq 861800000 [Jun 2 19:02] r5c_recovery_flush_log processing ctx->seq 861900000 [Jun 2 19:14] r5c_recovery_flush_log processing ctx->seq 862000000 [Jun 2 19:26] r5c_recovery_flush_log processing 
ctx->seq 862100000 [Jun 2 19:53] r5c_recovery_flush_log processing ctx->seq 862200000 [Jun 2 20:10] r5c_recovery_flush_log processing ctx->seq 862300000 [Jun 2 20:22] r5c_recovery_flush_log processing ctx->seq 862400000 [Jun 2 20:35] r5c_recovery_flush_log processing ctx->seq 862500000 [Jun 2 20:47] r5c_recovery_flush_log processing ctx->seq 862600000 [Jun 2 21:02] r5c_recovery_flush_log processing ctx->seq 862700000 [Jun 2 21:17] r5c_recovery_flush_log processing ctx->seq 862800000 [Jun 2 21:28] r5c_recovery_flush_log processing ctx->seq 862900000 [Jun 2 21:42] r5c_recovery_flush_log processing ctx->seq 863000000 [Jun 2 21:53] r5c_recovery_flush_log processing ctx->seq 863100000 [Jun 2 22:03] r5c_recovery_flush_log processing ctx->seq 863200000 [Jun 2 22:16] r5c_recovery_flush_log processing ctx->seq 863300000 [Jun 2 22:29] r5c_recovery_flush_log processing ctx->seq 863400000 [Jun 2 22:43] r5c_recovery_flush_log processing ctx->seq 863500000 [Jun 2 22:56] r5c_recovery_flush_log processing ctx->seq 863600000 [Jun 2 23:09] r5c_recovery_flush_log processing ctx->seq 863700000 [Jun 2 23:21] r5c_recovery_flush_log processing ctx->seq 863800000 [Jun 2 23:33] r5c_recovery_flush_log processing ctx->seq 863900000 [Jun 3 00:02] r5c_recovery_flush_log processing ctx->seq 864000000 [Jun 3 00:22] r5c_recovery_flush_log processing ctx->seq 864100000 [Jun 3 00:55] r5c_recovery_flush_log processing ctx->seq 864200000 [Jun 3 01:28] r5c_recovery_flush_log processing ctx->seq 864300000 [Jun 3 02:01] r5c_recovery_flush_log processing ctx->seq 864400000 [Jun 3 02:27] r5c_recovery_flush_log processing ctx->seq 864500000 [Jun 3 02:45] r5c_recovery_flush_log processing ctx->seq 864600000 [Jun 3 02:57] r5c_recovery_flush_log processing ctx->seq 864700000 [Jun 3 03:12] r5c_recovery_flush_log processing ctx->seq 864800000 [Jun 3 03:33] r5c_recovery_flush_log processing ctx->seq 864900000 [Jun 3 03:51] r5c_recovery_flush_log processing ctx->seq 865000000 [Jun 3 04:11] r5c_recovery_flush_log 
processing ctx->seq 865100000 [Jun 3 04:28] r5c_recovery_flush_log processing ctx->seq 865200000 [Jun 3 04:48] r5c_recovery_flush_log processing ctx->seq 865300000 [Jun 3 05:07] r5c_recovery_flush_log processing ctx->seq 865400000 [Jun 3 05:24] r5c_recovery_flush_log processing ctx->seq 865500000 [Jun 3 05:41] r5c_recovery_flush_log processing ctx->seq 865600000 [Jun 3 06:02] r5c_recovery_flush_log processing ctx->seq 865700000 [Jun 3 06:25] r5c_recovery_flush_log processing ctx->seq 865800000 [Jun 3 06:43] r5c_recovery_flush_log processing ctx->seq 865900000 [Jun 3 07:04] r5c_recovery_flush_log processing ctx->seq 866000000 [Jun 3 07:21] r5c_recovery_flush_log processing ctx->seq 866100000 [Jun 3 07:42] r5c_recovery_flush_log processing ctx->seq 866200000 [Jun 3 08:06] r5c_recovery_flush_log processing ctx->seq 866300000 [Jun 3 08:24] r5c_recovery_flush_log processing ctx->seq 866400000 [Jun 3 08:44] r5c_recovery_flush_log processing ctx->seq 866500000 [Jun 3 08:58] r5l_recovery_log finished scanning with ctx->pos 408082848 ctx->seq 866583360 [ +0.000003] md/raid:md124: recovering 24667 data-only stripes and 50212 data-parity stripes [ +0.037320] r5c_recovery_rewrite_data_only_stripes rewritten 1 stripes to the journal, current ctx->pos 408082864 ctx->seq 866583361 [ +0.182558] r5c_recovery_rewrite_data_only_stripes rewritten 1001 stripes to the journal, current ctx->pos 408099592 ctx->seq 866584361 [ +0.180980] r5c_recovery_rewrite_data_only_stripes rewritten 2001 stripes to the journal, current ctx->pos 408115904 ctx->seq 866585361 [ +0.196148] r5c_recovery_rewrite_data_only_stripes rewritten 3001 stripes to the journal, current ctx->pos 408133600 ctx->seq 866586361 [ +0.221433] r5c_recovery_rewrite_data_only_stripes rewritten 4001 stripes to the journal, current ctx->pos 408153680 ctx->seq 866587361 [ +0.223732] r5c_recovery_rewrite_data_only_stripes rewritten 5001 stripes to the journal, current ctx->pos 408173152 ctx->seq 866588361 [ +0.228663] 
r5c_recovery_rewrite_data_only_stripes rewritten 6001 stripes to the journal, current ctx->pos 408192808 ctx->seq 866589361 [ +0.234246] r5c_recovery_rewrite_data_only_stripes rewritten 7001 stripes to the journal, current ctx->pos 408212760 ctx->seq 866590361 [ +0.242665] r5c_recovery_rewrite_data_only_stripes rewritten 8001 stripes to the journal, current ctx->pos 408233176 ctx->seq 866591361 [ +0.231829] r5c_recovery_rewrite_data_only_stripes rewritten 9001 stripes to the journal, current ctx->pos 408251696 ctx->seq 866592361 [ +0.250124] r5c_recovery_rewrite_data_only_stripes rewritten 10001 stripes to the journal, current ctx->pos 408270968 ctx->seq 866593361 [ +0.240402] r5c_recovery_rewrite_data_only_stripes rewritten 11001 stripes to the journal, current ctx->pos 408289976 ctx->seq 866594361 [ +0.250681] r5c_recovery_rewrite_data_only_stripes rewritten 12001 stripes to the journal, current ctx->pos 408309784 ctx->seq 866595361 [ +0.258173] r5c_recovery_rewrite_data_only_stripes rewritten 13001 stripes to the journal, current ctx->pos 408329888 ctx->seq 866596361 [ +0.235759] r5c_recovery_rewrite_data_only_stripes rewritten 14001 stripes to the journal, current ctx->pos 408349112 ctx->seq 866597361 [ +0.255747] r5c_recovery_rewrite_data_only_stripes rewritten 15001 stripes to the journal, current ctx->pos 408368984 ctx->seq 866598361 [ +0.255252] r5c_recovery_rewrite_data_only_stripes rewritten 16001 stripes to the journal, current ctx->pos 408386768 ctx->seq 866599361 [ +0.264333] r5c_recovery_rewrite_data_only_stripes rewritten 17001 stripes to the journal, current ctx->pos 408407072 ctx->seq 866600361 [ +0.261822] r5c_recovery_rewrite_data_only_stripes rewritten 18001 stripes to the journal, current ctx->pos 408425840 ctx->seq 866601361 [ +0.270390] r5c_recovery_rewrite_data_only_stripes rewritten 19001 stripes to the journal, current ctx->pos 408443976 ctx->seq 866602361 [ +0.266591] r5c_recovery_rewrite_data_only_stripes rewritten 20001 stripes to the 
journal, current ctx->pos 408461384 ctx->seq 866603361 [ +0.276480] r5c_recovery_rewrite_data_only_stripes rewritten 21001 stripes to the journal, current ctx->pos 408479568 ctx->seq 866604361 [ +0.272757] r5c_recovery_rewrite_data_only_stripes rewritten 22001 stripes to the journal, current ctx->pos 408496600 ctx->seq 866605361 [ +0.290148] r5c_recovery_rewrite_data_only_stripes rewritten 23001 stripes to the journal, current ctx->pos 408515472 ctx->seq 866606361 [ +0.274369] r5c_recovery_rewrite_data_only_stripes rewritten 24001 stripes to the journal, current ctx->pos 408532112 ctx->seq 866607361 [ +0.237468] r5c_recovery_rewrite_data_only_stripes done [ +0.062124] r5c_recovery_flush_data_only_stripes enter [ +0.063396] r5c_recovery_flush_data_only_stripes before wait_event [Jun 3 09:02] INFO: task mdadm:2858 blocked for more than 120 seconds. [ +0.060545] Tainted: G E 5.4.19-msl-00001-gbf39596faf12 #2 [ +0.062932] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.068255] mdadm D 0 2858 2751 0x00004002 [ +0.065247] Call Trace: [ +0.056200] ? __schedule+0x2e6/0x6f0 [ +0.054586] schedule+0x2f/0xa0 [ +0.052803] r5l_start.cold.43+0xb10/0xfca [raid456] [ +0.053892] ? finish_wait+0x80/0x80 [ +0.051771] md_start.part.48+0x2e/0x50 [md_mod] [ +0.051705] do_md_run+0x64/0x100 [md_mod] [ +0.051111] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.049628] blkdev_ioctl+0x4d0/0xa00 [ +0.048213] block_ioctl+0x39/0x40 [ +0.047055] do_vfs_ioctl+0xa4/0x630 [ +0.046446] ksys_ioctl+0x60/0x90 [ +0.046983] __x64_sys_ioctl+0x16/0x20 [ +0.045661] do_syscall_64+0x52/0x160 [ +0.043602] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.045892] RIP: 0033:0x7f72920c3427 [ +0.043231] Code: Bad RIP value. 
[ +0.040553] RSP: 002b:00007ffc2d2045f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.046595] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f72920c3427 [ +0.044504] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.044056] RBP: 00007ffc2d204770 R08: 000055f715c6ef40 R09: 000055f715a7ab60 [ +0.043743] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.042295] R13: 0000000000000000 R14: 00007ffc2d204f20 R15: 0000000000000004 [Jun 3 09:04] INFO: task mdadm:2858 blocked for more than 243 seconds. [ +0.040957] Tainted: G E 5.4.19-msl-00001-gbf39596faf12 #2 [ +0.040821] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.041663] mdadm D 0 2858 2751 0x00004002 [ +0.038191] Call Trace: [ +0.033593] ? __schedule+0x2e6/0x6f0 [ +0.034067] schedule+0x2f/0xa0 [ +0.032670] r5l_start.cold.43+0xb10/0xfca [raid456] [ +0.033911] ? finish_wait+0x80/0x80 [ +0.031274] md_start.part.48+0x2e/0x50 [md_mod] [ +0.032020] do_md_run+0x64/0x100 [md_mod] [ +0.031191] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.030522] blkdev_ioctl+0x4d0/0xa00 [ +0.028305] block_ioctl+0x39/0x40 [ +0.026702] do_vfs_ioctl+0xa4/0x630 [ +0.027148] ksys_ioctl+0x60/0x90 [ +0.024758] __x64_sys_ioctl+0x16/0x20 [ +0.025464] do_syscall_64+0x52/0x160 [ +0.023894] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.026684] RIP: 0033:0x7f72920c3427 [ +0.024336] Code: Bad RIP value. [ +0.024689] RSP: 002b:00007ffc2d2045f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.028442] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f72920c3427 [ +0.028941] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.028573] RBP: 00007ffc2d204770 R08: 000055f715c6ef40 R09: 000055f715a7ab60 [ +0.028022] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.028933] R13: 0000000000000000 R14: 00007ffc2d204f20 R15: 0000000000000004 [Jun 3 09:06] INFO: task mdadm:2858 blocked for more than 364 seconds. 
[ +0.027274] Tainted: G E 5.4.19-msl-00001-gbf39596faf12 #2 [ +0.029471] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.030418] mdadm D 0 2858 2751 0x00004002 [ +0.028200] Call Trace: [ +0.024432] ? __schedule+0x2e6/0x6f0 [ +0.026054] schedule+0x2f/0xa0 [ +0.024534] r5l_start.cold.43+0xb10/0xfca [raid456] [ +0.029208] ? finish_wait+0x80/0x80 [ +0.025526] md_start.part.48+0x2e/0x50 [md_mod] [ +0.028478] do_md_run+0x64/0x100 [md_mod] [ +0.027473] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.026305] blkdev_ioctl+0x4d0/0xa00 [ +0.026246] block_ioctl+0x39/0x40 [ +0.025604] do_vfs_ioctl+0xa4/0x630 [ +0.025801] ksys_ioctl+0x60/0x90 [ +0.025177] __x64_sys_ioctl+0x16/0x20 [ +0.025306] do_syscall_64+0x52/0x160 [ +0.025793] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.026747] RIP: 0033:0x7f72920c3427 [ +0.025422] Code: Bad RIP value. [ +0.023953] RSP: 002b:00007ffc2d2045f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.029392] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f72920c3427 [ +0.027915] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.028305] RBP: 00007ffc2d204770 R08: 000055f715c6ef40 R09: 000055f715a7ab60 [ +0.028820] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.029250] R13: 0000000000000000 R14: 00007ffc2d204f20 R15: 0000000000000004 [Jun 3 09:08] INFO: task mdadm:2858 blocked for more than 485 seconds. [ +0.026953] Tainted: G E 5.4.19-msl-00001-gbf39596faf12 #2 [ +0.030072] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.030385] mdadm D 0 2858 2751 0x00004002 [ +0.027879] Call Trace: [ +0.023949] ? __schedule+0x2e6/0x6f0 [ +0.025820] schedule+0x2f/0xa0 [ +0.024656] r5l_start.cold.43+0xb10/0xfca [raid456] [ +0.027525] ? 
finish_wait+0x80/0x80 [ +0.025534] md_start.part.48+0x2e/0x50 [md_mod] [ +0.027898] do_md_run+0x64/0x100 [md_mod] [ +0.027299] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.025926] blkdev_ioctl+0x4d0/0xa00 [ +0.026269] block_ioctl+0x39/0x40 [ +0.025262] do_vfs_ioctl+0xa4/0x630 [ +0.025799] ksys_ioctl+0x60/0x90 [ +0.024000] __x64_sys_ioctl+0x16/0x20 [ +0.025950] do_syscall_64+0x52/0x160 [ +0.024311] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.026696] RIP: 0033:0x7f72920c3427 [ +0.024260] Code: Bad RIP value. [ +0.024967] RSP: 002b:00007ffc2d2045f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.028541] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f72920c3427 [ +0.028687] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.028792] RBP: 00007ffc2d204770 R08: 000055f715c6ef40 R09: 000055f715a7ab60 [ +0.028316] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.028385] R13: 0000000000000000 R14: 00007ffc2d204f20 R15: 0000000000000004 [Jun 3 09:10] INFO: task mdadm:2858 blocked for more than 606 seconds. [ +0.026999] Tainted: G E 5.4.19-msl-00001-gbf39596faf12 #2 [ +0.029364] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.030553] mdadm D 0 2858 2751 0x00004002 [ +0.030993] Call Trace: [ +0.024346] ? __schedule+0x2e6/0x6f0 [ +0.025899] schedule+0x2f/0xa0 [ +0.024906] r5l_start.cold.43+0xb10/0xfca [raid456] [ +0.027654] ? finish_wait+0x80/0x80 [ +0.026325] md_start.part.48+0x2e/0x50 [md_mod] [ +0.026537] do_md_run+0x64/0x100 [md_mod] [ +0.027048] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.027328] blkdev_ioctl+0x4d0/0xa00 [ +0.026367] block_ioctl+0x39/0x40 [ +0.025746] do_vfs_ioctl+0xa4/0x630 [ +0.025654] ksys_ioctl+0x60/0x90 [ +0.024867] __x64_sys_ioctl+0x16/0x20 [ +0.025322] do_syscall_64+0x52/0x160 [ +0.024934] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.026441] RIP: 0033:0x7f72920c3427 [ +0.025195] Code: Bad RIP value. 
[ +0.023641] RSP: 002b:00007ffc2d2045f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.029224] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f72920c3427 [ +0.028522] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.028499] RBP: 00007ffc2d204770 R08: 000055f715c6ef40 R09: 000055f715a7ab60 [ +0.028515] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.028559] R13: 0000000000000000 R14: 00007ffc2d204f20 R15: 0000000000000004 [Jun 3 09:12] INFO: task mdadm:2858 blocked for more than 727 seconds. [ +0.027689] Tainted: G E 5.4.19-msl-00001-gbf39596faf12 #2 [ +0.029425] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.030441] mdadm D 0 2858 2751 0x00004002 [ +0.028318] Call Trace: [ +0.024240] ? __schedule+0x2e6/0x6f0 [ +0.026113] schedule+0x2f/0xa0 [ +0.024594] r5l_start.cold.43+0xb10/0xfca [raid456] [ +0.027742] ? finish_wait+0x80/0x80 [ +0.026482] md_start.part.48+0x2e/0x50 [md_mod] [ +0.027073] do_md_run+0x64/0x100 [md_mod] [ +0.026928] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.026385] blkdev_ioctl+0x4d0/0xa00 [ +0.026621] block_ioctl+0x39/0x40 [ +0.026416] do_vfs_ioctl+0xa4/0x630 [ +0.026045] ksys_ioctl+0x60/0x90 [ +0.025177] __x64_sys_ioctl+0x16/0x20 [ +0.025337] do_syscall_64+0x52/0x160 [ +0.025306] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.026347] RIP: 0033:0x7f72920c3427 [ +0.025397] Code: Bad RIP value. [ +0.023684] RSP: 002b:00007ffc2d2045f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.029548] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f72920c3427 [ +0.028548] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.028609] RBP: 00007ffc2d204770 R08: 000055f715c6ef40 R09: 000055f715a7ab60 [ +0.028464] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.028697] R13: 0000000000000000 R14: 00007ffc2d204f20 R15: 0000000000000004 [Jun 3 09:14] INFO: task mdadm:2858 blocked for more than 847 seconds. 
[ +0.027980] Tainted: G E 5.4.19-msl-00001-gbf39596faf12 #2 [ +0.028799] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.030554] mdadm D 0 2858 2751 0x00004002 [ +0.028257] Call Trace: [ +0.023759] ? __schedule+0x2e6/0x6f0 [ +0.025938] schedule+0x2f/0xa0 [ +0.025325] r5l_start.cold.43+0xb10/0xfca [raid456] [ +0.030832] ? finish_wait+0x80/0x80 [ +0.026344] md_start.part.48+0x2e/0x50 [md_mod] [ +0.026746] do_md_run+0x64/0x100 [md_mod] [ +0.027229] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.027051] blkdev_ioctl+0x4d0/0xa00 [ +0.025505] block_ioctl+0x39/0x40 [ +0.026085] do_vfs_ioctl+0xa4/0x630 [ +0.025163] ksys_ioctl+0x60/0x90 [ +0.029285] __x64_sys_ioctl+0x16/0x20 [ +0.029185] do_syscall_64+0x52/0x160 [ +0.027525] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.030066] RIP: 0033:0x7f72920c3427 [ +0.028916] Code: Bad RIP value. [ +0.026674] RSP: 002b:00007ffc2d2045f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.029693] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f72920c3427 [ +0.028949] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.028409] RBP: 00007ffc2d204770 R08: 000055f715c6ef40 R09: 000055f715a7ab60 [ +0.028117] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.028542] R13: 0000000000000000 R14: 00007ffc2d204f20 R15: 0000000000000004 [Jun 3 09:16] INFO: task mdadm:2858 blocked for more than 968 seconds. [ +0.028070] Tainted: G E 5.4.19-msl-00001-gbf39596faf12 #2 [ +0.028488] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.030356] mdadm D 0 2858 2751 0x00004002 [ +0.028778] Call Trace: [ +0.025616] ? __schedule+0x2e6/0x6f0 [ +0.028651] schedule+0x2f/0xa0 [ +0.028842] r5l_start.cold.43+0xb10/0xfca [raid456] [ +0.029961] ? 
finish_wait+0x80/0x80 [ +0.028681] md_start.part.48+0x2e/0x50 [md_mod] [ +0.030107] do_md_run+0x64/0x100 [md_mod] [ +0.030568] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.027043] blkdev_ioctl+0x4d0/0xa00 [ +0.026720] block_ioctl+0x39/0x40 [ +0.025781] do_vfs_ioctl+0xa4/0x630 [ +0.024701] ksys_ioctl+0x60/0x90 [ +0.025065] __x64_sys_ioctl+0x16/0x20 [ +0.024444] do_syscall_64+0x52/0x160 [ +0.025220] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.025739] RIP: 0033:0x7f72920c3427 [ +0.025300] Code: Bad RIP value. [ +0.023582] RSP: 002b:00007ffc2d2045f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.029201] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f72920c3427 [ +0.028748] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.027968] RBP: 00007ffc2d204770 R08: 000055f715c6ef40 R09: 000055f715a7ab60 [ +0.028481] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.028766] R13: 0000000000000000 R14: 00007ffc2d204f20 R15: 0000000000000004 [Jun 3 09:18] INFO: task mdadm:2858 blocked for more than 1089 seconds. [ +0.028054] Tainted: G E 5.4.19-msl-00001-gbf39596faf12 #2 [ +0.028400] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.030904] mdadm D 0 2858 2751 0x00004002 [ +0.028077] Call Trace: [ +0.024791] ? __schedule+0x2e6/0x6f0 [ +0.025875] schedule+0x2f/0xa0 [ +0.025618] r5l_start.cold.43+0xb10/0xfca [raid456] [ +0.027710] ? finish_wait+0x80/0x80 [ +0.026291] md_start.part.48+0x2e/0x50 [md_mod] [ +0.026487] do_md_run+0x64/0x100 [md_mod] [ +0.028776] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.027002] blkdev_ioctl+0x4d0/0xa00 [ +0.025169] block_ioctl+0x39/0x40 [ +0.025991] do_vfs_ioctl+0xa4/0x630 [ +0.024424] ksys_ioctl+0x60/0x90 [ +0.025340] __x64_sys_ioctl+0x16/0x20 [ +0.024386] do_syscall_64+0x52/0x160 [ +0.025086] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.025950] RIP: 0033:0x7f72920c3427 [ +0.025552] Code: Bad RIP value. 
[ +0.023459] RSP: 002b:00007ffc2d2045f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.030089] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f72920c3427 [ +0.028770] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.027951] RBP: 00007ffc2d204770 R08: 000055f715c6ef40 R09: 000055f715a7ab60 [ +0.028993] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.028520] R13: 0000000000000000 R14: 00007ffc2d204f20 R15: 0000000000000004 [Jun 3 09:20] INFO: task mdadm:2858 blocked for more than 1210 seconds. [ +0.028171] Tainted: G E 5.4.19-msl-00001-gbf39596faf12 #2 [ +0.028430] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.030316] mdadm D 0 2858 2751 0x00004002 [ +0.028004] Call Trace: [ +0.024696] ? __schedule+0x2e6/0x6f0 [ +0.025941] schedule+0x2f/0xa0 [ +0.025382] r5l_start.cold.43+0xb10/0xfca [raid456] [ +0.027699] ? finish_wait+0x80/0x80 [ +0.027329] md_start.part.48+0x2e/0x50 [md_mod] [ +0.026295] do_md_run+0x64/0x100 [md_mod] [ +0.027274] md_ioctl+0xe7d/0x17d0 [md_mod] [ +0.026747] blkdev_ioctl+0x4d0/0xa00 [ +0.025499] block_ioctl+0x39/0x40 [ +0.025981] do_vfs_ioctl+0xa4/0x630 [ +0.025238] ksys_ioctl+0x60/0x90 [ +0.024917] __x64_sys_ioctl+0x16/0x20 [ +0.025268] do_syscall_64+0x52/0x160 [ +0.025288] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ +0.026166] RIP: 0033:0x7f72920c3427 [ +0.027251] Code: Bad RIP value. [ +0.027211] RSP: 002b:00007ffc2d2045f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.031611] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f72920c3427 [ +0.029797] RDX: 0000000000000000 RSI: 00000000400c0930 RDI: 0000000000000004 [ +0.028235] RBP: 00007ffc2d204770 R08: 000055f715c6ef40 R09: 000055f715a7ab60 [ +0.027762] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ +0.028697] R13: 0000000000000000 R14: 00007ffc2d204f20 R15: 0000000000000004 ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails
  2020-06-03 10:12 ` Michal Soltys
@ 2020-06-03 22:07 ` Song Liu
  2020-06-04 13:58   ` Michal Soltys
  2020-06-05 12:26   ` Michal Soltys
  0 siblings, 2 replies; 46+ messages in thread
From: Song Liu @ 2020-06-03 22:07 UTC (permalink / raw)
  To: Michal Soltys; +Cc: linux-raid

Thanks for the trace.

On Wed, Jun 3, 2020 at 3:12 AM Michal Soltys <msoltyspl@yandex.pl> wrote:
>
> On 5/29/20 11:57 PM, Song Liu wrote:
> >>
> >> We can stop it. It didn't really hit any data checksum error in early phase
> >> of the recovery. So it did found a long long journal to recover.
> >
> > Could you please try the assemble again with the attached patch?
> >
> > Thanks,
> > Song
>
> [...]
> [Jun 3 07:42] r5c_recovery_flush_log processing ctx->seq 866200000
> [Jun 3 08:06] r5c_recovery_flush_log processing ctx->seq 866300000
> [Jun 3 08:24] r5c_recovery_flush_log processing ctx->seq 866400000
> [Jun 3 08:44] r5c_recovery_flush_log processing ctx->seq 866500000
> [Jun 3 08:58] r5l_recovery_log finished scanning with ctx->pos 408082848 ctx->seq 866583360
> [ +0.000003] md/raid:md124: recovering 24667 data-only stripes and 50212 data-parity stripes

We are recovering many stripes. I guess this is because we somehow missed
a metadata update before the crash.
> [ +0.037320] r5c_recovery_rewrite_data_only_stripes rewritten 1 stripes to the journal, current ctx->pos 408082864 ctx->seq 866583361 > [ +0.182558] r5c_recovery_rewrite_data_only_stripes rewritten 1001 stripes to the journal, current ctx->pos 408099592 ctx->seq 866584361 > [ +0.180980] r5c_recovery_rewrite_data_only_stripes rewritten 2001 stripes to the journal, current ctx->pos 408115904 ctx->seq 866585361 > [ +0.196148] r5c_recovery_rewrite_data_only_stripes rewritten 3001 stripes to the journal, current ctx->pos 408133600 ctx->seq 866586361 > [ +0.221433] r5c_recovery_rewrite_data_only_stripes rewritten 4001 stripes to the journal, current ctx->pos 408153680 ctx->seq 866587361 > [ +0.223732] r5c_recovery_rewrite_data_only_stripes rewritten 5001 stripes to the journal, current ctx->pos 408173152 ctx->seq 866588361 > [ +0.228663] r5c_recovery_rewrite_data_only_stripes rewritten 6001 stripes to the journal, current ctx->pos 408192808 ctx->seq 866589361 > [ +0.234246] r5c_recovery_rewrite_data_only_stripes rewritten 7001 stripes to the journal, current ctx->pos 408212760 ctx->seq 866590361 > [ +0.242665] r5c_recovery_rewrite_data_only_stripes rewritten 8001 stripes to the journal, current ctx->pos 408233176 ctx->seq 866591361 > [ +0.231829] r5c_recovery_rewrite_data_only_stripes rewritten 9001 stripes to the journal, current ctx->pos 408251696 ctx->seq 866592361 > [ +0.250124] r5c_recovery_rewrite_data_only_stripes rewritten 10001 stripes to the journal, current ctx->pos 408270968 ctx->seq 866593361 > [ +0.240402] r5c_recovery_rewrite_data_only_stripes rewritten 11001 stripes to the journal, current ctx->pos 408289976 ctx->seq 866594361 > [ +0.250681] r5c_recovery_rewrite_data_only_stripes rewritten 12001 stripes to the journal, current ctx->pos 408309784 ctx->seq 866595361 > [ +0.258173] r5c_recovery_rewrite_data_only_stripes rewritten 13001 stripes to the journal, current ctx->pos 408329888 ctx->seq 866596361 > [ +0.235759] 
r5c_recovery_rewrite_data_only_stripes rewritten 14001 stripes to the journal, current ctx->pos 408349112 ctx->seq 866597361 > [ +0.255747] r5c_recovery_rewrite_data_only_stripes rewritten 15001 stripes to the journal, current ctx->pos 408368984 ctx->seq 866598361 > [ +0.255252] r5c_recovery_rewrite_data_only_stripes rewritten 16001 stripes to the journal, current ctx->pos 408386768 ctx->seq 866599361 > [ +0.264333] r5c_recovery_rewrite_data_only_stripes rewritten 17001 stripes to the journal, current ctx->pos 408407072 ctx->seq 866600361 > [ +0.261822] r5c_recovery_rewrite_data_only_stripes rewritten 18001 stripes to the journal, current ctx->pos 408425840 ctx->seq 866601361 > [ +0.270390] r5c_recovery_rewrite_data_only_stripes rewritten 19001 stripes to the journal, current ctx->pos 408443976 ctx->seq 866602361 > [ +0.266591] r5c_recovery_rewrite_data_only_stripes rewritten 20001 stripes to the journal, current ctx->pos 408461384 ctx->seq 866603361 > [ +0.276480] r5c_recovery_rewrite_data_only_stripes rewritten 21001 stripes to the journal, current ctx->pos 408479568 ctx->seq 866604361 > [ +0.272757] r5c_recovery_rewrite_data_only_stripes rewritten 22001 stripes to the journal, current ctx->pos 408496600 ctx->seq 866605361 > [ +0.290148] r5c_recovery_rewrite_data_only_stripes rewritten 23001 stripes to the journal, current ctx->pos 408515472 ctx->seq 866606361 > [ +0.274369] r5c_recovery_rewrite_data_only_stripes rewritten 24001 stripes to the journal, current ctx->pos 408532112 ctx->seq 866607361 > [ +0.237468] r5c_recovery_rewrite_data_only_stripes done > [ +0.062124] r5c_recovery_flush_data_only_stripes enter > [ +0.063396] r5c_recovery_flush_data_only_stripes before wait_event The hang happens at expected place. > [Jun 3 09:02] INFO: task mdadm:2858 blocked for more than 120 seconds. > [ +0.060545] Tainted: G E 5.4.19-msl-00001-gbf39596faf12 #2 > [ +0.062932] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Could you please try disabling the timeout message with

  echo 0 > /proc/sys/kernel/hung_task_timeout_secs

And during this wait (after the message
"r5c_recovery_flush_data_only_stripes before wait_event"),
check whether the raid disks (not the journal disk) are taking IOs
(using tools like iostat).

Thanks,
Song

^ permalink raw reply	[flat|nested] 46+ messages in thread
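[Editorial note: the two steps suggested above amount to the following commands, shown here as a sketch. They must run as root on the affected host; sdg-sdj are the component disks named elsewhere in this thread and would differ per system.]

```shell
# Silence the hung-task watchdog during the expected long wait;
# this only suppresses the warning, it changes nothing in md itself.
echo 0 > /proc/sys/kernel/hung_task_timeout_secs

# Once dmesg shows "r5c_recovery_flush_data_only_stripes before wait_event",
# watch whether the component disks (not the journal) are taking IO,
# sampling every 10 seconds.
iostat -d 10 sdg sdh sdi sdj
```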
* Re: Assemblin journaled array fails
  2020-06-03 22:07 ` Song Liu
@ 2020-06-04 13:58   ` Michal Soltys
  2020-06-05 12:26   ` Michal Soltys
  1 sibling, 0 replies; 46+ messages in thread
From: Michal Soltys @ 2020-06-04 13:58 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid

On 6/4/20 12:07 AM, Song Liu wrote:
>
> The hang happens at expected place.
>
>> [Jun 3 09:02] INFO: task mdadm:2858 blocked for more than 120 seconds.
>> [ +0.060545] Tainted: G E 5.4.19-msl-00001-gbf39596faf12 #2
>> [ +0.062932] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>
> Could you please try disable the timeout message with
>
> echo 0 > /proc/sys/kernel/hung_task_timeout_secs
>
> And during this wait (after message
> "r5c_recovery_flush_data_only_stripes before wait_event"),
> checks whether the raid disks (not the journal disk) are taking IOs
> (using tools like iostat).
>

Will report tomorrow (the machine was restarted, so I have to wait 19+
hours again until r5c_recovery_flush_log / processing gets its part of
the job completed).

Non-assembling raid issue aside - any idea why it is so inhumanly slow?
It's not really of much use in a production scenario in this state.

Following are every-10-seconds stats from the journal device after the
assembly of the main raid started:

Device   tps   kB_read/s  kB_wrtn/s   kB_read  kB_wrtn
md125   3.00     3072.00       0.00     30720        0
md125   2.80     2867.20       0.00     28672        0
md125   2.10     2150.40       0.00     21504        0
md125   1.90     1945.60       0.00     19456        0
md125   2.00     1920.40       0.00     19204        0
md125   1.30     1331.20       0.00     13312        0
md125   1.50     1536.00       0.00     15360        0

^ permalink raw reply	[flat|nested] 46+ messages in thread
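[Editorial note: the numbers already posted in this thread give a rough scan rate. Between the 07:42 and 08:44 r5c_recovery_flush_log messages quoted earlier, ctx->seq advanced by 300000. A back-of-the-envelope sketch; the figures are copied from the log, and treating one seq increment as one scanned journal entry is an assumption.]

```shell
# ctx->seq went from 866200000 at 07:42 to 866500000 at 08:44,
# i.e. 300000 entries in 62 minutes.
awk 'BEGIN {
    dseq = 866500000 - 866200000             # seq delta from the log
    dt   = ((8*60 + 44) - (7*60 + 42)) * 60  # elapsed seconds
    printf "~%.0f journal entries scanned per second\n", dseq/dt
}'
# prints: ~81 journal entries scanned per second
```

At that rate a journal holding hundreds of thousands of entries takes most of a day to scan, which is consistent with the 19+ hour wait mentioned above and with the ~2 MB/s read rate in the iostat table (small, latency-bound reads rather than streaming).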
* Re: Assemblin journaled array fails
  2020-06-03 22:07 ` Song Liu
  2020-06-04 13:58   ` Michal Soltys
@ 2020-06-05 12:26   ` Michal Soltys
  2020-06-09  9:36     ` Michal Soltys
  1 sibling, 1 reply; 46+ messages in thread
From: Michal Soltys @ 2020-06-05 12:26 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid

On 6/4/20 12:07 AM, Song Liu wrote:
>
> The hang happens at expected place.
>
>> [Jun 3 09:02] INFO: task mdadm:2858 blocked for more than 120 seconds.
>> [ +0.060545] Tainted: G E 5.4.19-msl-00001-gbf39596faf12 #2
>> [ +0.062932] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>
> Could you please try disable the timeout message with
>
> echo 0 > /proc/sys/kernel/hung_task_timeout_secs
>
> And during this wait (after message
> "r5c_recovery_flush_data_only_stripes before wait_event"),
> checks whether the raid disks (not the journal disk) are taking IOs
> (using tools like iostat).
>

No activity on component drives.

^ permalink raw reply	[flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails
  2020-06-05 12:26   ` Michal Soltys
@ 2020-06-09  9:36     ` Michal Soltys
  2020-06-09 18:36       ` Song Liu
  0 siblings, 1 reply; 46+ messages in thread
From: Michal Soltys @ 2020-06-09 9:36 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid

On 6/5/20 2:26 PM, Michal Soltys wrote:
> On 6/4/20 12:07 AM, Song Liu wrote:
>>
>> The hang happens at expected place.
>>
>>> [Jun 3 09:02] INFO: task mdadm:2858 blocked for more than 120 seconds.
>>> [ +0.060545] Tainted: G E 5.4.19-msl-00001-gbf39596faf12 #2
>>> [ +0.062932] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>
>> Could you please try disable the timeout message with
>>
>> echo 0 > /proc/sys/kernel/hung_task_timeout_secs
>>
>> And during this wait (after message
>> "r5c_recovery_flush_data_only_stripes before wait_event"),
>> checks whether the raid disks (not the journal disk) are taking IOs
>> (using tools like iostat).
>>
>
> No activity on component drives.

To expand on that - while there is no i/o activity whatsoever on the
component drives (or the journal), the cpu is of course still fully
loaded (5 days so far):

UID   PID   PPID  C   SZ   RSS   PSR  STIME  TTY    TIME        CMD
root  8129  6755  15  740  1904   10  Jun04  pts/2  17:42:34    mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/sdj1 /dev/sdi1 /dev/sdg1 /dev/sdh1
root  8147  2     84    0     0   30  Jun04  ?      4-02:09:47  [md124_raid5]

^ permalink raw reply	[flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails
  2020-06-09  9:36     ` Michal Soltys
@ 2020-06-09 18:36       ` Song Liu
  2020-06-09 21:51         ` Michal Soltys
  0 siblings, 1 reply; 46+ messages in thread
From: Song Liu @ 2020-06-09 18:36 UTC (permalink / raw)
  To: Michal Soltys; +Cc: linux-raid

On Tue, Jun 9, 2020 at 2:36 AM Michal Soltys <msoltyspl@yandex.pl> wrote:
>
> On 6/5/20 2:26 PM, Michal Soltys wrote:
> > On 6/4/20 12:07 AM, Song Liu wrote:
> >>
> >> The hang happens at expected place.
> >>
> >>> [Jun 3 09:02] INFO: task mdadm:2858 blocked for more than 120 seconds.
> >>> [ +0.060545] Tainted: G E 5.4.19-msl-00001-gbf39596faf12 #2
> >>> [ +0.062932] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>
> >> Could you please try disable the timeout message with
> >>
> >> echo 0 > /proc/sys/kernel/hung_task_timeout_secs
> >>
> >> And during this wait (after message
> >> "r5c_recovery_flush_data_only_stripes before wait_event"),
> >> checks whether the raid disks (not the journal disk) are taking IOs
> >> (using tools like iostat).
> >>
> >
> > No activity on component drives.
>
> To expand on that - while there is no i/o activity whatsoever at the component drives (as well as journal), the cpu is of course still fully loaded (5 days so far):
>
> UID   PID   PPID  C   SZ   RSS  PSR  STIME  TTY    TIME        CMD
> root  8129  6755  15  740  1904  10  Jun04  pts/2  17:42:34    mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/sdj1 /dev/sdi1 /dev/sdg1 /dev/sdh1
> root  8147  2     84    0     0  30  Jun04  ?      4-02:09:47   [md124_raid5]

I guess the md thread is stuck at some stripe. Does the kernel have
CONFIG_DYNAMIC_DEBUG enabled? If so, could you please try enabling some
pr_debug() in function handle_stripe()?

Thanks,
Song

^ permalink raw reply	[flat|nested] 46+ messages in thread
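[Editorial note: assuming the kernel was built with CONFIG_DYNAMIC_DEBUG=y, the pr_debug() sites in handle_stripe() can be toggled at runtime through the standard dynamic-debug interface. A sketch, to be run as root on the affected machine:]

```shell
# Check that the running kernel has dynamic debug built in.
grep CONFIG_DYNAMIC_DEBUG /boot/config-"$(uname -r)"

# debugfs is usually mounted already; mount it if not.
mountpoint -q /sys/kernel/debug || mount -t debugfs none /sys/kernel/debug

# Enable every pr_debug() site inside handle_stripe() (raid456 module).
echo 'func handle_stripe +p' > /sys/kernel/debug/dynamic_debug/control

# Later, turn them back off:
echo 'func handle_stripe -p' > /sys/kernel/debug/dynamic_debug/control
```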
* Re: Assemblin journaled array fails 2020-06-09 18:36 ` Song Liu @ 2020-06-09 21:51 ` Michal Soltys 2020-06-16 11:11 ` Michal Soltys 2020-06-17 0:48 ` Song Liu 0 siblings, 2 replies; 46+ messages in thread From: Michal Soltys @ 2020-06-09 21:51 UTC (permalink / raw) To: Song Liu; +Cc: linux-raid On 20/06/09 20:36, Song Liu wrote: > On Tue, Jun 9, 2020 at 2:36 AM Michal Soltys <msoltyspl@yandex.pl> wrote: >> >> On 6/5/20 2:26 PM, Michal Soltys wrote: >> > On 6/4/20 12:07 AM, Song Liu wrote: >> >> >> >> The hang happens at expected place. >> >> >> >>> [Jun 3 09:02] INFO: task mdadm:2858 blocked for more than 120 seconds. >> >>> [ +0.060545] Tainted: G E >> >>> 5.4.19-msl-00001-gbf39596faf12 #2 >> >>> [ +0.062932] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >> >>> disables this message. >> >> >> >> Could you please try disable the timeout message with >> >> >> >> echo 0 > /proc/sys/kernel/hung_task_timeout_secs >> >> >> >> And during this wait (after message >> >> "r5c_recovery_flush_data_only_stripes before wait_event"), >> >> checks whether the raid disks (not the journal disk) are taking IOs >> >> (using tools like iostat). >> >> >> > >> > No activity on component drives. >> >> To expand on that - while there is no i/o activity whatsoever at the component drives (as well as journal), the cpu is of course still fully loaded (5 days so far): >> >> UID PID PPID C SZ RSS PSR STIME TTY TIME CMD >> root 8129 6755 15 740 1904 10 Jun04 pts/2 17:42:34 mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/sdj1 /dev/sdi1 /dev/sdg1 /dev/sdh1 >> root 8147 2 84 0 0 30 Jun04 ? 4-02:09:47 [md124_raid5] > > I guess the md thread stuck at some stripe. Does the kernel have > CONFIG_DYNAMIC_DEBUG enabled? If so, could you please try enable some pr_debug() > in function handle_stripe()? 
>
> Thanks,
> Song
>

Massive spam in dmesg with messages like these:

[464836.603033] handling stripe 1551540328, state=0x41 cnt=1, pd_idx=3, qd_idx=-1, check:0, reconstruct:0
[464836.603036] handling stripe 1551540336, state=0x41 cnt=1, pd_idx=3, qd_idx=-1, check:0, reconstruct:0
[464836.603038] handling stripe 1551540344, state=0x41 cnt=1, pd_idx=3, qd_idx=-1, check:0, reconstruct:0
[464836.603041] handling stripe 1551540352, state=0x41 cnt=1, pd_idx=3, qd_idx=-1, check:0, reconstruct:0
[464836.603045] handling stripe 1551540360, state=0x41 cnt=1, pd_idx=3, qd_idx=-1, check:0, reconstruct:0
[464836.603049] handling stripe 1551540368, state=0x41 cnt=1, pd_idx=3, qd_idx=-1, check:0, reconstruct:0
[464836.603052] handling stripe 1551540376, state=0x41 cnt=1, pd_idx=3, qd_idx=-1, check:0, reconstruct:0
[464836.603056] handling stripe 1551540384, state=0x41 cnt=1, pd_idx=3, qd_idx=-1, check:0, reconstruct:0
[464836.603059] handling stripe 1551540392, state=0x41 cnt=1, pd_idx=3, qd_idx=-1, check:0, reconstruct:0
[464836.603062] handling stripe 1551540400, state=0x41 cnt=1, pd_idx=3, qd_idx=-1, check:0, reconstruct:0
[464836.603065] handling stripe 1551540408, state=0x41 cnt=1, pd_idx=3, qd_idx=-1, check:0, reconstruct:0
[464836.603068] handling stripe 1551540416, state=0x41 cnt=1, pd_idx=3, qd_idx=-1, check:0, reconstruct:0
[464836.603071] handling stripe 1551540424, state=0x41 cnt=1, pd_idx=3, qd_idx=-1, check:0, reconstruct:0
[464836.603075] handling stripe 1551540432, state=0x41 cnt=1, pd_idx=3, qd_idx=-1, check:0, reconstruct:0
[464836.603077] handling stripe 1551540440, state=0x41 cnt=1, pd_idx=3, qd_idx=-1, check:0, reconstruct:0
[464836.603078] handling stripe 1551540448, state=0x41 cnt=1, pd_idx=3, qd_idx=-1, check:0, reconstruct:0

^ permalink raw reply	[flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-06-09 21:51 ` Michal Soltys @ 2020-06-16 11:11 ` Michal Soltys 2020-06-17 0:48 ` Song Liu 1 sibling, 0 replies; 46+ messages in thread From: Michal Soltys @ 2020-06-16 11:11 UTC (permalink / raw) To: Song Liu; +Cc: linux-raid On 6/9/20 11:51 PM, Michal Soltys wrote: > On 20/06/09 20:36, Song Liu wrote: >> On Tue, Jun 9, 2020 at 2:36 AM Michal Soltys <msoltyspl@yandex.pl> wrote: >>> >>> On 6/5/20 2:26 PM, Michal Soltys wrote: >>> > On 6/4/20 12:07 AM, Song Liu wrote: >>> >> >>> >> The hang happens at expected place. >>> >> >>> >>> [Jun 3 09:02] INFO: task mdadm:2858 blocked for more than 120 >>> seconds. >>> >>> [ +0.060545] Tainted: G E >>> >>> 5.4.19-msl-00001-gbf39596faf12 #2 >>> >>> [ +0.062932] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>> >>> disables this message. >>> >> >>> >> Could you please try disable the timeout message with >>> >> >>> >> echo 0 > /proc/sys/kernel/hung_task_timeout_secs >>> >> >>> >> And during this wait (after message >>> >> "r5c_recovery_flush_data_only_stripes before wait_event"), >>> >> checks whether the raid disks (not the journal disk) are taking IOs >>> >> (using tools like iostat). >>> >> >>> > >>> > No activity on component drives. >>> >>> To expand on that - while there is no i/o activity whatsoever at the >>> component drives (as well as journal), the cpu is of course still >>> fully loaded (5 days so far): >>> >>> UID PID PPID C SZ RSS PSR STIME TTY TIME CMD >>> root 8129 6755 15 740 1904 10 Jun04 pts/2 17:42:34 mdadm >>> -A /dev/md/r5_big /dev/md/r1_journal_big /dev/sdj1 /dev/sdi1 >>> /dev/sdg1 /dev/sdh1 >>> root 8147 2 84 0 0 30 Jun04 ? 4-02:09:47 >>> [md124_raid5] >> >> I guess the md thread stuck at some stripe. Does the kernel have >> CONFIG_DYNAMIC_DEBUG enabled? If so, could you please try enable some >> pr_debug() >> in function handle_stripe()? 
>>
>> Thanks,
>> Song
>>
>
> Massive spam in dmesg with messages like these:
>
> [464836.603033] handling stripe 1551540328, state=0x41 cnt=1, pd_idx=3, qd_idx=-1, check:0, reconstruct:0
> [464836.603036] handling stripe 1551540336, state=0x41 cnt=1, pd_idx=3, qd_idx=-1, check:0, reconstruct:0
> [464836.603038] handling stripe 1551540344, state=0x41 cnt=1, pd_idx=3, qd_idx=-1, check:0, reconstruct:0
> <cut>

So what should be the next step in debugging/fixing this?

^ permalink raw reply	[flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-06-09 21:51 ` Michal Soltys 2020-06-16 11:11 ` Michal Soltys @ 2020-06-17 0:48 ` Song Liu 2020-06-17 12:53 ` Michal Soltys 1 sibling, 1 reply; 46+ messages in thread From: Song Liu @ 2020-06-17 0:48 UTC (permalink / raw) To: Michal Soltys; +Cc: linux-raid On Tue, Jun 9, 2020 at 2:51 PM Michal Soltys <msoltyspl@yandex.pl> wrote: > > On 20/06/09 20:36, Song Liu wrote: > > On Tue, Jun 9, 2020 at 2:36 AM Michal Soltys <msoltyspl@yandex.pl> wrote: > >> > >> On 6/5/20 2:26 PM, Michal Soltys wrote: > >> > On 6/4/20 12:07 AM, Song Liu wrote: > >> >> > >> >> The hang happens at expected place. > >> >> > >> >>> [Jun 3 09:02] INFO: task mdadm:2858 blocked for more than 120 seconds. > >> >>> [ +0.060545] Tainted: G E > >> >>> 5.4.19-msl-00001-gbf39596faf12 #2 > >> >>> [ +0.062932] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >> >>> disables this message. > >> >> > >> >> Could you please try disable the timeout message with > >> >> > >> >> echo 0 > /proc/sys/kernel/hung_task_timeout_secs > >> >> > >> >> And during this wait (after message > >> >> "r5c_recovery_flush_data_only_stripes before wait_event"), > >> >> checks whether the raid disks (not the journal disk) are taking IOs > >> >> (using tools like iostat). > >> >> > >> > > >> > No activity on component drives. > >> > >> To expand on that - while there is no i/o activity whatsoever at the component drives (as well as journal), the cpu is of course still fully loaded (5 days so far): > >> > >> UID PID PPID C SZ RSS PSR STIME TTY TIME CMD > >> root 8129 6755 15 740 1904 10 Jun04 pts/2 17:42:34 mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/sdj1 /dev/sdi1 /dev/sdg1 /dev/sdh1 > >> root 8147 2 84 0 0 30 Jun04 ? 4-02:09:47 [md124_raid5] > > > > I guess the md thread stuck at some stripe. Does the kernel have > > CONFIG_DYNAMIC_DEBUG enabled? If so, could you please try enable some pr_debug() > > in function handle_stripe()? 
> > > > Thanks, > > Song > > > > Massive spam in dmesg with messages like these: > > [464836.603033] handling stripe 1551540328, state=0x41 cnt=1, pd_idx=3, > qd_idx=-1 > , check:0, reconstruct:0 > [464836.603036] handling stripe 1551540336, state=0x41 cnt=1, pd_idx=3, > qd_idx=-1 > , check:0, reconstruct:0 > [464836.603038] handling stripe 1551540344, state=0x41 cnt=1, pd_idx=3, > qd_idx=-1 > , check:0, reconstruct:0 > [464836.603041] handling stripe 1551540352, state=0x41 cnt=1, pd_idx=3, > qd_idx=-1 > , check:0, reconstruct:0 > [464836.603045] handling stripe 1551540360, state=0x41 cnt=1, pd_idx=3, > qd_idx=-1 > , check:0, reconstruct:0 > [464836.603049] handling stripe 1551540368, state=0x41 cnt=1, pd_idx=3, > qd_idx=-1 > , check:0, reconstruct:0 > [464836.603052] handling stripe 1551540376, state=0x41 cnt=1, pd_idx=3, > qd_idx=-1 > , check:0, reconstruct:0 > [464836.603056] handling stripe 1551540384, state=0x41 cnt=1, pd_idx=3, > qd_idx=-1 > , check:0, reconstruct:0 > [464836.603059] handling stripe 1551540392, state=0x41 cnt=1, pd_idx=3, > qd_idx=-1 > , check:0, reconstruct:0 > [464836.603062] handling stripe 1551540400, state=0x41 cnt=1, pd_idx=3, > qd_idx=-1 > , check:0, reconstruct:0 > [464836.603065] handling stripe 1551540408, state=0x41 cnt=1, pd_idx=3, > qd_idx=-1 > , check:0, reconstruct:0 > [464836.603068] handling stripe 1551540416, state=0x41 cnt=1, pd_idx=3, > qd_idx=-1 > , check:0, reconstruct:0 > [464836.603071] handling stripe 1551540424, state=0x41 cnt=1, pd_idx=3, > qd_idx=-1 > , check:0, reconstruct:0 > [464836.603075] handling stripe 1551540432, state=0x41 cnt=1, pd_idx=3, > qd_idx=-1 > , check:0, reconstruct:0 > [464836.603077] handling stripe 1551540440, state=0x41 cnt=1, pd_idx=3, > qd_idx=-1 > , check:0, reconstruct:0 > [464836.603078] handling stripe 1551540448, state=0x41 cnt=1, pd_idx=3, > qd_idx=-1 > , check:0, reconstruct:0 I am really sorry for the delay. A few questions: 1. 
There are two pr_debug() calls in handle_stripe():

pr_debug("handling stripe %llu, state=%#lx cnt=%d, "
pr_debug("locked=%d uptodate=%d to_read=%d"

Did you enable all of them? Or only the first one?

2. In the log above, we have pd_idx = 3 for all these stripes. Is this the case for all the stripes? If that's the case, is it a raid4 instead of raid5?

Thanks,
Song

^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-06-17 0:48 ` Song Liu @ 2020-06-17 12:53 ` Michal Soltys 2020-06-17 17:11 ` Song Liu 0 siblings, 1 reply; 46+ messages in thread From: Michal Soltys @ 2020-06-17 12:53 UTC (permalink / raw) To: Song Liu; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 2771 bytes --] On 6/17/20 2:48 AM, Song Liu wrote: > On Tue, Jun 9, 2020 at 2:51 PM Michal Soltys <msoltyspl@yandex.pl> wrote: >> >> On 20/06/09 20:36, Song Liu wrote: >>> On Tue, Jun 9, 2020 at 2:36 AM Michal Soltys <msoltyspl@yandex.pl> wrote: >>>> >>>> On 6/5/20 2:26 PM, Michal Soltys wrote: >>>>> On 6/4/20 12:07 AM, Song Liu wrote: >>>>>> >>>>>> The hang happens at expected place. >>>>>> >>>>>>> [Jun 3 09:02] INFO: task mdadm:2858 blocked for more than 120 seconds. >>>>>>> [ +0.060545] Tainted: G E >>>>>>> 5.4.19-msl-00001-gbf39596faf12 #2 >>>>>>> [ +0.062932] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>>>>>> disables this message. >>>>>> >>>>>> Could you please try disable the timeout message with >>>>>> >>>>>> echo 0 > /proc/sys/kernel/hung_task_timeout_secs >>>>>> >>>>>> And during this wait (after message >>>>>> "r5c_recovery_flush_data_only_stripes before wait_event"), >>>>>> checks whether the raid disks (not the journal disk) are taking IOs >>>>>> (using tools like iostat). >>>>>> >>>>> >>>>> No activity on component drives. >>>> >>>> To expand on that - while there is no i/o activity whatsoever at the component drives (as well as journal), the cpu is of course still fully loaded (5 days so far): >>>> >>>> UID PID PPID C SZ RSS PSR STIME TTY TIME CMD >>>> root 8129 6755 15 740 1904 10 Jun04 pts/2 17:42:34 mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/sdj1 /dev/sdi1 /dev/sdg1 /dev/sdh1 >>>> root 8147 2 84 0 0 30 Jun04 ? 4-02:09:47 [md124_raid5] >>> >>> I guess the md thread stuck at some stripe. Does the kernel have >>> CONFIG_DYNAMIC_DEBUG enabled? If so, could you please try enable some pr_debug() >>> in function handle_stripe()? 
>>> >>> Thanks, >>> Song >>> >> >> Massive spam in dmesg with messages like these: >> >> [464836.603033] handling stripe 1551540328, state=0x41 cnt=1, pd_idx=3, >> qd_idx=-1 >> , check:0, reconstruct:0 >> <cut> >> > > I am really sorry for the delay. A few questions: > No worries. > 1. There are two pr_debug() calls in handle_stripe(): > pr_debug("handling stripe %llu, state=%#lx cnt=%d, " > pr_debug("locked=%d uptodate=%d to_read=%d" > > Did you enable all of them? Or only the first one? I enabled all of them (I think), to be precise: echo -n 'func handle_stripe +p' >/sys/kernel/debug/dynamic_debug/control Haven't seen any `locked` lines though. > > 2. In the log above, we have pd_idx = 3 for all these stripes. Is this > the case for > all the stripes? If that's the case, is it a raid4 instead of raid5? > Just a coincidence, pd_idx is in 0..3 range. The rest of the values are constant from what I can see. I've attached longer piece of dmesg output. [-- Attachment #2: dmesg.txt.xz --] [-- Type: application/x-xz, Size: 18160 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-06-17 12:53 ` Michal Soltys @ 2020-06-17 17:11 ` Song Liu 2020-06-19 11:35 ` Michal Soltys 0 siblings, 1 reply; 46+ messages in thread From: Song Liu @ 2020-06-17 17:11 UTC (permalink / raw) To: Michal Soltys; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 2770 bytes --] On Wed, Jun 17, 2020 at 5:53 AM Michal Soltys <msoltyspl@yandex.pl> wrote: > > On 6/17/20 2:48 AM, Song Liu wrote: > > On Tue, Jun 9, 2020 at 2:51 PM Michal Soltys <msoltyspl@yandex.pl> wrote: > >> > >> On 20/06/09 20:36, Song Liu wrote: > >>> On Tue, Jun 9, 2020 at 2:36 AM Michal Soltys <msoltyspl@yandex.pl> wrote: > >>>> > >>>> On 6/5/20 2:26 PM, Michal Soltys wrote: > >>>>> On 6/4/20 12:07 AM, Song Liu wrote: > >>>>>> > >>>>>> The hang happens at expected place. > >>>>>> > >>>>>>> [Jun 3 09:02] INFO: task mdadm:2858 blocked for more than 120 seconds. > >>>>>>> [ +0.060545] Tainted: G E > >>>>>>> 5.4.19-msl-00001-gbf39596faf12 #2 > >>>>>>> [ +0.062932] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>>>>>> disables this message. > >>>>>> > >>>>>> Could you please try disable the timeout message with > >>>>>> > >>>>>> echo 0 > /proc/sys/kernel/hung_task_timeout_secs > >>>>>> > >>>>>> And during this wait (after message > >>>>>> "r5c_recovery_flush_data_only_stripes before wait_event"), > >>>>>> checks whether the raid disks (not the journal disk) are taking IOs > >>>>>> (using tools like iostat). > >>>>>> > >>>>> > >>>>> No activity on component drives. > >>>> > >>>> To expand on that - while there is no i/o activity whatsoever at the component drives (as well as journal), the cpu is of course still fully loaded (5 days so far): > >>>> > >>>> UID PID PPID C SZ RSS PSR STIME TTY TIME CMD > >>>> root 8129 6755 15 740 1904 10 Jun04 pts/2 17:42:34 mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/sdj1 /dev/sdi1 /dev/sdg1 /dev/sdh1 > >>>> root 8147 2 84 0 0 30 Jun04 ? 
4-02:09:47 [md124_raid5] > >>> > >>> I guess the md thread stuck at some stripe. Does the kernel have > >>> CONFIG_DYNAMIC_DEBUG enabled? If so, could you please try enable some pr_debug() > >>> in function handle_stripe()? > >>> > >>> Thanks, > >>> Song > >>> > >> > >> Massive spam in dmesg with messages like these: > >> > >> [464836.603033] handling stripe 1551540328, state=0x41 cnt=1, pd_idx=3, > >> qd_idx=-1 > >> , check:0, reconstruct:0 > >> <cut> > >> > > > > I am really sorry for the delay. A few questions: > > > > No worries. > > > 1. There are two pr_debug() calls in handle_stripe(): > > pr_debug("handling stripe %llu, state=%#lx cnt=%d, " > > pr_debug("locked=%d uptodate=%d to_read=%d" > > > > Did you enable all of them? Or only the first one? > > I enabled all of them (I think), to be precise: > > echo -n 'func handle_stripe +p' >/sys/kernel/debug/dynamic_debug/control > > Haven't seen any `locked` lines though. That's a little weird, and probably explains why we stuck. Could you please try the attached patch? 
Thanks, Song [-- Attachment #2: 0001-Debug-raid5-recovery-hang.patch --] [-- Type: application/octet-stream, Size: 1726 bytes --] From 4c60a692fffc1eb51a4811d36eec6076b779c6c6 Mon Sep 17 00:00:00 2001 From: Song Liu <songliubraving@fb.com> Date: Wed, 17 Jun 2020 09:56:06 -0700 Subject: [PATCH] Debug raid5 recovery hang --- drivers/md/raid5.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index ab8067f9ce8c6..ca7476fac3e9d 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -4726,6 +4726,20 @@ static void handle_stripe(struct stripe_head *sh) if (test_bit(STRIPE_LOG_TRAPPED, &sh->state)) goto finish; + pr_debug("locked=%d uptodate=%d to_read=%d" + " to_write=%d failed=%d failed_num=%d,%d\n", + s.locked, s.uptodate, s.to_read, s.to_write, s.failed, + s.failed_num[0], s.failed_num[1]); + + pr_debug("handle_bad_blocks=%d MD_SB_CHANGE_PENDING=%d blocked_rdev=%px\n", + s.handle_bad_blocks, + test_bit(MD_SB_CHANGE_PENDING, &conf->mddev->sb_flags), + s.blocked_rdev); + + pr_debug("syncing=%d expanding=%d expanded=%d" + " replacing=%d written=%d\n", + s.syncing, s.expanding, s.expanded, s.replacing, s.written); + if (s.handle_bad_blocks || test_bit(MD_SB_CHANGE_PENDING, &conf->mddev->sb_flags)) { set_bit(STRIPE_HANDLE, &sh->state); @@ -4748,10 +4762,6 @@ static void handle_stripe(struct stripe_head *sh) set_bit(STRIPE_BIOFILL_RUN, &sh->state); } - pr_debug("locked=%d uptodate=%d to_read=%d" - " to_write=%d failed=%d failed_num=%d,%d\n", - s.locked, s.uptodate, s.to_read, s.to_write, s.failed, - s.failed_num[0], s.failed_num[1]); /* * check if the array has lost more than max_degraded devices and, * if so, some requests might need to be failed. -- 2.24.1 ^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-06-17 17:11 ` Song Liu @ 2020-06-19 11:35 ` Michal Soltys 2020-06-20 0:14 ` Song Liu 0 siblings, 1 reply; 46+ messages in thread From: Michal Soltys @ 2020-06-19 11:35 UTC (permalink / raw) To: Song Liu; +Cc: linux-raid On 6/17/20 7:11 PM, Song Liu wrote: >> >>> 1. There are two pr_debug() calls in handle_stripe(): >>> pr_debug("handling stripe %llu, state=%#lx cnt=%d, " >>> pr_debug("locked=%d uptodate=%d to_read=%d" >>> >>> Did you enable all of them? Or only the first one? >> >> I enabled all of them (I think), to be precise: >> >> echo -n 'func handle_stripe +p' >/sys/kernel/debug/dynamic_debug/control >> >> Haven't seen any `locked` lines though. > > That's a little weird, and probably explains why we stuck. Could you > please try the attached patch? > > Thanks, > Song > I've started assembly with the above patch, the output can be inspected from this file: https://ufile.io/yx4bbcb4 This is ~5mb packed dmesg from start of the boot to the relevant parts, unpacks to ~150mb file. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-06-19 11:35 ` Michal Soltys @ 2020-06-20 0:14 ` Song Liu 2020-06-22 11:12 ` Michal Soltys 0 siblings, 1 reply; 46+ messages in thread From: Song Liu @ 2020-06-20 0:14 UTC (permalink / raw) To: Michal Soltys; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 1074 bytes --] On Fri, Jun 19, 2020 at 4:35 AM Michal Soltys <msoltyspl@yandex.pl> wrote: > > On 6/17/20 7:11 PM, Song Liu wrote: > >> > >>> 1. There are two pr_debug() calls in handle_stripe(): > >>> pr_debug("handling stripe %llu, state=%#lx cnt=%d, " > >>> pr_debug("locked=%d uptodate=%d to_read=%d" > >>> > >>> Did you enable all of them? Or only the first one? > >> > >> I enabled all of them (I think), to be precise: > >> > >> echo -n 'func handle_stripe +p' >/sys/kernel/debug/dynamic_debug/control > >> > >> Haven't seen any `locked` lines though. > > > > That's a little weird, and probably explains why we stuck. Could you > > please try the attached patch? > > > > Thanks, > > Song > > > > I've started assembly with the above patch, the output can be inspected > from this file: > > https://ufile.io/yx4bbcb4 > > This is ~5mb packed dmesg from start of the boot to the relevant parts, > unpacks to ~150mb file. Thanks for the trace. Looks like we may have some issues with MD_SB_CHANGE_PENDING. Could you please try the attached patch? 
Thanks, Song [-- Attachment #2: 0001-debug-MD_SB_CHANGE_PENDING.patch --] [-- Type: application/octet-stream, Size: 5015 bytes --] From 6c3758525c4abadc64839cf18ee3ed7ad40293da Mon Sep 17 00:00:00 2001 From: Song Liu <songliubraving@fb.com> Date: Fri, 19 Jun 2020 17:12:05 -0700 Subject: [PATCH] debug MD_SB_CHANGE_PENDING --- drivers/md/md.c | 20 ++++++++++++++++++++ drivers/md/raid5-cache.c | 1 + drivers/md/raid5.c | 2 ++ 3 files changed, 23 insertions(+) diff --git a/drivers/md/md.c b/drivers/md/md.c index f567f536b529b..dbbc8a50e2ed2 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -2670,6 +2670,8 @@ void md_update_sb(struct mddev *mddev, int force_change) bit_clear_unless(&mddev->sb_flags, BIT(MD_SB_CHANGE_PENDING), BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_CLEAN)); + pr_info("%s bit_clear_unless sb_flags %lx\n", __func__, + mddev->sb_flags); return; } } @@ -2697,6 +2699,8 @@ void md_update_sb(struct mddev *mddev, int force_change) clear_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags); if (!mddev->external) { clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); + pr_info("%s clear MD_SB_CHANGE_PENDING\n", __func__); + rdev_for_each(rdev, mddev) { if (rdev->badblocks.changed) { rdev->badblocks.changed = 0; @@ -2815,6 +2819,9 @@ void md_update_sb(struct mddev *mddev, int force_change) if (mddev_is_clustered(mddev) && ret == 0) md_cluster_ops->metadata_update_finish(mddev); + if (mddev->in_sync != sync_req) + pr_info("%s bit_clear_unless sb_flags %lx\n", __func__, + mddev->sb_flags); if (mddev->in_sync != sync_req || !bit_clear_unless(&mddev->sb_flags, BIT(MD_SB_CHANGE_PENDING), BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_CLEAN))) @@ -4389,6 +4396,8 @@ array_state_store(struct mddev *mddev, const char *buf, size_t len) if (st == active) { restart_array(mddev); clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); + pr_info("%s clear MD_SB_CHANGE_PENDING\n", __func__); + md_wakeup_thread(mddev->thread); wake_up(&mddev->sb_wait); } else /* st == clean */ { @@ -4463,6 +4472,8 
@@ array_state_store(struct mddev *mddev, const char *buf, size_t len) if (err) break; clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); + pr_info("%s clear MD_SB_CHANGE_PENDING\n", __func__); + wake_up(&mddev->sb_wait); err = 0; } else { @@ -8469,6 +8480,7 @@ bool md_write_start(struct mddev *mddev, struct bio *bi) mddev->in_sync = 0; set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags); set_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); + pr_info("%s set MD_SB_CHANGE_PENDING\n", __func__); md_wakeup_thread(mddev->thread); did_change = 1; } @@ -8544,6 +8556,8 @@ void md_allow_write(struct mddev *mddev) mddev->in_sync = 0; set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags); set_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); + pr_info("%s set MD_SB_CHANGE_PENDING\n", __func__); + if (mddev->safemode_delay && mddev->safemode == 0) mddev->safemode = 1; @@ -8948,6 +8962,8 @@ void md_do_sync(struct md_thread *thread) set_mask_bits(&mddev->sb_flags, 0, BIT(MD_SB_CHANGE_PENDING) | BIT(MD_SB_CHANGE_DEVS)); + pr_info("%s set MD_SB_CHANGE_PENDING\n", __func__); + if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) && !test_bit(MD_RECOVERY_INTR, &mddev->recovery) && mddev->delta_disks > 0 && @@ -9197,6 +9213,8 @@ void md_check_recovery(struct mddev *mddev) clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); + pr_info("%s clear MD_SB_CHANGE_PENDING\n", __func__); + goto unlock; } @@ -9411,6 +9429,8 @@ int rdev_set_badblocks(struct md_rdev *rdev, sector_t s, int sectors, sysfs_notify_dirent_safe(rdev->sysfs_state); set_mask_bits(&mddev->sb_flags, 0, BIT(MD_SB_CHANGE_CLEAN) | BIT(MD_SB_CHANGE_PENDING)); + pr_info("%s set MD_SB_CHANGE_PENDING\n", __func__); + md_wakeup_thread(rdev->mddev->thread); return 1; } else diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c index 9b6da759dca25..60fa08f0170bc 100644 --- a/drivers/md/raid5-cache.c +++ b/drivers/md/raid5-cache.c @@ -1337,6 +1337,7 
@@ static void r5l_write_super_and_discard_space(struct r5l_log *log, */ set_mask_bits(&mddev->sb_flags, 0, BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING)); + pr_info("%s set MD_SB_CHANGE_PENDING\n", __func__); if (!mddev_trylock(mddev)) return; md_update_sb(mddev, 1); diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index ca7476fac3e9d..3240dee1da7bf 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -2704,6 +2704,8 @@ static void raid5_error(struct mddev *mddev, struct md_rdev *rdev) set_bit(Blocked, &rdev->flags); set_mask_bits(&mddev->sb_flags, 0, BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING)); + pr_info("%s set MD_SB_CHANGE_PENDING\n", __func__); + pr_crit("md/raid:%s: Disk failure on %s, disabling device.\n" "md/raid:%s: Operation continuing on %d devices.\n", mdname(mddev), -- 2.24.1 ^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-06-20 0:14 ` Song Liu @ 2020-06-22 11:12 ` Michal Soltys 2020-06-22 16:37 ` Song Liu 0 siblings, 1 reply; 46+ messages in thread From: Michal Soltys @ 2020-06-22 11:12 UTC (permalink / raw) To: Song Liu; +Cc: linux-raid On 6/20/20 2:14 AM, Song Liu wrote: > On Fri, Jun 19, 2020 at 4:35 AM Michal Soltys <msoltyspl@yandex.pl> wrote: >> >> On 6/17/20 7:11 PM, Song Liu wrote: >>>> >>>>> 1. There are two pr_debug() calls in handle_stripe(): >>>>> pr_debug("handling stripe %llu, state=%#lx cnt=%d, " >>>>> pr_debug("locked=%d uptodate=%d to_read=%d" >>>>> >>>>> Did you enable all of them? Or only the first one? >>>> >>>> I enabled all of them (I think), to be precise: >>>> >>>> echo -n 'func handle_stripe +p' >/sys/kernel/debug/dynamic_debug/control >>>> >>>> Haven't seen any `locked` lines though. >>> >>> That's a little weird, and probably explains why we stuck. Could you >>> please try the attached patch? >>> >>> Thanks, >>> Song >>> >> >> I've started assembly with the above patch, the output can be inspected >> from this file: >> >> https://ufile.io/yx4bbcb4 >> >> This is ~5mb packed dmesg from start of the boot to the relevant parts, >> unpacks to ~150mb file. > > Thanks for the trace. Looks like we may have some issues with > MD_SB_CHANGE_PENDING. > Could you please try the attached patch? Should I run this along with pr_debugs from the previous patch enabled ? ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-06-22 11:12 ` Michal Soltys @ 2020-06-22 16:37 ` Song Liu 2020-06-23 13:17 ` Michal Soltys 0 siblings, 1 reply; 46+ messages in thread From: Song Liu @ 2020-06-22 16:37 UTC (permalink / raw) To: Michal Soltys; +Cc: linux-raid On Mon, Jun 22, 2020 at 4:12 AM Michal Soltys <msoltyspl@yandex.pl> wrote: > > On 6/20/20 2:14 AM, Song Liu wrote: > > On Fri, Jun 19, 2020 at 4:35 AM Michal Soltys <msoltyspl@yandex.pl> wrote: > >> > >> On 6/17/20 7:11 PM, Song Liu wrote: > >>>> > >>>>> 1. There are two pr_debug() calls in handle_stripe(): > >>>>> pr_debug("handling stripe %llu, state=%#lx cnt=%d, " > >>>>> pr_debug("locked=%d uptodate=%d to_read=%d" > >>>>> > >>>>> Did you enable all of them? Or only the first one? > >>>> > >>>> I enabled all of them (I think), to be precise: > >>>> > >>>> echo -n 'func handle_stripe +p' >/sys/kernel/debug/dynamic_debug/control > >>>> > >>>> Haven't seen any `locked` lines though. > >>> > >>> That's a little weird, and probably explains why we stuck. Could you > >>> please try the attached patch? > >>> > >>> Thanks, > >>> Song > >>> > >> > >> I've started assembly with the above patch, the output can be inspected > >> from this file: > >> > >> https://ufile.io/yx4bbcb4 > >> > >> This is ~5mb packed dmesg from start of the boot to the relevant parts, > >> unpacks to ~150mb file. > > > > Thanks for the trace. Looks like we may have some issues with > > MD_SB_CHANGE_PENDING. > > Could you please try the attached patch? > > Should I run this along with pr_debugs from the previous patch enabled ? We don't need those pr_debug() here. Thanks, Song ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-06-22 16:37 ` Song Liu @ 2020-06-23 13:17 ` Michal Soltys 2020-06-23 23:13 ` Song Liu 0 siblings, 1 reply; 46+ messages in thread From: Michal Soltys @ 2020-06-23 13:17 UTC (permalink / raw) To: Song Liu; +Cc: linux-raid

On 6/22/20 6:37 PM, Song Liu wrote:
>>>
>>> Thanks for the trace. Looks like we may have some issues with
>>> MD_SB_CHANGE_PENDING.
>>> Could you please try the attached patch?
>>
>> Should I run this along with the pr_debug()s from the previous patch enabled?
>
> We don't need those pr_debug() here.
>
> Thanks,
> Song
>

So with this patch attached, there is no extra output whatsoever once it finished getting past this point:

[ +0.371752] r5c_recovery_rewrite_data_only_stripes rewritten 20001 stripes to the journal, current ctx->pos 408461384 ctx->seq 866603361
[ +0.395000] r5c_recovery_rewrite_data_only_stripes rewritten 21001 stripes to the journal, current ctx->pos 408479568 ctx->seq 866604361
[ +0.371255] r5c_recovery_rewrite_data_only_stripes rewritten 22001 stripes to the journal, current ctx->pos 408496600 ctx->seq 866605361
[ +0.401013] r5c_recovery_rewrite_data_only_stripes rewritten 23001 stripes to the journal, current ctx->pos 408515472 ctx->seq 866606361
[ +0.370543] r5c_recovery_rewrite_data_only_stripes rewritten 24001 stripes to the journal, current ctx->pos 408532112 ctx->seq 866607361
[ +0.319253] r5c_recovery_rewrite_data_only_stripes done
[ +0.061560] r5c_recovery_flush_data_only_stripes enter
[ +0.075697] r5c_recovery_flush_data_only_stripes before wait_event

That is, no output at all besides the 'task <....> blocked for' traces, unless the pr_debug()s were enabled.

There were a few 'md_write_start set MD_SB_CHANGE_PENDING' lines *before* that (all of them likely related to another raid that is active at the moment, as these were happening during that lengthy r5c_recovery_flush_log() process).

^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-06-23 13:17 ` Michal Soltys @ 2020-06-23 23:13 ` Song Liu 2020-06-25 16:11 ` Michal Soltys 0 siblings, 1 reply; 46+ messages in thread From: Song Liu @ 2020-06-23 23:13 UTC (permalink / raw) To: Michal Soltys; +Cc: linux-raid On Tue, Jun 23, 2020 at 6:17 AM Michal Soltys <msoltyspl@yandex.pl> wrote: > > On 6/22/20 6:37 PM, Song Liu wrote: > >>> > >>> Thanks for the trace. Looks like we may have some issues with > >>> MD_SB_CHANGE_PENDING. > >>> Could you please try the attached patch? > >> > >> Should I run this along with pr_debugs from the previous patch enabled ? > > > > We don't need those pr_debug() here. > > > > Thanks, > > Song > > > > So with this patch attached, there is no extra output whatsoever - once it finished getting past this point: > > [ +0.371752] r5c_recovery_rewrite_data_only_stripes rewritten 20001 stripes to the journal, current ctx->pos 408461384 ctx->seq 866603361 > [ +0.395000] r5c_recovery_rewrite_data_only_stripes rewritten 21001 stripes to the journal, current ctx->pos 408479568 ctx->seq 866604361 > [ +0.371255] r5c_recovery_rewrite_data_only_stripes rewritten 22001 stripes to the journal, current ctx->pos 408496600 ctx->seq 866605361 > [ +0.401013] r5c_recovery_rewrite_data_only_stripes rewritten 23001 stripes to the journal, current ctx->pos 408515472 ctx->seq 866606361 > [ +0.370543] r5c_recovery_rewrite_data_only_stripes rewritten 24001 stripes to the journal, current ctx->pos 408532112 ctx->seq 866607361 > [ +0.319253] r5c_recovery_rewrite_data_only_stripes done > [ +0.061560] r5c_recovery_flush_data_only_stripes enter > [ +0.075697] r5c_recovery_flush_data_only_stripes before wait_event > > That is, besides 'task <....> blocked for' traces or unless pr_debug()s were enabled. 
> > There were a few 'md_write_start set MD_SB_CHANGE_PENDING' *before* that (all of them likely related to another raid that is active at the moment, as these were happening during that lengthy r5c_recovery_flush_log() process). Hmm.. this is weird, as I think I marked every instance of set_bit MD_SB_CHANGE_PENDING. Would you mind confirm those are to the other array with something like: diff --git i/drivers/md/md.c w/drivers/md/md.c index dbbc8a50e2ed2..e91acfdcec032 100644 --- i/drivers/md/md.c +++ w/drivers/md/md.c @@ -8480,7 +8480,7 @@ bool md_write_start(struct mddev *mddev, struct bio *bi) mddev->in_sync = 0; set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags); set_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); - pr_info("%s set MD_SB_CHANGE_PENDING\n", __func__); + pr_info("%s: md: %s set MD_SB_CHANGE_PENDING\n", __func__, mdname(mddev)); md_wakeup_thread(mddev->thread); did_change = 1; } Thanks, Song ^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-06-23 23:13 ` Song Liu @ 2020-06-25 16:11 ` Michal Soltys 2020-07-06 8:07 ` Michal Soltys 0 siblings, 1 reply; 46+ messages in thread From: Michal Soltys @ 2020-06-25 16:11 UTC (permalink / raw) To: Song Liu; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 1123 bytes --] On 6/24/20 1:13 AM, Song Liu wrote: > On Tue, Jun 23, 2020 at 6:17 AM Michal Soltys <msoltyspl@yandex.pl> wrote: > > Hmm.. this is weird, as I think I marked every instance of set_bit > MD_SB_CHANGE_PENDING. > Would you mind confirm those are to the other array with something like: > > diff --git i/drivers/md/md.c w/drivers/md/md.c > index dbbc8a50e2ed2..e91acfdcec032 100644 > --- i/drivers/md/md.c > +++ w/drivers/md/md.c > @@ -8480,7 +8480,7 @@ bool md_write_start(struct mddev *mddev, struct bio *bi) > mddev->in_sync = 0; > set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags); > set_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); > - pr_info("%s set MD_SB_CHANGE_PENDING\n", __func__); > + pr_info("%s: md: %s set > MD_SB_CHANGE_PENDING\n", __func__, mdname(mddev)); > md_wakeup_thread(mddev->thread); > did_change = 1; > } > > Thanks, > Song > dmesg attached - md127 - journal - md126 - the other raid - md125 - the problematic one [-- Attachment #2: dmesg.txt.xz --] [-- Type: application/x-xz, Size: 24356 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-06-25 16:11 ` Michal Soltys @ 2020-07-06 8:07 ` Michal Soltys 2020-07-06 22:08 ` Song Liu 0 siblings, 1 reply; 46+ messages in thread From: Michal Soltys @ 2020-07-06 8:07 UTC (permalink / raw) To: Song Liu; +Cc: linux-raid On 20/06/25 18:11, Michal Soltys wrote: > On 6/24/20 1:13 AM, Song Liu wrote: >> On Tue, Jun 23, 2020 at 6:17 AM Michal Soltys <msoltyspl@yandex.pl> >> wrote: >> >> Hmm.. this is weird, as I think I marked every instance of set_bit >> MD_SB_CHANGE_PENDING. >> Would you mind confirm those are to the other array with something like: >> >> diff --git i/drivers/md/md.c w/drivers/md/md.c >> index dbbc8a50e2ed2..e91acfdcec032 100644 >> --- i/drivers/md/md.c >> +++ w/drivers/md/md.c >> @@ -8480,7 +8480,7 @@ bool md_write_start(struct mddev *mddev, struct >> bio *bi) >> mddev->in_sync = 0; >> set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags); >> set_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); >> - pr_info("%s set MD_SB_CHANGE_PENDING\n", >> __func__); >> + pr_info("%s: md: %s set >> MD_SB_CHANGE_PENDING\n", __func__, mdname(mddev)); >> md_wakeup_thread(mddev->thread); >> did_change = 1; >> } >> >> Thanks, >> Song >> > > dmesg attached > - md127 - journal > - md126 - the other raid > - md125 - the problematic one So, what kind of next step after this ? ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-07-06 8:07 ` Michal Soltys @ 2020-07-06 22:08 ` Song Liu 2020-07-08 11:29 ` Michal Soltys 0 siblings, 1 reply; 46+ messages in thread From: Song Liu @ 2020-07-06 22:08 UTC (permalink / raw) To: Michal Soltys; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 1653 bytes --] On Mon, Jul 6, 2020 at 1:07 AM Michal Soltys <msoltyspl@yandex.pl> wrote: > > On 20/06/25 18:11, Michal Soltys wrote: > > On 6/24/20 1:13 AM, Song Liu wrote: > >> On Tue, Jun 23, 2020 at 6:17 AM Michal Soltys <msoltyspl@yandex.pl> > >> wrote: > >> > >> Hmm.. this is weird, as I think I marked every instance of set_bit > >> MD_SB_CHANGE_PENDING. > >> Would you mind confirm those are to the other array with something like: > >> > >> diff --git i/drivers/md/md.c w/drivers/md/md.c > >> index dbbc8a50e2ed2..e91acfdcec032 100644 > >> --- i/drivers/md/md.c > >> +++ w/drivers/md/md.c > >> @@ -8480,7 +8480,7 @@ bool md_write_start(struct mddev *mddev, struct > >> bio *bi) > >> mddev->in_sync = 0; > >> set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags); > >> set_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); > >> - pr_info("%s set MD_SB_CHANGE_PENDING\n", > >> __func__); > >> + pr_info("%s: md: %s set > >> MD_SB_CHANGE_PENDING\n", __func__, mdname(mddev)); > >> md_wakeup_thread(mddev->thread); > >> did_change = 1; > >> } > >> > >> Thanks, > >> Song > >> > > > > dmesg attached > > - md127 - journal > > - md126 - the other raid > > - md125 - the problematic one > > So, what kind of next step after this ? Sorry for the delay. I read the log again, and found the following line caused this issue: [ +16.088243] r5l_write_super_and_discard_space set MD_SB_CHANGE_PENDING The attached patch should workaround this issue. Could you please give it a try? 
Thanks, song [-- Attachment #2: 0001-md-raid5-cache-clear-MD_SB_CHANGE_PENDING-before-flu.patch --] [-- Type: application/octet-stream, Size: 1432 bytes --] From ce310728ea5e8420e9f62f11e3969c2100af7c81 Mon Sep 17 00:00:00 2001 From: Song Liu <songliubraving@fb.com> Date: Mon, 6 Jul 2020 14:57:32 -0700 Subject: [PATCH] md/raid5-cache: clear MD_SB_CHANGE_PENDING before flushing data only stripes Signed-off-by: Song Liu <songliubraving@fb.com> --- drivers/md/raid5-cache.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c index 9b6da759dca25..0bea21d81697e 100644 --- a/drivers/md/raid5-cache.c +++ b/drivers/md/raid5-cache.c @@ -2430,10 +2430,15 @@ static void r5c_recovery_flush_data_only_stripes(struct r5l_log *log, struct mddev *mddev = log->rdev->mddev; struct r5conf *conf = mddev->private; struct stripe_head *sh, *next; + bool cleared_pending = false; if (ctx->data_only_stripes == 0) return; + if (test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags)) { + cleared_pending = true; + clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); + } log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_BACK; list_for_each_entry_safe(sh, next, &ctx->cached_list, lru) { @@ -2448,6 +2453,8 @@ static void r5c_recovery_flush_data_only_stripes(struct r5l_log *log, atomic_read(&conf->active_stripes) == 0); log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_THROUGH; + if (cleared_pending) + set_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); } static int r5l_recovery_log(struct r5l_log *log) -- 2.24.1 ^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-07-06 22:08 ` Song Liu @ 2020-07-08 11:29 ` Michal Soltys 2020-07-08 23:51 ` Song Liu 0 siblings, 1 reply; 46+ messages in thread From: Michal Soltys @ 2020-07-08 11:29 UTC (permalink / raw) To: Song Liu; +Cc: linux-raid

On 7/7/20 12:08 AM, Song Liu wrote:
>>
>> So, what kind of next step after this ?
>
> Sorry for the delay. I read the log again, and found the following
> line caused this issue:
>
> [ +16.088243] r5l_write_super_and_discard_space set MD_SB_CHANGE_PENDING
>
> The attached patch should workaround this issue. Could you please give it a try?

Yes, this solved the issue - the raid assembled correctly (so the patch is probably a good candidate for LTS kernels as well). Thanks for helping with this bug.

The underlying filesystems are mountable/usable too, although read-only fsck (ext4) and btrfs check do find some minor issues; it's tough to say at this point what the exact culprit was.

In this particular case - imho - one issue remains: the assembly is slower than a full resync (without a bitmap), which - outside of some performance gains (writeback journal) and the write-hole fix - largely defeats the point of having such a resync policy in the first place.

dmesg -H | grep r5c_recovery_flush_log
[ +13.550877] r5c_recovery_flush_log processing ctx->seq 860700000
[Jul 7 15:16] r5c_recovery_flush_log processing ctx->seq 860800000
[Jul 7 15:40] r5c_recovery_flush_log processing ctx->seq 860900000
...
[Jul 8 06:40] r5c_recovery_flush_log processing ctx->seq 866300000
[Jul 8 06:58] r5c_recovery_flush_log processing ctx->seq 866400000
[Jul 8 07:20] r5c_recovery_flush_log processing ctx->seq 866500000

During the periods when I was testing your patches, the machine was basically idle - no cpu/io/waits or anything else that could hamper it. The read traffic from the journal device (SSDs) was averaging 1-4 MB/s.

^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails 2020-07-08 11:29 ` Michal Soltys @ 2020-07-08 23:51 ` Song Liu 0 siblings, 0 replies; 46+ messages in thread From: Song Liu @ 2020-07-08 23:51 UTC (permalink / raw) To: Michal Soltys; +Cc: linux-raid On Wed, Jul 8, 2020 at 4:29 AM Michal Soltys <msoltyspl@yandex.pl> wrote: > > On 7/7/20 12:08 AM, Song Liu wrote: > >> > >> So, what kind of next step after this ? > > > > Sorry for the delay. I read the log again, and found the following > > line caused this issue: > > > > [ +16.088243] r5l_write_super_and_discard_space set MD_SB_CHANGE_PENDING > > > > The attached patch should workaround this issue. Could you please give it a try? > > Yea, this solved the issue - the raid assembled correctly (so the patch > is probably a good candidate for lts kernels). Thanks for testing the patch. I will work on applying it to upstream. > > Thanks for helping with this bug. > > Underlying filesystems are mountable/usable as well - albeit read-only > fsck (ext4) or btrfs check do find some minor issues; tough to say at > this point what was the exact culprit. > > In this particular case - imho - one issue remains: the assembly is > slower than full resync (without bitmap), which outside of some > performance gains (writeback journal) and write-hole fixing - kind of > completely defeats the point of having such resync policy in the first > place. Totally agreed that we need to improve the recovery speed. I will also look into that. Thanks again! Song ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Assemblin journaled array fails
  2020-05-14 11:07 ` Michal Soltys
  2020-05-16  0:15   ` Song Liu
@ 2020-06-26  8:28   ` Guoqing Jiang
  1 sibling, 0 replies; 46+ messages in thread
From: Guoqing Jiang @ 2020-06-26 8:28 UTC (permalink / raw)
  To: Michal Soltys, Song Liu; +Cc: linux-raid

On 5/14/20 1:07 PM, Michal Soltys wrote:
> On 5/13/20 6:17 PM, Song Liu wrote:
>>>>
>>>> Are these captured back to back? I am asking because they showed
>>>> different "Events" number.
>>>
>>> Nah, they were captured between reboots. Back to back all event
>>> fields show identical value (at 56291 now).
>>>
>>>>
>>>> Also, when mdadm -A hangs, could you please capture /proc/$(pidof
>>>> mdadm)/stack ?
>>>>
>>>
>>> The output is empty:
>>>
>>> xs22:/☠ ps -eF fww | grep mdadm
>>> root 10332 9362 97 740 1884 25 12:47 pts/1 R+ 6:59 |
>>> \_ mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/sdi1
>>> /dev/sdg1 /dev/sdj1 /dev/sdh1
>>> xs22:/☠ cd /proc/10332
>>> xs22:/proc/10332☠ cat stack
>>> xs22:/proc/10332☠
>>
>> Hmm... Could you please share the strace output of "mdadm -A"
>> command? Like
>>
>> strace mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/xxx ...
>>
>> Thanks,
>> Song
>
> I did one more thing - ran cat on that stack in another terminal every
> 0.1 seconds, this is what was found:
>
> cat: /proc//stack: No such file or directory
> cat: /proc//stack: No such file or directory
> cat: /proc//stack: No such file or directory
> [<0>] submit_bio_wait+0x5b/0x80
> [<0>] r5l_recovery_read_page+0x1b6/0x200 [raid456]
> [<0>] r5l_recovery_verify_data_checksum+0x19/0x70 [raid456]
> [<0>] r5l_start+0x99e/0xe70 [raid456]
> [<0>] md_start.part.48+0x2e/0x50 [md_mod]
> [<0>] do_md_run+0x64/0x100 [md_mod]
> [<0>] md_ioctl+0xe7d/0x17d0 [md_mod]
> [<0>] blkdev_ioctl+0x4d0/0xa00
> [<0>] block_ioctl+0x39/0x40
> [<0>] do_vfs_ioctl+0xa4/0x630
> [<0>] ksys_ioctl+0x60/0x90
> [<0>] __x64_sys_ioctl+0x16/0x20
> [<0>] do_syscall_64+0x52/0x160
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> cat: /proc//stack: No such file or directory
> [<0>] submit_bio_wait+0x5b/0x80

Just FYI. I recalled that long sync IO could fire the warning in
submit_bio_wait. Have you applied the commit de6a78b601c5 ("block:
Prevent hung_check firing during long sync IO")?

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 46+ messages in thread
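[The "run cat on the stack every 0.1 seconds" trick from the message above can be scripted. A minimal sketch of the same polling idea; the `poll_stack` helper is hypothetical (not part of the thread), and reading another process's `/proc/<pid>/stack` generally requires root:]

```python
import pathlib
import time

def poll_stack(pid, interval=0.1, rounds=600):
    """Repeatedly read /proc/<pid>/stack and return the first
    non-empty kernel stack seen, or None if none was caught.

    The stack is only visible while the task sleeps in the kernel
    (e.g. in submit_bio_wait), so most reads come back empty - a
    tight polling loop eventually catches the sleeping stack, which
    is exactly what the repeated `cat` in the thread did."""
    path = pathlib.Path(f"/proc/{pid}/stack")
    for _ in range(rounds):
        try:
            text = path.read_text()
        except (FileNotFoundError, PermissionError):
            text = ""  # process gone, or insufficient privileges
        if text.strip():
            return text
        time.sleep(interval)
    return None
```

For a mostly-running task like the hung `mdadm -A` here (state R+, briefly sleeping in submit_bio_wait), this kind of loop is what surfaced the r5l_recovery_read_page call chain.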
end of thread, other threads:[~2020-07-08 23:51 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-09 10:54 Assemblin journaled array fails Michal Soltys
2020-05-09 11:32 ` Michal Soltys
2020-05-09 13:15 ` antlists
2020-05-09 13:20 ` Michal Soltys
2020-05-09 13:32 ` Michal Soltys
2020-05-09 23:57 ` Michal Soltys
2020-05-11 11:11 ` Michal Soltys
2020-05-12  1:27 ` Song Liu
2020-05-13 10:58 ` Michal Soltys
2020-05-13 16:17 ` Song Liu
2020-05-14 11:07 ` Michal Soltys
2020-05-16  0:15 ` Song Liu
[not found] ` <d0340d7b-6b3a-4fd3-e446-5f0967132ef6@yandex.pl>
2020-05-18 23:55 ` Song Liu
2020-05-20 12:08 ` Michal Soltys
2020-05-25 16:23 ` Michal Soltys
2020-05-26 23:37 ` Song Liu
2020-05-27 13:36 ` Michal Soltys
2020-05-28 10:18 ` Michal Soltys
2020-05-28 15:06 ` Song Liu
2020-05-28 15:36 ` Michal Soltys
2020-05-29 12:09 ` Michal Soltys
2020-05-29 21:03 ` Song Liu
2020-05-29 21:57 ` Song Liu
2020-06-03 10:12 ` Michal Soltys
2020-06-03 22:07 ` Song Liu
2020-06-04 13:58 ` Michal Soltys
2020-06-05 12:26 ` Michal Soltys
2020-06-09  9:36 ` Michal Soltys
2020-06-09 18:36 ` Song Liu
2020-06-09 21:51 ` Michal Soltys
2020-06-16 11:11 ` Michal Soltys
2020-06-17  0:48 ` Song Liu
2020-06-17 12:53 ` Michal Soltys
2020-06-17 17:11 ` Song Liu
2020-06-19 11:35 ` Michal Soltys
2020-06-20  0:14 ` Song Liu
2020-06-22 11:12 ` Michal Soltys
2020-06-22 16:37 ` Song Liu
2020-06-23 13:17 ` Michal Soltys
2020-06-23 23:13 ` Song Liu
2020-06-25 16:11 ` Michal Soltys
2020-07-06  8:07 ` Michal Soltys
2020-07-06 22:08 ` Song Liu
2020-07-08 11:29 ` Michal Soltys
2020-07-08 23:51 ` Song Liu
2020-06-26  8:28 ` Guoqing Jiang