* Kernel bug in async_xor_offs during RAID5 recovery
@ 2021-04-30 20:16 Oleksandr Shchirskyi
  2021-05-06 10:57 ` Xiao Ni
  0 siblings, 1 reply; 3+ messages in thread
From: Oleksandr Shchirskyi @ 2021-04-30 20:16 UTC (permalink / raw)
  To: linux-raid; +Cc: Xiao Ni

Hi,

We recently discovered a regression in the md-next kernel. During PPL
recovery on RAID5 with IMSM metadata, the kernel hits a NULL pointer
dereference in async_xor_offs(). The issue is introduced by commit
ceaf2966ab08; without that patch the issue is not reproducible.

Steps to reproduce:

mdadm --create /dev/md/imsm0 --metadata=imsm --raid-devices=3 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 --run
mdadm --create /dev/md/md1 /dev/md/imsm0 --level=5 --chunk 64 --size=2621440 --raid-devices=3 --run --consistency-policy=ppl --assume-clean
# here we write previously prepared data to the md1 device (a stand-in for this step is sketched after these commands)
mdadm --set-faulty /dev/md/md1 /dev/nvme1n1
mdadm --stop --scan
echo 1 > /sys/block/nvme1n1/device/device/remove
# here, using our debug tool, we set the dirty flag and corrupt data on /dev/nvme1n1
mdadm --assemble --scan
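
The data-write step above is nothing special; a minimal stand-in with standard
tools (assuming dd and /dev/urandom rather than our exact data set) would be
something like:

# illustrative stand-in for the data-write step only; any direct write to the volume is enough
dd if=/dev/urandom of=/dev/md/md1 bs=1M count=1024 oflag=direct

The dirty-flag/corruption step is done by an internal debug tool, so no
standard-tool equivalent is shown for it.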

Reproducibility rate:

100%

End of test log:

# mdadm --assemble --scan
mdadm: Container /dev/md/imsm0 has been assembled with 2 drives
Killed

Kernel log:

[  818.308417] md: md127 stopped.
[  818.468051] md: md126 stopped.
[  818.474506] md/raid:md126: not clean -- starting background reconstruction
[  818.481419] md/raid:md126: device nvme0n1 operational as raid disk 1
[  818.487789] md/raid:md126: device nvme2n1 operational as raid disk 2
[  818.494572] md/raid:md126: starting dirty degraded array with PPL.
[  818.500781] md/raid:md126: raid level 5 active with 2 out of 3 devices, algorithm 0
[  818.510417] BUG: kernel NULL pointer dereference, address: 0000000000000004
[  818.517401] #PF: supervisor read access in kernel mode
[  818.522562] #PF: error_code(0x0000) - not-present page
[  818.527709] PGD 800000048b276067 P4D 800000048b276067 PUD 48d42b067 PMD 0
[  818.534595] Oops: 0000 [#1] SMP PTI
[  818.538107] CPU: 45 PID: 8989 Comm: mdadm Tainted: G S 5.12.0-rc3-20210428.intel.6363061+ #1
[  818.547862] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.00.01.0016.020120190930 02/01/2019
[  818.558312] RIP: 0010:async_xor_offs+0x3be/0x5c0 [async_xor]
[  818.563989] Code: 7d ff 31 d2 31 ed eb 03 48 89 c2 49 8b 04 d7 48 85 c0 74 27 48 01 f0 4d 89 c1 48 c1 f8 06 48 c1 e0 0c 48 01 c8 4d 85 e4 74 04 <45> 8b 0c 94 4c 63 d5 4c 01 c8 83 c5 01 4b 89 04 d6 48 8d 42 01 48
[  818.582780] RSP: 0018:ffffc900041ab800 EFLAGS: 00010202
[  818.588019] RAX: ffff888484baf000 RBX: ffff88848ddd8000 RCX: ffff888000000000
[  818.595165] RDX: 0000000000000000 RSI: 0000160000000000 RDI: 0000000000000000
[  818.602315] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  818.609462] R10: 0000000000000000 R11: ffff88886fd68984 R12: 0000000000000004
[  818.616612] R13: ffffc900041ab8a8 R14: ffffc900041ab8d8 R15: ffffc900041ab8d8
[  818.623774] FS:  00007f00978c08c0(0000) GS:ffff88886fd40000(0000) knlGS:0000000000000000
[  818.631885] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  818.637654] CR2: 0000000000000004 CR3: 0000000483766005 CR4: 00000000007706e0
[  818.644804] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  818.651957] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  818.659643] PKRU: 55555554
[  818.662904] Call Trace:
[  818.665905]  ? sync_page_io+0x11a/0x190 [md_mod]
[  818.671077]  async_xor+0x1c/0x20 [async_xor]
[  818.675882]  ppl_xor+0x6e/0xa0 [raid456]
[  818.680338]  ppl_recover_entry.isra.14+0x2a3/0x750 [raid456]
[  818.686541]  ? lock_timer_base+0x67/0x80
[  818.691006]  ? try_to_del_timer_sync+0x4d/0x80
[  818.695986]  ? del_timer_sync+0x25/0x40
[  818.700366]  ? schedule_timeout+0x195/0x2a0
[  818.705095]  ? chksum_update+0x13/0x20
[  818.709394]  ? crc32c+0x31/0x60
[  818.713068]  ? io_schedule_timeout+0x19/0x40
[  818.717866]  ? wait_for_completion_io_timeout+0xa4/0xf0
[  818.723617]  ? submit_bio_wait+0x80/0xb0
[  818.728065]  ? sync_page_io+0x11a/0x190 [md_mod]
[  818.733201]  ? submit_bio_wait+0xb0/0xb0
[  818.737652]  ? ppl_init_log+0xa17/0xb10 [raid456]
[  818.742890]  ppl_init_log+0xa17/0xb10 [raid456]
[  818.747958]  raid5_run.cold.60+0x4d1/0x5c2 [raid456]
[  818.753456]  md_run+0x488/0xa60 [md_mod]
[  818.757909]  ? blkdev_direct_IO+0x30c/0x4c0
[  818.762614]  do_md_run+0x14/0xf0 [md_mod]
[  818.767148]  array_state_store+0x2ce/0x320 [md_mod]
[  818.772553]  md_attr_store+0x6c/0xb0 [md_mod]
[  818.777447]  kernfs_fop_write_iter+0x11c/0x1b0
[  818.782416]  new_sync_write+0x119/0x1a0
[  818.786768]  vfs_write+0x181/0x250
[  818.790678]  ksys_write+0x59/0xd0
[  818.794501]  do_syscall_64+0x2d/0x40
[  818.798573]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  818.804118] RIP: 0033:0x7f0096f408a8
[  818.808182] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 b5 4c 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[  818.827933] RSP: 002b:00007fff08dae3d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  818.835994] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f0096f408a8
[  818.843606] RDX: 0000000000000008 RSI: 0000000000466722 RDI: 0000000000000007
[  818.851210] RBP: 0000000000466722 R08: 0000000000467670 R09: 00007f0096fd0d40
[  818.858811] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000001d5bf00
[  818.866400] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000002
[  818.873992] Modules linked in: raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq md_mod nvme nvme_core
[  818.886871] CR2: 0000000000000004

Regards,
Oleksandr Shchirskyi


* Re: Kernel bug in async_xor_offs during RAID5 recovery
  2021-04-30 20:16 Kernel bug in async_xor_offs during RAID5 recovery Oleksandr Shchirskyi
@ 2021-05-06 10:57 ` Xiao Ni
  2021-05-11 13:13   ` Tkaczyk, Mariusz
  0 siblings, 1 reply; 3+ messages in thread
From: Xiao Ni @ 2021-05-06 10:57 UTC (permalink / raw)
  To: Oleksandr Shchirskyi, linux-raid

Hi Oleksandr Shchirskyi,

Can this only happen with PPL, IMSM and NVMe disks? My machine doesn't
support creating RAID devices on NVMe drives. And rotational disks don't
have /sys/block/nvme1n1/device/device/remove. What does writing 1 to the
remove file do?

I tried to create an IMSM RAID with rotational disks and PPL, then removed
and re-added a disk to trigger recovery. It works fine.

Regards
Xiao

On 05/01/2021 04:16 AM, Oleksandr Shchirskyi wrote:
> Hi,
>
> We recently discovered a regression in the md-next kernel. During PPL
> recovery on RAID5 with IMSM metadata, the kernel hits a NULL pointer
> dereference in async_xor_offs(). The issue is introduced by commit
> ceaf2966ab08; without that patch the issue is not reproducible.
>
> [...]
>
> Regards,
> Oleksandr Shchirskyi



* Re: Kernel bug in async_xor_offs during RAID5 recovery
  2021-05-06 10:57 ` Xiao Ni
@ 2021-05-11 13:13   ` Tkaczyk, Mariusz
  0 siblings, 0 replies; 3+ messages in thread
From: Tkaczyk, Mariusz @ 2021-05-11 13:13 UTC (permalink / raw)
  To: Xiao Ni, Oleksandr Shchirskyi, linux-raid


Hi Xiao,
I simplified the scenario.

On 06.05.2021 12:57, Xiao Ni wrote:
> Hi Oleksandr Shchirskyi,
> 
> Can this only happen with PPL, IMSM and NVMe disks? My machine doesn't
> support creating RAID devices on NVMe drives.
Could you try to create the array with IMSM_NO_PLATFORM=1?
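
A hedged example of what that could look like (device and volume names below
are just placeholders for whatever non-Intel setup you have; as far as I know
the variable only makes mdadm skip its platform capability checks):

# IMSM_NO_PLATFORM=1 mdadm -CR imsm -e imsm -n3 /dev/sd[bcd]
# IMSM_NO_PLATFORM=1 mdadm -CR vol0 -l5 -n3 /dev/sd[bcd] -z 1G -c64 --consistency-policy=ppl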

> And rotational disks don't have /sys/block/nvme1n1/device/device/remove.
> What does writing 1 to the remove file do?
> 
> I tried to create an IMSM RAID with rotational disks and PPL, then removed
> and re-added a disk to trigger recovery. It works fine.
I verified that drive removal is not crucial here.
The main trick is to force PPL recovery. I did that with the following
scenario:
1. Create the array (I was able to reproduce it with a 1 GiB size):
#mdadm -CR imsm -e imsm -n3 /dev/nvme[456]n1
#mdadm -CR vol2 -l5 -n3 /dev/nvme[456]n1 -z 1G -c64 --assume-clean --consistency-policy=ppl

2. Get mdmon pid:
# ps -ef | grep mdmon

3. Write data to the drive, and kill mdmon after that (this forces the
array to stay dirty):
#dd of=/dev/md126 if=/dev/urandom bs=4M oflag=direct; kill -9 {mdmon_pid}

4. Stop the array:
# mdadm -Ss

5. Start the array:
# mdadm -As
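
Put together, the same scenario as one rough script (assumes the volume shows
up as /dev/md126, as in step 3, and uses pidof instead of ps -ef | grep to get
the mdmon pid):

#!/bin/bash
mdadm -CR imsm -e imsm -n3 /dev/nvme[456]n1
mdadm -CR vol2 -l5 -n3 /dev/nvme[456]n1 -z 1G -c64 --assume-clean --consistency-policy=ppl
# fill the volume, then kill mdmon so the array is left dirty
dd of=/dev/md126 if=/dev/urandom bs=4M oflag=direct
kill -9 $(pidof mdmon)
mdadm -Ss
mdadm -As    # PPL recovery runs here and hits the crash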

Thanks,
Mariusz

