* PROBLEM: double fault in md_end_io
@ 2021-04-09 21:40 Paweł Wiejacha
  2021-04-12  6:48 ` Song Liu
  2021-04-23  2:36 ` Guoqing Jiang
  0 siblings, 2 replies; 13+ messages in thread
From: Paweł Wiejacha @ 2021-04-09 21:40 UTC (permalink / raw)
  To: song; +Cc: linux-raid, artur.paszkiewicz

Hello,

Two of my machines constantly crash with a double fault like this:

1146  <0>[33685.629591] traps: PANIC: double fault, error_code: 0x0
1147  <4>[33685.629593] double fault: 0000 [#1] SMP NOPTI
1148  <4>[33685.629594] CPU: 10 PID: 2118287 Comm: kworker/10:0
Tainted: P           OE     5.11.8-051108-generic #202103200636
1149  <4>[33685.629595] Hardware name: ASUSTeK COMPUTER INC. KRPG-U8
Series/KRPG-U8 Series, BIOS 4201 09/25/2020
1150  <4>[33685.629595] Workqueue: xfs-conv/md12 xfs_end_io [xfs]
1151  <4>[33685.629596] RIP: 0010:__slab_free+0x23/0x340
1152  <4>[33685.629597] Code: 4c fe ff ff 0f 1f 00 0f 1f 44 00 00 55
48 89 e5 41 57 49 89 cf 41 56 49 89 fe 41 55 41 54 49 89 f4 53 48 83
e4 f0 48 83 ec 70 <48> 89 54 24 28 0f 1f 44 00 00 41 8b 46 28 4d 8b 6c
24 20 49 8b 5c
1153  <4>[33685.629598] RSP: 0018:ffffa9bc00848fa0 EFLAGS: 00010086
1154  <4>[33685.629599] RAX: ffff94c04d8b10a0 RBX: ffff94437a34a880
RCX: ffff94437a34a880
1155  <4>[33685.629599] RDX: ffff94437a34a880 RSI: ffffcec745e8d280
RDI: ffff944300043b00
1156  <4>[33685.629599] RBP: ffffa9bc00849040 R08: 0000000000000001
R09: ffffffff82a5d6de
1157  <4>[33685.629600] R10: 0000000000000001 R11: 000000009c109000
R12: ffffcec745e8d280
1158  <4>[33685.629600] R13: ffff944300043b00 R14: ffff944300043b00
R15: ffff94437a34a880
1159  <4>[33685.629601] FS:  0000000000000000(0000)
GS:ffff94c04d880000(0000) knlGS:0000000000000000
1160  <4>[33685.629601] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
1161  <4>[33685.629602] CR2: ffffa9bc00848f98 CR3: 000000014d04e000
CR4: 0000000000350ee0
1162  <4>[33685.629602] Call Trace:
1163  <4>[33685.629603]  <IRQ>
1164  <4>[33685.629603]  ? kfree+0x3bc/0x3e0
1165  <4>[33685.629603]  ? mempool_kfree+0xe/0x10
1166  <4>[33685.629603]  ? mempool_kfree+0xe/0x10
1167  <4>[33685.629604]  ? mempool_free+0x2f/0x80
1168  <4>[33685.629604]  ? md_end_io+0x4a/0x70
1169  <4>[33685.629604]  ? bio_endio+0xdc/0x130
1170  <4>[33685.629605]  ? bio_chain_endio+0x2d/0x40
1171  <4>[33685.629605]  ? md_end_io+0x5c/0x70
1172  <4>[33685.629605]  ? bio_endio+0xdc/0x130
1173  <4>[33685.629605]  ? bio_chain_endio+0x2d/0x40
1174  <4>[33685.629606]  ? md_end_io+0x5c/0x70
1175  <4>[33685.629606]  ? bio_endio+0xdc/0x130
1176  <4>[33685.629606]  ? bio_chain_endio+0x2d/0x40
1177  <4>[33685.629607]  ? md_end_io+0x5c/0x70
... repeated ...
1436  <4>[33685.629677]  ? bio_endio+0xdc/0x130
1437  <4>[33685.629677]  ? bio_chain_endio+0x2d/0x40
1438  <4>[33685.629677]  ? md_end_io+0x5c/0x70
1439  <4>[33685.629677]  ? bio_endio+0xdc/0x130
1440  <4>[33685.629678]  ? bio_chain_endio+0x2d/0x40
1441  <4>[33685.629678]  ? md_
1442  <4>[33685.629679] Lost 357 message(s)!

This happens on:
5.11.8-051108-generic #202103200636 SMP Sat Mar 20 11:17:32 UTC 2021
and on 5.8.0-44-generic #50~20.04.1-Ubuntu
(https://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_5.8.0-44.50/changelog),
which contains the backported
https://github.com/torvalds/linux/commit/41d2d848e5c09209bdb57ff9c0ca34075e22783d
("md: improve io stats accounting").
Kernel 5.8.18-050818-generic #202011011237 SMP Sun Nov 1 12:40:15 UTC
2020, which does not contain the suspected change above, does not crash.
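
For context, the suspected commit wraps each submitted bio's completion
in a small accounting callback. A paraphrased sketch (field and function
names are approximate, not verbatim kernel source) looks roughly like
this:

struct md_io {
	struct mddev *mddev;
	bio_end_io_t *orig_bi_end_io;
	void *orig_bi_private;
	unsigned long start_time;
};

static void md_end_io(struct bio *bio)
{
	struct md_io *md_io = bio->bi_private;
	struct mddev *mddev = md_io->mddev;

	disk_end_io_acct(mddev->gendisk, bio_op(bio), md_io->start_time);

	/* restore the completion handler the submitter originally installed */
	bio->bi_end_io = md_io->orig_bi_end_io;
	bio->bi_private = md_io->orig_bi_private;

	mempool_free(md_io, &mddev->md_io_pool);

	/*
	 * If the original handler is bio_chain_endio (a chained/split bio),
	 * this direct call re-enters bio_endio for the parent bio, which can
	 * land in md_end_io again, one more set of stack frames per chained
	 * segment. That would be consistent with the repeating
	 * md_end_io/bio_endio/bio_chain_endio frames in the trace above.
	 */
	if (bio->bi_end_io)
		bio->bi_end_io(bio);
}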

If there's a better way/place to report this bug just let me know. If
not, here are steps to reproduce:

1. Create a RAID 0 device using three Micron_9300_MTFDHAL7T6TDP disks.
mdadm --create --verbose /dev/md12 --level=stripe --raid-devices=3
/dev/nvme0n1p1 /dev/nvme1n1p1 /dev/nvme2n1p1

2. Set up XFS on it:
mkfs.xfs /dev/md12 and mount it

3. Write to a file on this filesystem:
while true; do rm -rf /mnt/md12/crash* ; for i in `seq 8`; do dd
if=/dev/zero of=/mnt/md12/crash$i bs=32K count=50000000 & done; wait;
done
Wait for a crash (usually less than 20 min).

I couldn't reproduce it with a single dd process (maybe I have to wait
a little longer), but a single cat
/very/large/file/on/cephfs/over100GbE > /mnt/md12/crash is enough for
this double fault to occur.

More info:
This long mempool_kfree - md_end_io - ... - md_end_io stack trace always
looks the same, but the panic occurs in different places:

pstore/6948115143318/dmesg.txt-<4>[545649.087998] CPU: 88 PID: 0 Comm:
swapper/88 Tainted: P           OE     5.11.8-051108-generic
#202103200636
pstore/6948377398316/dmesg.txt-<4>[11275.914909] CPU: 14 PID: 0 Comm:
swapper/14 Tainted: P           OE     5.11.8-051108-generic
#202103200636
pstore/6948532816002/dmesg.txt-<4>[33685.629594] CPU: 10 PID: 2118287
Comm: kworker/10:0 Tainted: P           OE     5.11.8-051108-generic
#202103200636
pstore/6948532816002/dmesg.txt-<4>[33685.629595] Workqueue:
xfs-conv/md12 xfs_end_io [xfs]
pstore/6948855849083/dmesg.txt-<4>[42934.321129] CPU: 85 PID: 0 Comm:
swapper/85 Tainted: P           OE     5.11.8-051108-generic
#202103200636
pstore/6948876331782/dmesg.txt-<4>[ 3475.020672] CPU: 86 PID: 0 Comm:
swapper/86 Tainted: P           OE     5.11.8-051108-generic
#202103200636
pstore/6949083860307/dmesg.txt-<4>[43048.254375] CPU: 45 PID: 0 Comm:
swapper/45 Tainted: P           OE     5.11.8-051108-generic
#202103200636
pstore/6949091775931/dmesg.txt-<4>[ 1150.790240] CPU: 64 PID: 0 Comm:
swapper/64 Tainted: P           OE     5.11.8-051108-generic
#202103200636
pstore/6949123356826/dmesg.txt-<4>[ 6963.858253] CPU: 6 PID: 51 Comm:
kworker/6:0 Tainted: P           OE     5.11.8-051108-generic
#202103200636
pstore/6949123356826/dmesg.txt-<4>[ 6963.858255] Workqueue: ceph-msgr
ceph_con_workfn [libceph]
pstore/6949123356826/dmesg.txt-<4>[ 6963.858253] CPU: 6 PID: 51 Comm:
kworker/6:0 Tainted: P           OE     5.11.8-051108-generic
#202103200636
pstore/6949123356826/dmesg.txt-<4>[ 6963.858255] Workqueue: ceph-msgr
ceph_con_workfn [libceph]
pstore/6949152322085/dmesg.txt-<4>[ 6437.077874] CPU: 59 PID: 0 Comm:
swapper/59 Tainted: P           OE     5.11.8-051108-generic
#202103200636

cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.11.8-051108-generic
root=/dev/mapper/ubuntu--vg-ubuntu--lv ro net.ifnames=0 biosdevname=0
strict-devmem=0 mitigations=off iommu=pt

cat /proc/cpuinfo
model name      : AMD EPYC 7552 48-Core Processor

cat /proc/mounts
/dev/md12 /mnt/ssd1 xfs
rw,noatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=1024,swidth=3072,prjquota
0 0

Let me know if you need more information.

Best regards,
Paweł Wiejacha


* Re: PROBLEM: double fault in md_end_io
  2021-04-09 21:40 PROBLEM: double fault in md_end_io Paweł Wiejacha
@ 2021-04-12  6:48 ` Song Liu
  2021-04-13 12:05   ` Paweł Wiejacha
  2021-04-23  2:36 ` Guoqing Jiang
  1 sibling, 1 reply; 13+ messages in thread
From: Song Liu @ 2021-04-12  6:48 UTC (permalink / raw)
  To: Paweł Wiejacha; +Cc: linux-raid, Artur Paszkiewicz

On Fri, Apr 9, 2021 at 2:41 PM Paweł Wiejacha
<pawel.wiejacha@rtbhouse.com> wrote:
>
> Hello,
>
> Two of my machines constantly crash with a double fault like this:
>
> 1146  <0>[33685.629591] traps: PANIC: double fault, error_code: 0x0
> 1147  <4>[33685.629593] double fault: 0000 [#1] SMP NOPTI
> 1148  <4>[33685.629594] CPU: 10 PID: 2118287 Comm: kworker/10:0
> Tainted: P           OE     5.11.8-051108-generic #202103200636
> 1149  <4>[33685.629595] Hardware name: ASUSTeK COMPUTER INC. KRPG-U8
> Series/KRPG-U8 Series, BIOS 4201 09/25/2020
> 1150  <4>[33685.629595] Workqueue: xfs-conv/md12 xfs_end_io [xfs]
> 1151  <4>[33685.629596] RIP: 0010:__slab_free+0x23/0x340
> 1152  <4>[33685.629597] Code: 4c fe ff ff 0f 1f 00 0f 1f 44 00 00 55
> 48 89 e5 41 57 49 89 cf 41 56 49 89 fe 41 55 41 54 49 89 f4 53 48 83
> e4 f0 48 83 ec 70 <48> 89 54 24 28 0f 1f 44 00 00 41 8b 46 28 4d 8b 6c
> 24 20 49 8b 5c
> 1153  <4>[33685.629598] RSP: 0018:ffffa9bc00848fa0 EFLAGS: 00010086
> 1154  <4>[33685.629599] RAX: ffff94c04d8b10a0 RBX: ffff94437a34a880
> RCX: ffff94437a34a880
> 1155  <4>[33685.629599] RDX: ffff94437a34a880 RSI: ffffcec745e8d280
> RDI: ffff944300043b00
> 1156  <4>[33685.629599] RBP: ffffa9bc00849040 R08: 0000000000000001
> R09: ffffffff82a5d6de
> 1157  <4>[33685.629600] R10: 0000000000000001 R11: 000000009c109000
> R12: ffffcec745e8d280
> 1158  <4>[33685.629600] R13: ffff944300043b00 R14: ffff944300043b00
> R15: ffff94437a34a880
> 1159  <4>[33685.629601] FS:  0000000000000000(0000)
> GS:ffff94c04d880000(0000) knlGS:0000000000000000
> 1160  <4>[33685.629601] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 1161  <4>[33685.629602] CR2: ffffa9bc00848f98 CR3: 000000014d04e000
> CR4: 0000000000350ee0
> 1162  <4>[33685.629602] Call Trace:
> 1163  <4>[33685.629603]  <IRQ>
> 1164  <4>[33685.629603]  ? kfree+0x3bc/0x3e0
> 1165  <4>[33685.629603]  ? mempool_kfree+0xe/0x10
> 1166  <4>[33685.629603]  ? mempool_kfree+0xe/0x10
> 1167  <4>[33685.629604]  ? mempool_free+0x2f/0x80
> 1168  <4>[33685.629604]  ? md_end_io+0x4a/0x70
> 1169  <4>[33685.629604]  ? bio_endio+0xdc/0x130
> 1170  <4>[33685.629605]  ? bio_chain_endio+0x2d/0x40
> 1171  <4>[33685.629605]  ? md_end_io+0x5c/0x70
> 1172  <4>[33685.629605]  ? bio_endio+0xdc/0x130
> 1173  <4>[33685.629605]  ? bio_chain_endio+0x2d/0x40
> 1174  <4>[33685.629606]  ? md_end_io+0x5c/0x70
> 1175  <4>[33685.629606]  ? bio_endio+0xdc/0x130
> 1176  <4>[33685.629606]  ? bio_chain_endio+0x2d/0x40
> 1177  <4>[33685.629607]  ? md_end_io+0x5c/0x70
> ... repeated ...
> 1436  <4>[33685.629677]  ? bio_endio+0xdc/0x130
> 1437  <4>[33685.629677]  ? bio_chain_endio+0x2d/0x40
> 1438  <4>[33685.629677]  ? md_end_io+0x5c/0x70
> 1439  <4>[33685.629677]  ? bio_endio+0xdc/0x130
> 1440  <4>[33685.629678]  ? bio_chain_endio+0x2d/0x40
> 1441  <4>[33685.629678]  ? md_
> 1442  <4>[33685.629679] Lost 357 message(s)!
>
> This happens on:
> 5.11.8-051108-generic #202103200636 SMP Sat Mar 20 11:17:32 UTC 2021
> and on 5.8.0-44-generic #50~20.04.1-Ubuntu
> (https://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_5.8.0-44.50/changelog)
> which contains backported
> https://github.com/torvalds/linux/commit/41d2d848e5c09209bdb57ff9c0ca34075e22783d
> ("md: improve io stats accounting").
> The 5.8.18-050818-generic #202011011237 SMP Sun Nov 1 12:40:15 UTC
> 2020 which does not contain above suspected change does not crash.
>
> If there's a better way/place to report this bug just let me know. If
> not, here are steps to reproduce:
>
> 1. Create a RAID 0 device using three Micron_9300_MTFDHAL7T6TDP disks.
> mdadm --create --verbose /dev/md12 --level=stripe --raid-devices=3
> /dev/nvme0n1p1 /dev/nvme1n1p1 /dev/nvme2n1p1
>
> 2. Setup xfs on it:
> mkfs.xfs /dev/md12 and mount it
>
> 3. Write to a file on this filesystem:
> while true; do rm -rf /mnt/md12/crash* ; for i in `seq 8`; do dd
> if=/dev/zero of=/mnt/md12/crash$i bs=32K count=50000000 & done; wait;
> done
> Wait for a crash (usually less than 20 min).
>
> I couldn't reproduce it with a single dd process (maybe I have to wait
> a little longer), but a single cat
> /very/large/file/on/cephfs/over100GbE > /mnt/md12/crash is enough for
> this double fault to occur.
>
> More info:
> This long mempool_kfree - md_end_io - *  -md_end_io stack trace looks
> always the same, but the panic occurs in different places:
>
> pstore/6948115143318/dmesg.txt-<4>[545649.087998] CPU: 88 PID: 0 Comm:
> swapper/88 Tainted: P           OE     5.11.8-051108-generic
> #202103200636
> pstore/6948377398316/dmesg.txt-<4>[11275.914909] CPU: 14 PID: 0 Comm:
> swapper/14 Tainted: P           OE     5.11.8-051108-generic
> #202103200636
> pstore/6948532816002/dmesg.txt-<4>[33685.629594] CPU: 10 PID: 2118287
> Comm: kworker/10:0 Tainted: P           OE     5.11.8-051108-generic
> #202103200636
> pstore/6948532816002/dmesg.txt-<4>[33685.629595] Workqueue:
> xfs-conv/md12 xfs_end_io [xfs]
> pstore/6948855849083/dmesg.txt-<4>[42934.321129] CPU: 85 PID: 0 Comm:
> swapper/85 Tainted: P           OE     5.11.8-051108-generic
> #202103200636
> pstore/6948876331782/dmesg.txt-<4>[ 3475.020672] CPU: 86 PID: 0 Comm:
> swapper/86 Tainted: P           OE     5.11.8-051108-generic
> #202103200636
> pstore/6949083860307/dmesg.txt-<4>[43048.254375] CPU: 45 PID: 0 Comm:
> swapper/45 Tainted: P           OE     5.11.8-051108-generic
> #202103200636
> pstore/6949091775931/dmesg.txt-<4>[ 1150.790240] CPU: 64 PID: 0 Comm:
> swapper/64 Tainted: P           OE     5.11.8-051108-generic
> #202103200636
> pstore/6949123356826/dmesg.txt-<4>[ 6963.858253] CPU: 6 PID: 51 Comm:
> kworker/6:0 Tainted: P           OE     5.11.8-051108-generic
> #202103200636
> pstore/6949123356826/dmesg.txt-<4>[ 6963.858255] Workqueue: ceph-msgr
> ceph_con_workfn [libceph]
> pstore/6949123356826/dmesg.txt-<4>[ 6963.858253] CPU: 6 PID: 51 Comm:
> kworker/6:0 Tainted: P           OE     5.11.8-051108-generic
> #202103200636
> pstore/6949123356826/dmesg.txt-<4>[ 6963.858255] Workqueue: ceph-msgr
> ceph_con_workfn [libceph]
> pstore/6949152322085/dmesg.txt-<4>[ 6437.077874] CPU: 59 PID: 0 Comm:
> swapper/59 Tainted: P           OE     5.11.8-051108-generic
> #202103200636
>
> cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-5.11.8-051108-generic
> root=/dev/mapper/ubuntu--vg-ubuntu--lv ro net.ifnames=0 biosdevname=0
> strict-devmem=0 mitigations=off iommu=pt
>
> cat /proc/cpuinfo
> model name      : AMD EPYC 7552 48-Core Processor
>
> cat /proc/mounts
> /dev/md12 /mnt/ssd1 xfs
> rw,noatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=1024,swidth=3072,prjquota
> 0 0
>
> Let me know if you need more information.

Hi Pawel,

Thanks for the report. Could you please try whether the following
change fixes the issue?

Song

diff --git i/drivers/md/md.c w/drivers/md/md.c
index bb09070a63bcf..889e9440a9f38 100644
--- i/drivers/md/md.c
+++ w/drivers/md/md.c
@@ -5648,7 +5648,7 @@ static int md_alloc(dev_t dev, char *name)
                 */
                mddev->hold_active = UNTIL_STOP;

-       error = mempool_init_kmalloc_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
+       error = mempool_init_slab_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
                                          sizeof(struct md_io));
        if (error)
                goto abort;


* Re: PROBLEM: double fault in md_end_io
  2021-04-12  6:48 ` Song Liu
@ 2021-04-13 12:05   ` Paweł Wiejacha
  2021-04-15  0:36     ` Song Liu
  0 siblings, 1 reply; 13+ messages in thread
From: Paweł Wiejacha @ 2021-04-13 12:05 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid, Artur Paszkiewicz

Hello Song,

That code does not compile, but I guess what you meant was
something like this:

diff --git drivers/md/md.c drivers/md/md.c
index 04384452a..cbc97a96b 100644
--- drivers/md/md.c
+++ drivers/md/md.c
@@ -78,6 +78,7 @@ static DEFINE_SPINLOCK(pers_lock);

 static struct kobj_type md_ktype;

+struct kmem_cache *md_io_cache;
 struct md_cluster_operations *md_cluster_ops;
 EXPORT_SYMBOL(md_cluster_ops);
 static struct module *md_cluster_mod;
@@ -5701,8 +5702,8 @@ static int md_alloc(dev_t dev, char *name)
         */
        mddev->hold_active = UNTIL_STOP;

-   error = mempool_init_kmalloc_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
-                     sizeof(struct md_io));
+   error = mempool_init_slab_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
+                     md_io_cache);
    if (error)
        goto abort;

@@ -9542,6 +9543,10 @@ static int __init md_init(void)
 {
    int ret = -ENOMEM;

+   md_io_cache = KMEM_CACHE(md_io, 0);
+   if (!md_io_cache)
+       goto err_md_io_cache;
+
    md_wq = alloc_workqueue("md", WQ_MEM_RECLAIM, 0);
    if (!md_wq)
        goto err_wq;
@@ -9578,6 +9583,8 @@ static int __init md_init(void)
 err_misc_wq:
    destroy_workqueue(md_wq);
 err_wq:
+   kmem_cache_destroy(md_io_cache);
+err_md_io_cache:
    return ret;
 }

@@ -9863,6 +9870,7 @@ static __exit void md_exit(void)
    destroy_workqueue(md_rdev_misc_wq);
    destroy_workqueue(md_misc_wq);
    destroy_workqueue(md_wq);
+   kmem_cache_destroy(md_io_cache);
 }

 subsys_initcall(md_init);
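
For reference, mempool_init_slab_pool() takes a struct kmem_cache * as
its third argument rather than an element size (see
include/linux/mempool.h), which is why the original hunk did not build.
With the md_io_cache created in md_init() as above, the pool init call
becomes simply:

    error = mempool_init_slab_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
                                   md_io_cache);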

Unfortunately, now it crashes with:

<0>[  625.159576] traps: PANIC: double fault, error_code: 0x0
<4>[  625.159578] double fault: 0000 [#1] SMP NOPTI
<4>[  625.159579] CPU: 32 PID: 0 Comm: swapper/32 Tainted: P
OE     5.11.8-pw-fix1 #1
<4>[  625.159579] Hardware name: empty S8030GM2NE/S8030GM2NE, BIOS
V4.00 03/11/2021
<4>[  625.159580] RIP: 0010:__slab_free+0x26/0x380
<4>[  625.159580] Code: 1f 44 00 00 0f 1f 44 00 00 55 49 89 ca 45 89
c3 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 49 89 f4 53 48 83 e4 f0
48 83 ec 70 <48> 89 54 24 28 0f 1f 44 00 00 41 8b 46 28 4d 8b 6c 24 20
49 8b 5c
<4>[  625.159581] RSP: 0018:ffffb11c4d028fc0 EFLAGS: 00010086
<4>[  625.159582] RAX: ffff9a38cfc34fd0 RBX: ffff9a1a1360b870 RCX:
ffff9a1a1360b870
<4>[  625.159583] RDX: ffff9a1a1360b870 RSI: ffffe328044d82c0 RDI:
ffff9a39536e5e00
<4>[  625.159583] RBP: ffffb11c4d029060 R08: 0000000000000001 R09:
ffffffff9f850a57
<4>[  625.159583] R10: ffff9a1a1360b870 R11: 0000000000000001 R12:
ffffe328044d82c0
<4>[  625.159584] R13: ffffe328044d82c0 R14: ffff9a39536e5e00 R15:
ffff9a39536e5e00
<4>[  625.159584] FS:  0000000000000000(0000)
GS:ffff9a38cfc00000(0000) knlGS:0000000000000000
<4>[  625.159585] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  625.159585] CR2: ffffb11c4d028fb8 CR3: 000000407aece000 CR4:
0000000000350ee0
<4>[  625.159585] Call Trace:
<4>[  625.159586]  <IRQ>
<4>[  625.159586]  ? kmem_cache_free+0x3bf/0x410
<4>[  625.159586]  ? mempool_free_slab+0x17/0x20
<4>[  625.159587]  ? mempool_free_slab+0x17/0x20
<4>[  625.159587]  ? mempool_free+0x2f/0x80
<4>[  625.159587]  ? md_end_io+0x4a/0x70
<4>[  625.159587]  ? bio_endio+0xe0/0x120
<4>[  625.159588]  ? bio_chain_endio+0x2d/0x40
<4>[  625.159588]  ? md_end_io+0x5c/0x70
<4>[  625.159588]  ? bio_endio+0xe0/0x120
<4>[  625.159589]  ? bio_chain_endio+0x2d/0x40
<4>[  625.159589]  ? md_end_io+0x5c/0x70
<4>[  625.159589]  ? bio_endio+0xe0/0x120
<4>[  625.159589]  ? bio_chain_endio+0x2d/0x40
<4>[  625.159590]  ? md_end_io+0x5c/0x70
<4>[  625.159590]  ? bio_endio+0xe0/0x120
...
<4>[  625.159663] Lost 344 message(s)!

Some info from a moment "just" (~0.2 s) before the crash:

$ cd /sys/dev/block/9:12 ;  watch -n0.2 'for f in `find . -maxdepth 1
-type f` `ls queue/* |grep -v usec` ; do echo $f `cat $f`; done | tee
-a ~/md'
./uevent MAJOR=9 MINOR=12 DEVNAME=md12 DEVTYPE=disk
./ext_range 256
./range 1
./alignment_offset 0
./dev 9:12
./ro 0
./stat 577 0 10446 84 46128254 0 11584858627 400313072 660 1289080
400313156 0 0 0 0 0 0
./events_poll_msecs -1
./events_async
./size 41986624512
./discard_alignment 0
./capability 50
./hidden 0
./removable 0
./events
./inflight 0 666
queue/add_random 0
queue/chunk_sectors 0
queue/dax 0
queue/discard_granularity 512
queue/discard_max_bytes 2199023255040
queue/discard_max_hw_bytes 2199023255040
queue/discard_zeroes_data 0
queue/fua 1
queue/hw_sector_size 512
queue/io_poll 0
queue/io_poll_delay 0
queue/iostats 0
queue/logical_block_size 512
queue/max_discard_segments 256
queue/max_hw_sectors_kb 128
queue/max_integrity_segments 0
queue/max_sectors_kb 128
queue/max_segments 33
queue/max_segment_size 4294967295
queue/minimum_io_size 524288
queue/nomerges 0
queue/nr_requests 128
queue/nr_zones 0
queue/optimal_io_size 1572864
queue/physical_block_size 512
queue/read_ahead_kb 3072
queue/rotational 0
queue/rq_affinity 0
queue/scheduler none
queue/stable_writes 0
queue/write_cache write back
queue/write_same_max_bytes 0
queue/write_zeroes_max_bytes 0
queue/zone_append_max_bytes 0
queue/zoned none


$ cd /sys/kernel/slab; watch -n0.2 'for f in `ls md_io/* | grep -v
calls`; do echo $f `cat $f`; done | tee -a ~/slab'
md_io/aliases 4
md_io/align 8
md_io/cache_dma 0
md_io/cpu_partial 30
md_io/cpu_slabs 625 N0=250 N1=84 N2=23 N3=268
md_io/ctor
md_io/destroy_by_rcu 0
md_io/hwcache_align 0
md_io/min_partial 5
md_io/objects 63852 N0=23256 N1=12138 N2=3264 N3=25194
md_io/object_size 40
md_io/objects_partial 0
md_io/objs_per_slab 102
md_io/order 0
md_io/partial 8 N0=3 N1=3 N3=2
md_io/poison 0
md_io/reclaim_account 0
md_io/red_zone 0
md_io/remote_node_defrag_ratio 100
md_io/sanity_checks 0
md_io/shrink
md_io/slabs 634 N0=231 N1=122 N2=32 N3=249
md_io/slabs_cpu_partial 559(559) C0=21(21) C1=16(16) C2=12(12)
C3=21(21) C4=7(7) C5=23(23) C6=19(19) C7=19(19) C8=5(5) C9=4(4)
C10=3(3) C12=5(5) C13=6(6) C14=6(6) C18=1(1) C19=2(2) C24=19(19)
C25=20(20) C26=15(15
) C27=11(11) C28=17(17) C29=14(14) C30=23(23) C31=14(14) C32=21(21)
C33=8(8) C34=19(19) C35=12(12) C36=19(19) C37=17(17) C38=15(15)
C39=7(7) C40=3(3) C41=3(3) C42=1(1) C45=3(3) C46=5(5) C47=1(1)
C48=1(1) C49=1(1)
 C53=1(1) C56=3(3) C57=5(5) C58=21(21) C59=25(25) C60=5(5) C61=25(25)
C62=17(17) C63=18(18)
md_io/slab_size 40
md_io/store_user 0
md_io/total_objects 64668 N0=23562 N1=12444 N2=3264 N3=25398
md_io/trace 0
md_io/usersize 0
md_io/validate


$ watch -n0.2 'cat /proc/meminfo | paste - - | tee -a ~/meminfo'
MemTotal:       528235648 kB    MemFree:        20002732 kB
MemAvailable:   483890268 kB    Buffers:            7356 kB
Cached:         495416180 kB    SwapCached:            0 kB
Active:         96396800 kB     Inactive:       399891308 kB
Active(anon):      10976 kB     Inactive(anon):   890908 kB
Active(file):   96385824 kB     Inactive(file): 399000400 kB
Unevictable:       78768 kB     Mlocked:           78768 kB
SwapTotal:             0 kB     SwapFree:              0 kB
Dirty:          88422072 kB     Writeback:        948756 kB
AnonPages:        945772 kB     Mapped:            57300 kB
Shmem:             26300 kB     KReclaimable:    7248160 kB
Slab:            7962748 kB     SReclaimable:    7248160 kB
SUnreclaim:       714588 kB     KernelStack:       18288 kB
PageTables:        10796 kB     NFS_Unstable:          0 kB
Bounce:                0 kB     WritebackTmp:          0 kB
CommitLimit:    264117824 kB    Committed_AS:   21816824 kB
VmallocTotal:   34359738367 kB  VmallocUsed:      561588 kB
VmallocChunk:          0 kB     Percpu:            65792 kB
HardwareCorrupted:     0 kB     AnonHugePages:         0 kB
ShmemHugePages:        0 kB     ShmemPmdMapped:        0 kB
FileHugePages:         0 kB     FilePmdMapped:         0 kB
HugePages_Total:       0        HugePages_Free:        0
HugePages_Rsvd:        0        HugePages_Surp:        0
Hugepagesize:       2048 kB     Hugetlb:               0 kB
DirectMap4k:      541000 kB     DirectMap2M:    11907072 kB
DirectMap1G:    525336576 kB


Regards,
Paweł Wiejacha.


On Mon, 12 Apr 2021 at 08:49, Song Liu <song@kernel.org> wrote:
>
> On Fri, Apr 9, 2021 at 2:41 PM Paweł Wiejacha
> <pawel.wiejacha@rtbhouse.com> wrote:
> >
> > Hello,
> >
> > Two of my machines constantly crash with a double fault like this:
> >
> > 1146  <0>[33685.629591] traps: PANIC: double fault, error_code: 0x0
> > 1147  <4>[33685.629593] double fault: 0000 [#1] SMP NOPTI
> > 1148  <4>[33685.629594] CPU: 10 PID: 2118287 Comm: kworker/10:0
> > Tainted: P           OE     5.11.8-051108-generic #202103200636
> > 1149  <4>[33685.629595] Hardware name: ASUSTeK COMPUTER INC. KRPG-U8
> > Series/KRPG-U8 Series, BIOS 4201 09/25/2020
> > 1150  <4>[33685.629595] Workqueue: xfs-conv/md12 xfs_end_io [xfs]
> > 1151  <4>[33685.629596] RIP: 0010:__slab_free+0x23/0x340
> > 1152  <4>[33685.629597] Code: 4c fe ff ff 0f 1f 00 0f 1f 44 00 00 55
> > 48 89 e5 41 57 49 89 cf 41 56 49 89 fe 41 55 41 54 49 89 f4 53 48 83
> > e4 f0 48 83 ec 70 <48> 89 54 24 28 0f 1f 44 00 00 41 8b 46 28 4d 8b 6c
> > 24 20 49 8b 5c
> > 1153  <4>[33685.629598] RSP: 0018:ffffa9bc00848fa0 EFLAGS: 00010086
> > 1154  <4>[33685.629599] RAX: ffff94c04d8b10a0 RBX: ffff94437a34a880
> > RCX: ffff94437a34a880
> > 1155  <4>[33685.629599] RDX: ffff94437a34a880 RSI: ffffcec745e8d280
> > RDI: ffff944300043b00
> > 1156  <4>[33685.629599] RBP: ffffa9bc00849040 R08: 0000000000000001
> > R09: ffffffff82a5d6de
> > 1157  <4>[33685.629600] R10: 0000000000000001 R11: 000000009c109000
> > R12: ffffcec745e8d280
> > 1158  <4>[33685.629600] R13: ffff944300043b00 R14: ffff944300043b00
> > R15: ffff94437a34a880
> > 1159  <4>[33685.629601] FS:  0000000000000000(0000)
> > GS:ffff94c04d880000(0000) knlGS:0000000000000000
> > 1160  <4>[33685.629601] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > 1161  <4>[33685.629602] CR2: ffffa9bc00848f98 CR3: 000000014d04e000
> > CR4: 0000000000350ee0
> > 1162  <4>[33685.629602] Call Trace:
> > 1163  <4>[33685.629603]  <IRQ>
> > 1164  <4>[33685.629603]  ? kfree+0x3bc/0x3e0
> > 1165  <4>[33685.629603]  ? mempool_kfree+0xe/0x10
> > 1166  <4>[33685.629603]  ? mempool_kfree+0xe/0x10
> > 1167  <4>[33685.629604]  ? mempool_free+0x2f/0x80
> > 1168  <4>[33685.629604]  ? md_end_io+0x4a/0x70
> > 1169  <4>[33685.629604]  ? bio_endio+0xdc/0x130
> > 1170  <4>[33685.629605]  ? bio_chain_endio+0x2d/0x40
> > 1171  <4>[33685.629605]  ? md_end_io+0x5c/0x70
> > 1172  <4>[33685.629605]  ? bio_endio+0xdc/0x130
> > 1173  <4>[33685.629605]  ? bio_chain_endio+0x2d/0x40
> > 1174  <4>[33685.629606]  ? md_end_io+0x5c/0x70
> > 1175  <4>[33685.629606]  ? bio_endio+0xdc/0x130
> > 1176  <4>[33685.629606]  ? bio_chain_endio+0x2d/0x40
> > 1177  <4>[33685.629607]  ? md_end_io+0x5c/0x70
> > ... repeated ...
> > 1436  <4>[33685.629677]  ? bio_endio+0xdc/0x130
> > 1437  <4>[33685.629677]  ? bio_chain_endio+0x2d/0x40
> > 1438  <4>[33685.629677]  ? md_end_io+0x5c/0x70
> > 1439  <4>[33685.629677]  ? bio_endio+0xdc/0x130
> > 1440  <4>[33685.629678]  ? bio_chain_endio+0x2d/0x40
> > 1441  <4>[33685.629678]  ? md_
> > 1442  <4>[33685.629679] Lost 357 message(s)!
> >
> > This happens on:
> > 5.11.8-051108-generic #202103200636 SMP Sat Mar 20 11:17:32 UTC 2021
> > and on 5.8.0-44-generic #50~20.04.1-Ubuntu
> > (https://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_5.8.0-44.50/changelog)
> > which contains backported
> > https://github.com/torvalds/linux/commit/41d2d848e5c09209bdb57ff9c0ca34075e22783d
> > ("md: improve io stats accounting").
> > The 5.8.18-050818-generic #202011011237 SMP Sun Nov 1 12:40:15 UTC
> > 2020 which does not contain above suspected change does not crash.
> >
> > If there's a better way/place to report this bug just let me know. If
> > not, here are steps to reproduce:
> >
> > 1. Create a RAID 0 device using three Micron_9300_MTFDHAL7T6TDP disks.
> > mdadm --create --verbose /dev/md12 --level=stripe --raid-devices=3
> > /dev/nvme0n1p1 /dev/nvme1n1p1 /dev/nvme2n1p1
> >
> > 2. Setup xfs on it:
> > mkfs.xfs /dev/md12 and mount it
> >
> > 3. Write to a file on this filesystem:
> > while true; do rm -rf /mnt/md12/crash* ; for i in `seq 8`; do dd
> > if=/dev/zero of=/mnt/md12/crash$i bs=32K count=50000000 & done; wait;
> > done
> > Wait for a crash (usually less than 20 min).
> >
> > I couldn't reproduce it with a single dd process (maybe I have to wait
> > a little longer), but a single cat
> > /very/large/file/on/cephfs/over100GbE > /mnt/md12/crash is enough for
> > this double fault to occur.
> >
> > More info:
> > This long mempool_kfree - md_end_io - *  -md_end_io stack trace looks
> > always the same, but the panic occurs in different places:
> >
> > pstore/6948115143318/dmesg.txt-<4>[545649.087998] CPU: 88 PID: 0 Comm:
> > swapper/88 Tainted: P           OE     5.11.8-051108-generic
> > #202103200636
> > pstore/6948377398316/dmesg.txt-<4>[11275.914909] CPU: 14 PID: 0 Comm:
> > swapper/14 Tainted: P           OE     5.11.8-051108-generic
> > #202103200636
> > pstore/6948532816002/dmesg.txt-<4>[33685.629594] CPU: 10 PID: 2118287
> > Comm: kworker/10:0 Tainted: P           OE     5.11.8-051108-generic
> > #202103200636
> > pstore/6948532816002/dmesg.txt-<4>[33685.629595] Workqueue:
> > xfs-conv/md12 xfs_end_io [xfs]
> > pstore/6948855849083/dmesg.txt-<4>[42934.321129] CPU: 85 PID: 0 Comm:
> > swapper/85 Tainted: P           OE     5.11.8-051108-generic
> > #202103200636
> > pstore/6948876331782/dmesg.txt-<4>[ 3475.020672] CPU: 86 PID: 0 Comm:
> > swapper/86 Tainted: P           OE     5.11.8-051108-generic
> > #202103200636
> > pstore/6949083860307/dmesg.txt-<4>[43048.254375] CPU: 45 PID: 0 Comm:
> > swapper/45 Tainted: P           OE     5.11.8-051108-generic
> > #202103200636
> > pstore/6949091775931/dmesg.txt-<4>[ 1150.790240] CPU: 64 PID: 0 Comm:
> > swapper/64 Tainted: P           OE     5.11.8-051108-generic
> > #202103200636
> > pstore/6949123356826/dmesg.txt-<4>[ 6963.858253] CPU: 6 PID: 51 Comm:
> > kworker/6:0 Tainted: P           OE     5.11.8-051108-generic
> > #202103200636
> > pstore/6949123356826/dmesg.txt-<4>[ 6963.858255] Workqueue: ceph-msgr
> > ceph_con_workfn [libceph]
> > pstore/6949123356826/dmesg.txt-<4>[ 6963.858253] CPU: 6 PID: 51 Comm:
> > kworker/6:0 Tainted: P           OE     5.11.8-051108-generic
> > #202103200636
> > pstore/6949123356826/dmesg.txt-<4>[ 6963.858255] Workqueue: ceph-msgr
> > ceph_con_workfn [libceph]
> > pstore/6949152322085/dmesg.txt-<4>[ 6437.077874] CPU: 59 PID: 0 Comm:
> > swapper/59 Tainted: P           OE     5.11.8-051108-generic
> > #202103200636
> >
> > cat /proc/cmdline
> > BOOT_IMAGE=/vmlinuz-5.11.8-051108-generic
> > root=/dev/mapper/ubuntu--vg-ubuntu--lv ro net.ifnames=0 biosdevname=0
> > strict-devmem=0 mitigations=off iommu=pt
> >
> > cat /proc/cpuinfo
> > model name      : AMD EPYC 7552 48-Core Processor
> >
> > cat /proc/mounts
> > /dev/md12 /mnt/ssd1 xfs
> > rw,noatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=1024,swidth=3072,prjquota
> > 0 0
> >
> > Let me know if you need more information.
>
> Hi Pawel,
>
> Thanks for the report. Could you please try whether the following
> change fixes the issue?
>
> Song
>
> diff --git i/drivers/md/md.c w/drivers/md/md.c
> index bb09070a63bcf..889e9440a9f38 100644
> --- i/drivers/md/md.c
> +++ w/drivers/md/md.c
> @@ -5648,7 +5648,7 @@ static int md_alloc(dev_t dev, char *name)
>                  */
>                 mddev->hold_active = UNTIL_STOP;
>
> -       error = mempool_init_kmalloc_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
> +       error = mempool_init_slab_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
>                                           sizeof(struct md_io));
>         if (error)
>                 goto abort;


* Re: PROBLEM: double fault in md_end_io
  2021-04-13 12:05   ` Paweł Wiejacha
@ 2021-04-15  0:36     ` Song Liu
  2021-04-15  6:35       ` Song Liu
  0 siblings, 1 reply; 13+ messages in thread
From: Song Liu @ 2021-04-15  0:36 UTC (permalink / raw)
  To: Paweł Wiejacha; +Cc: linux-raid, Artur Paszkiewicz

On Tue, Apr 13, 2021 at 5:05 AM Paweł Wiejacha
<pawel.wiejacha@rtbhouse.com> wrote:
>
> Hello Song,
>
> That code does not compile, but I guess that what you meant was
> something like this:

Yeah.. I am really sorry for the noise.

>
> diff --git drivers/md/md.c drivers/md/md.c
> index 04384452a..cbc97a96b 100644
> --- drivers/md/md.c
> +++ drivers/md/md.c
> @@ -78,6 +78,7 @@ static DEFINE_SPINLOCK(pers_lock);
>
>  static struct kobj_type md_ktype;
>
> +struct kmem_cache *md_io_cache;
>  struct md_cluster_operations *md_cluster_ops;
>  EXPORT_SYMBOL(md_cluster_ops);
>  static struct module *md_cluster_mod;
> @@ -5701,8 +5702,8 @@ static int md_alloc(dev_t dev, char *name)
>          */
>         mddev->hold_active = UNTIL_STOP;
>
> -   error = mempool_init_kmalloc_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
> -                     sizeof(struct md_io));
> +   error = mempool_init_slab_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
> +                     md_io_cache);
>     if (error)
>         goto abort;
>
> @@ -9542,6 +9543,10 @@ static int __init md_init(void)
>  {
>     int ret = -ENOMEM;
>
> +   md_io_cache = KMEM_CACHE(md_io, 0);
> +   if (!md_io_cache)
> +       goto err_md_io_cache;
> +
>     md_wq = alloc_workqueue("md", WQ_MEM_RECLAIM, 0);
>     if (!md_wq)
>         goto err_wq;
> @@ -9578,6 +9583,8 @@ static int __init md_init(void)
>  err_misc_wq:
>     destroy_workqueue(md_wq);
>  err_wq:
> +   kmem_cache_destroy(md_io_cache);
> +err_md_io_cache:
>     return ret;
>  }
>
> @@ -9863,6 +9870,7 @@ static __exit void md_exit(void)
>     destroy_workqueue(md_rdev_misc_wq);
>     destroy_workqueue(md_misc_wq);
>     destroy_workqueue(md_wq);
> +   kmem_cache_destroy(md_io_cache);
>  }
>
>  subsys_initcall(md_init);

[...]

>
> $ watch -n0.2 'cat /proc/meminfo | paste - - | tee -a ~/meminfo'
> MemTotal:       528235648 kB    MemFree:        20002732 kB
> MemAvailable:   483890268 kB    Buffers:            7356 kB
> Cached:         495416180 kB    SwapCached:            0 kB
> Active:         96396800 kB     Inactive:       399891308 kB
> Active(anon):      10976 kB     Inactive(anon):   890908 kB
> Active(file):   96385824 kB     Inactive(file): 399000400 kB
> Unevictable:       78768 kB     Mlocked:           78768 kB
> SwapTotal:             0 kB     SwapFree:              0 kB
> Dirty:          88422072 kB     Writeback:        948756 kB
> AnonPages:        945772 kB     Mapped:            57300 kB
> Shmem:             26300 kB     KReclaimable:    7248160 kB
> Slab:            7962748 kB     SReclaimable:    7248160 kB
> SUnreclaim:       714588 kB     KernelStack:       18288 kB
> PageTables:        10796 kB     NFS_Unstable:          0 kB
> Bounce:                0 kB     WritebackTmp:          0 kB
> CommitLimit:    264117824 kB    Committed_AS:   21816824 kB
> VmallocTotal:   34359738367 kB  VmallocUsed:      561588 kB
> VmallocChunk:          0 kB     Percpu:            65792 kB
> HardwareCorrupted:     0 kB     AnonHugePages:         0 kB
> ShmemHugePages:        0 kB     ShmemPmdMapped:        0 kB
> FileHugePages:         0 kB     FilePmdMapped:         0 kB
> HugePages_Total:       0        HugePages_Free:        0
> HugePages_Rsvd:        0        HugePages_Surp:        0
> Hugepagesize:       2048 kB     Hugetlb:               0 kB
> DirectMap4k:      541000 kB     DirectMap2M:    11907072 kB
> DirectMap1G:    525336576 kB
>

And thanks for this information.

I have set up a system to run the test; the code I am using is the top of the
md-next branch. I will update on the status later tonight.

Thanks,
Song


* Re: PROBLEM: double fault in md_end_io
  2021-04-15  0:36     ` Song Liu
@ 2021-04-15  6:35       ` Song Liu
  2021-04-15 15:35         ` Paweł Wiejacha
  0 siblings, 1 reply; 13+ messages in thread
From: Song Liu @ 2021-04-15  6:35 UTC (permalink / raw)
  To: Paweł Wiejacha; +Cc: linux-raid, Artur Paszkiewicz

On Wed, Apr 14, 2021 at 5:36 PM Song Liu <song@kernel.org> wrote:
>
> On Tue, Apr 13, 2021 at 5:05 AM Paweł Wiejacha
> <pawel.wiejacha@rtbhouse.com> wrote:
> >
> > Hello Song,
> >
> > That code does not compile, but I guess that what you meant was
> > something like this:
>
> Yeah.. I am really sorry for the noise.
>
> >
> > diff --git drivers/md/md.c drivers/md/md.c
> > index 04384452a..cbc97a96b 100644
> > --- drivers/md/md.c
> > +++ drivers/md/md.c
> > @@ -78,6 +78,7 @@ static DEFINE_SPINLOCK(pers_lock);
> >
> >  static struct kobj_type md_ktype;
> >
> > +struct kmem_cache *md_io_cache;
> >  struct md_cluster_operations *md_cluster_ops;
> >  EXPORT_SYMBOL(md_cluster_ops);
> >  static struct module *md_cluster_mod;
> > @@ -5701,8 +5702,8 @@ static int md_alloc(dev_t dev, char *name)
> >          */
> >         mddev->hold_active = UNTIL_STOP;
> >
> > -   error = mempool_init_kmalloc_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
> > -                     sizeof(struct md_io));
> > +   error = mempool_init_slab_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
> > +                     md_io_cache);
> >     if (error)
> >         goto abort;
> >
> > @@ -9542,6 +9543,10 @@ static int __init md_init(void)
> >  {
> >     int ret = -ENOMEM;
> >
> > +   md_io_cache = KMEM_CACHE(md_io, 0);
> > +   if (!md_io_cache)
> > +       goto err_md_io_cache;
> > +
> >     md_wq = alloc_workqueue("md", WQ_MEM_RECLAIM, 0);
> >     if (!md_wq)
> >         goto err_wq;
> > @@ -9578,6 +9583,8 @@ static int __init md_init(void)
> >  err_misc_wq:
> >     destroy_workqueue(md_wq);
> >  err_wq:
> > +   kmem_cache_destroy(md_io_cache);
> > +err_md_io_cache:
> >     return ret;
> >  }
> >
> > @@ -9863,6 +9870,7 @@ static __exit void md_exit(void)
> >     destroy_workqueue(md_rdev_misc_wq);
> >     destroy_workqueue(md_misc_wq);
> >     destroy_workqueue(md_wq);
> > +   kmem_cache_destroy(md_io_cache);
> >  }
> >
> >  subsys_initcall(md_init);
>
> [...]
>
> >
> > $ watch -n0.2 'cat /proc/meminfo | paste - - | tee -a ~/meminfo'
> > MemTotal:       528235648 kB    MemFree:        20002732 kB
> > MemAvailable:   483890268 kB    Buffers:            7356 kB
> > Cached:         495416180 kB    SwapCached:            0 kB
> > Active:         96396800 kB     Inactive:       399891308 kB
> > Active(anon):      10976 kB     Inactive(anon):   890908 kB
> > Active(file):   96385824 kB     Inactive(file): 399000400 kB
> > Unevictable:       78768 kB     Mlocked:           78768 kB
> > SwapTotal:             0 kB     SwapFree:              0 kB
> > Dirty:          88422072 kB     Writeback:        948756 kB
> > AnonPages:        945772 kB     Mapped:            57300 kB
> > Shmem:             26300 kB     KReclaimable:    7248160 kB
> > Slab:            7962748 kB     SReclaimable:    7248160 kB
> > SUnreclaim:       714588 kB     KernelStack:       18288 kB
> > PageTables:        10796 kB     NFS_Unstable:          0 kB
> > Bounce:                0 kB     WritebackTmp:          0 kB
> > CommitLimit:    264117824 kB    Committed_AS:   21816824 kB
> > VmallocTotal:   34359738367 kB  VmallocUsed:      561588 kB
> > VmallocChunk:          0 kB     Percpu:            65792 kB
> > HardwareCorrupted:     0 kB     AnonHugePages:         0 kB
> > ShmemHugePages:        0 kB     ShmemPmdMapped:        0 kB
> > FileHugePages:         0 kB     FilePmdMapped:         0 kB
> > HugePages_Total:       0        HugePages_Free:        0
> > HugePages_Rsvd:        0        HugePages_Surp:        0
> > Hugepagesize:       2048 kB     Hugetlb:               0 kB
> > DirectMap4k:      541000 kB     DirectMap2M:    11907072 kB
> > DirectMap1G:    525336576 kB
> >
>
> And thanks for these information.
>
> I have set up a system to run the test, the code I am using is the top of the
> md-next branch. I will update later tonight on the status.

I am not able to reproduce the issue after 6 hours. Maybe it is because I run
tests on 3 partitions of the same nvme SSD. I will try on a different host with
multiple SSDs.

Pawel, have you tried to repro with md-next branch?

Thanks,
Song


* Re: PROBLEM: double fault in md_end_io
  2021-04-15  6:35       ` Song Liu
@ 2021-04-15 15:35         ` Paweł Wiejacha
  2021-04-22 15:40           ` Paweł Wiejacha
  0 siblings, 1 reply; 13+ messages in thread
From: Paweł Wiejacha @ 2021-04-15 15:35 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid, Artur Paszkiewicz

> Pawel, have you tried to repro with md-next branch?

Yes, and it also crashes:

$ git log --decorate --format=oneline
ca882ba4c0478c68f927fa7b622ec17bde9361ce (HEAD -> md-next-pw2) using
slab_pool for md_io_pool
2715e61834586cef8292fcaa457cbf2da955a3b8 (song-md/md-next) md/bitmap:
wait for external bitmap writes to complete during tear down
c84917aa5dfc4c809634120fb429f0ff590a1f75 md: do not return existing
mddevs from mddev_find_or_alloc

<0>[ 2086.596361] traps: PANIC: double fault, error_code: 0x0
<4>[ 2086.596364] double fault: 0000 [#1] SMP NOPTI
<4>[ 2086.596365] CPU: 40 PID: 0 Comm: swapper/40 Tainted: G
OE     5.12.0-rc3-md-next-pw-fix2 #1
<4>[ 2086.596365] Hardware name: empty S8030GM2NE/S8030GM2NE, BIOS
V4.00 03/11/2021
<4>[ 2086.596366] RIP: 0010:__slab_free+0x26/0x380
<4>[ 2086.596367] Code: 1f 44 00 00 0f 1f 44 00 00 55 49 89 ca 45 89
c3 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 49 89 f4 53 48 83 e4 f0
48 83 ec 70 <48> 89 54 24 28 0f 1f 44 00 00 41 8b 46 28 4d 8b 6c 24 20
49 8b 5c
<4>[ 2086.596368] RSP: 0018:ffff96fa4d1c8f90 EFLAGS: 00010086
<4>[ 2086.596369] RAX: ffff892f4fc35300 RBX: ffff890fdca3ff78 RCX:
ffff890fdca3ff78
<4>[ 2086.596370] RDX: ffff890fdca3ff78 RSI: fffff393c1728fc0 RDI:
ffff892fd3a9b300
<4>[ 2086.596370] RBP: ffff96fa4d1c9030 R08: 0000000000000001 R09:
ffffffffb66500a7
<4>[ 2086.596371] R10: ffff890fdca3ff78 R11: 0000000000000001 R12:
fffff393c1728fc0
<4>[ 2086.596371] R13: fffff393c1728fc0 R14: ffff892fd3a9b300 R15:
0000000000000000
<4>[ 2086.596372] FS:  0000000000000000(0000)
GS:ffff892f4fc00000(0000) knlGS:0000000000000000
<4>[ 2086.596373] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 2086.596373] CR2: ffff96fa4d1c8f88 CR3: 00000010c8210000 CR4:
0000000000350ee0
<4>[ 2086.596373] Call Trace:
<4>[ 2086.596374]  <IRQ>
<4>[ 2086.596374]  ? kmem_cache_free+0x3d2/0x420
<4>[ 2086.596374]  ? mempool_free_slab+0x17/0x20
<4>[ 2086.596375]  ? mempool_free_slab+0x17/0x20
<4>[ 2086.596375]  ? mempool_free+0x2f/0x80
<4>[ 2086.596376]  ? md_end_io+0x47/0x60
<4>[ 2086.596376]  ? bio_endio+0xee/0x140
<4>[ 2086.596376]  ? bio_chain_endio+0x2d/0x40
<4>[ 2086.596377]  ? md_end_io+0x59/0x60
<4>[ 2086.596377]  ? bio_endio+0xee/0x140
<4>[ 2086.596378]  ? bio_chain_endio+0x2d/0x40
...
<4>[ 2086.596485] Lost 340 message(s)!

The last output captured by the `watch -n0.1 ...` loops before the crash:

stat 451 0 13738 76 67823521 0 17162428974 801228700 351 1886784
801228776 0 0 0 0 0 0
inflight 0 354

md_io/aliases 4
md_io/align 8
md_io/cache_dma 0
md_io/cpu_partial 30
md_io/cpu_slabs 1235 N0=288 N1=239 N2=463 N3=245
md_io/destroy_by_rcu 0
md_io/hwcache_align 0
md_io/min_partial 5
md_io/objects 126582 N0=26928 N1=28254 N2=44064 N3=27336
md_io/object_size 40
md_io/objects_partial 0
md_io/objs_per_slab 102
md_io/order 0
md_io/partial 8 N2=4 N3=4
md_io/poison 0
md_io/reclaim_account 0
md_io/red_zone 0
md_io/remote_node_defrag_ratio 100
md_io/sanity_checks 0
md_io/slabs 1249 N0=264 N1=277 N2=436 N3=272
md_io/slabs_cpu_partial 1172(1172) C0=14(14) C1=17(17) C2=17(17)
C3=17(17) C4=18(18) C5=17(17) C6=18(18) C7=17(17) C8=21(21) C9=4(4)
C10=21(21) C11=22(22) C12=21(21) C13=25(25) C14=15(15) C15
=19(19) C16=24(24) C17=19(19) C18=22(22) C19=15(15) C20=28(28)
C21=18(18) C22=18(18) C23=24(24) C24=17(17) C25=27(27) C26=8(8)
C27=18(18) C28=17(17) C29=18(18) C30=21(21) C31=21(21) C32=9(9)
C33=9(9) C34=24(24) C35=11(11) C36=19(19) C37=14(14) C38=18(18)
C39=4(4) C40=23(23) C41=20(20) C42=18(18) C43=22(22) C44=21(21)
C45=24(24) C46=20(20) C47=15(15) C48=17(17) C49=15(15) C50=16(1
6) C51=26(26) C52=21(21) C53=19(19) C54=16(16) C55=18(18) C56=26(26)
C57=14(14) C58=18(18) C59=23(23) C60=18(18) C61=20(20) C62=19(19)
C63=17(17)
md_io/slab_size 40
md_io/store_user 0
md_io/total_objects 127398 N0=26928 N1=28254 N2=44472 N3=27744
md_io/trace 0
md_io/usersize 0

> I am not able to reproduce the issue [...]

It crashes on two different machines. Next week I'm going to upgrade the
distro on an older machine (with Intel NVMe disks, a different
motherboard and a Xeon instead of an EPYC 2 CPU) that is currently
running linux-5.4 without this issue. So I will let you know whether
switching to a newer kernel with "improved io stats accounting" makes it
unstable or not.

Best regards,
Pawel Wiejacha.

On Thu, 15 Apr 2021 at 08:35, Song Liu <song@kernel.org> wrote:
>
> On Wed, Apr 14, 2021 at 5:36 PM Song Liu <song@kernel.org> wrote:
> >
> > On Tue, Apr 13, 2021 at 5:05 AM Paweł Wiejacha
> > <pawel.wiejacha@rtbhouse.com> wrote:
> > >
> > > Hello Song,
> > >
> > > That code does not compile, but I guess that what you meant was
> > > something like this:
> >
> > Yeah.. I am really sorry for the noise.
> >
> > >
> > > diff --git drivers/md/md.c drivers/md/md.c
> > > index 04384452a..cbc97a96b 100644
> > > --- drivers/md/md.c
> > > +++ drivers/md/md.c
> > > @@ -78,6 +78,7 @@ static DEFINE_SPINLOCK(pers_lock);
> > >
> > >  static struct kobj_type md_ktype;
> > >
> > > +struct kmem_cache *md_io_cache;
> > >  struct md_cluster_operations *md_cluster_ops;
> > >  EXPORT_SYMBOL(md_cluster_ops);
> > >  static struct module *md_cluster_mod;
> > > @@ -5701,8 +5702,8 @@ static int md_alloc(dev_t dev, char *name)
> > >          */
> > >         mddev->hold_active = UNTIL_STOP;
> > >
> > > -   error = mempool_init_kmalloc_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
> > > -                     sizeof(struct md_io));
> > > +   error = mempool_init_slab_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
> > > +                     md_io_cache);
> > >     if (error)
> > >         goto abort;
> > >
> > > @@ -9542,6 +9543,10 @@ static int __init md_init(void)
> > >  {
> > >     int ret = -ENOMEM;
> > >
> > > +   md_io_cache = KMEM_CACHE(md_io, 0);
> > > +   if (!md_io_cache)
> > > +       goto err_md_io_cache;
> > > +
> > >     md_wq = alloc_workqueue("md", WQ_MEM_RECLAIM, 0);
> > >     if (!md_wq)
> > >         goto err_wq;
> > > @@ -9578,6 +9583,8 @@ static int __init md_init(void)
> > >  err_misc_wq:
> > >     destroy_workqueue(md_wq);
> > >  err_wq:
> > > +   kmem_cache_destroy(md_io_cache);
> > > +err_md_io_cache:
> > >     return ret;
> > >  }
> > >
> > > @@ -9863,6 +9870,7 @@ static __exit void md_exit(void)
> > >     destroy_workqueue(md_rdev_misc_wq);
> > >     destroy_workqueue(md_misc_wq);
> > >     destroy_workqueue(md_wq);
> > > +   kmem_cache_destroy(md_io_cache);
> > >  }
> > >
> > >  subsys_initcall(md_init);
> >
> > [...]
> >
> > >
> > > $ watch -n0.2 'cat /proc/meminfo | paste - - | tee -a ~/meminfo'
> > > MemTotal:       528235648 kB    MemFree:        20002732 kB
> > > MemAvailable:   483890268 kB    Buffers:            7356 kB
> > > Cached:         495416180 kB    SwapCached:            0 kB
> > > Active:         96396800 kB     Inactive:       399891308 kB
> > > Active(anon):      10976 kB     Inactive(anon):   890908 kB
> > > Active(file):   96385824 kB     Inactive(file): 399000400 kB
> > > Unevictable:       78768 kB     Mlocked:           78768 kB
> > > SwapTotal:             0 kB     SwapFree:              0 kB
> > > Dirty:          88422072 kB     Writeback:        948756 kB
> > > AnonPages:        945772 kB     Mapped:            57300 kB
> > > Shmem:             26300 kB     KReclaimable:    7248160 kB
> > > Slab:            7962748 kB     SReclaimable:    7248160 kB
> > > SUnreclaim:       714588 kB     KernelStack:       18288 kB
> > > PageTables:        10796 kB     NFS_Unstable:          0 kB
> > > Bounce:                0 kB     WritebackTmp:          0 kB
> > > CommitLimit:    264117824 kB    Committed_AS:   21816824 kB
> > > VmallocTotal:   34359738367 kB  VmallocUsed:      561588 kB
> > > VmallocChunk:          0 kB     Percpu:            65792 kB
> > > HardwareCorrupted:     0 kB     AnonHugePages:         0 kB
> > > ShmemHugePages:        0 kB     ShmemPmdMapped:        0 kB
> > > FileHugePages:         0 kB     FilePmdMapped:         0 kB
> > > HugePages_Total:       0        HugePages_Free:        0
> > > HugePages_Rsvd:        0        HugePages_Surp:        0
> > > Hugepagesize:       2048 kB     Hugetlb:               0 kB
> > > DirectMap4k:      541000 kB     DirectMap2M:    11907072 kB
> > > DirectMap1G:    525336576 kB
> > >
> >
> > And thanks for these information.
> >
> > I have set up a system to run the test, the code I am using is the top of the
> > md-next branch. I will update later tonight on the status.
>
> I am not able to reproduce the issue after 6 hours. Maybe it is because I run
> tests on 3 partitions of the same nvme SSD. I will try on a different host with
> multiple SSDs.
>
> Pawel, have you tried to repro with md-next branch?
>
> Thanks,
> Song


* Re: PROBLEM: double fault in md_end_io
  2021-04-15 15:35         ` Paweł Wiejacha
@ 2021-04-22 15:40           ` Paweł Wiejacha
  0 siblings, 0 replies; 13+ messages in thread
From: Paweł Wiejacha @ 2021-04-22 15:40 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid, Artur Paszkiewicz

I was able to reproduce this bug on a third machine. This time it's a
Xeon Gold 6230 CPU + 3x INTEL SSDPE2KE076T8. This machine has been
running the same type of workload constantly for almost two years and it
was fine with Ubuntu's 5.4 and older kernels. After upgrading the kernel
to 5.8.0-50-generic (with the backported "md: improve io stats
accounting") it crashed after 42 hours of running.

This time the ERST data looks slightly different (the call trace
timestamps are earlier than the panic timestamp and the panic message is
different):

dmesg-erst-6953966727021461507:
Panic#2 Part1
48180.542749]  ? bio_endio+0xef/0x150
<4>[148180.542749]  ? bio_chain_endio+0x2d/0x40
<4>[148180.542750]  ? md_end_io+0x5d/0x70
<4>[148180.542750]  ? bio_endio+0xef/0x150
<4>[148180.542750]  ? bio_chain_endio+0x2d/0x40
<4>[148180.542751]  ? md_end_io+0x5d/0x70
<4>[148180.542751]  ? bio_endio+0xef/0x150
<4>[148180.542751]  ? bio_chain_endio+0x2d/0x40
<4>[148180.542751]  ? md_end_io+0x5d/0x70
...
<4>[148180.542791]  ? md_end_io+0x5d/0x70
<4>[148180.542792]  ? bio_endio+0xef/0x150
<4>[148180.542792]  ? bio_chain_endio+0x2d/0x40
<4>[148180.542792]  ? md_end_io+0x5d/0x70
<4>[148180.542792]  ? bio_endio+0xef/0x150
<4>[148180.542793]  ? bio_chain_endio+0x2d/
<4>[148180.542794] Lost 353 message(s)!
<4>[148180.683385] ---[ end trace 00bf232b0f939095 ]---
<4>[148180.683385] RIP: 0010:__slab_free+0x26/0x330
<4>[148180.683386] Code: 1f 44 00 00 0f 1f 44 00 00 55 49 89 ca 45 89
c3 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 49 89 f4 53 48 83 e4 f0
48 83 ec 70 <48> 89 54 24 28 f7 47 08 00 0d 21 00 0f 85 ac 01 00 00 41
8b 46 20
<4>[148180.683387] RSP: 0018:ffffae2b19088fb0 EFLAGS: 00010082
<4>[148180.683388] RAX: ffff978cc0b31080 RBX: ffff978c29f25bc0 RCX:
ffff978c29f25bc0
<4>[148180.683389] RDX: ffff978c29f25bc0 RSI: ffffdb4638a7c940 RDI:
ffff978cc0407800
<4>[148180.683389] RBP: ffffae2b19089048 R08: 0000000000000001 R09:
ffffffffb0e345ee
<4>[148180.683390] R10: ffff978c29f25bc0 R11: 0000000000000001 R12:
ffffdb4638a7c940
<4>[148180.683390] R13: ffff978cc0407800 R14: ffff978cc0407800 R15:
0000000000000000
<4>[148180.683391] FS:  00007fcfdf7fe700(0000)
GS:ffff978cc0b00000(0000) knlGS:0000000000000000
<4>[148180.683391] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[148180.683392] CR2: ffffae2b19088fa8 CR3: 0000007a379ac001 CR4:
00000000007606e0
<4>[148180.683392] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
<4>[148180.683392] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
<4>[148180.683393] PKRU: 55555554
<0>[148180.683393] Kernel panic - not syncing: Fatal exception in interrupt
<0>[148180.683447] Kernel Offset: 0x2fc00000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
dmesg-erst-6953966727021461505:
Oops#1 Part1
0x70
<4>[148180.542739]  ? bio_endio+0xef/0x150
<4>[148180.542740]  ? bio_chain_endio+0x2d/0x40
<4>[148180.542740]  ? md_end_io+0x5d/0x70
<4>[148180.542740]  ? bio_endio+0xef/0x150
<4>[148180.542741]  ? bio_chain_endio+0x2d/0x40
<4>[148180.542741]  ? md_end_io+0x5d/0x70
<4>[148180.542741]  ? bio_endio+0xef/0x150
<4>[148180.542741]  ? bio_chain_endio+0x2d/0x40
<4>[148180.542742]  ? md_end_io+0x5d/0x70
...
<4>[148180.542792]  ? bio_chain_endio+0x2d/0x40
<4>[148180.542792]  ? md_end_io+0x5d/0x70
<4>[148180.542792]  ? bio_endio+0xef/0x150
<4>[148180.542793]  ? bio_chain_endio+0x2d/
<4>[148180.542794] Lost 353 message(s)!

If you cannot reproduce it, maybe there's a way I could help you solve
this issue? For example, I could change dumpstack.c to skip printing
repeated frames, along the lines of the sketch below.
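
A rough userspace illustration of the idea (just a sketch, not actual
arch/x86/kernel/dumpstack.c code): detect a short group of return
addresses repeating back-to-back, such as the
md_end_io/bio_endio/bio_chain_endio cycle, and print it once with a
repeat note so the interesting outer frames are not lost:

#include <stdio.h>

/* length of the periodic run with period p starting at index i */
static int run_len(const unsigned long *f, int nr, int i, int p)
{
	int n = p;

	while (i + n < nr && f[i + n] == f[i + n - p])
		n++;
	return n;
}

static void print_collapsed(const unsigned long *frames, int nr)
{
	int i = 0;

	while (i < nr) {
		int best_p = 1, best_n = 1;
		int p, j;

		/* look for a repeating group of up to 4 frames */
		for (p = 1; p <= 4 && i + p <= nr; p++) {
			int n = run_len(frames, nr, i, p);

			if (n >= 2 * p && n > best_n) {
				best_p = p;
				best_n = n;
			}
		}
		for (j = 0; j < best_p; j++)
			printf("  [<%016lx>]\n", frames[i + j]);
		if (best_n > best_p)
			printf("  ... previous %d frame(s) repeated, %d more frames collapsed ...\n",
			       best_p, best_n - best_p);
		i += best_n;
	}
}

int main(void)
{
	/* fake addresses standing in for md_end_io, bio_endio, bio_chain_endio */
	unsigned long frames[] = {
		0xa0, 0xb0, 0xc0, 0xa0, 0xb0, 0xc0, 0xa0, 0xb0, 0xc0, 0xd0,
	};

	print_collapsed(frames, sizeof(frames) / sizeof(frames[0]));
	return 0;
}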

Regards,
Pawel Wiejacha


On Thu, 15 Apr 2021 at 17:35, Paweł Wiejacha
<pawel.wiejacha@rtbhouse.com> wrote:
>
> > Pawel, have you tried to repro with md-next branch?
>
> Yes, and it also crashes:
>
> $ git log --decorate --format=oneline
> ca882ba4c0478c68f927fa7b622ec17bde9361ce (HEAD -> md-next-pw2) using
> slab_pool for md_io_pool
> 2715e61834586cef8292fcaa457cbf2da955a3b8 (song-md/md-next) md/bitmap:
> wait for external bitmap writes to complete during tear down
> c84917aa5dfc4c809634120fb429f0ff590a1f75 md: do not return existing
> mddevs from mddev_find_or_alloc
>
> <0>[ 2086.596361] traps: PANIC: double fault, error_code: 0x0
> <4>[ 2086.596364] double fault: 0000 [#1] SMP NOPTI
> <4>[ 2086.596365] CPU: 40 PID: 0 Comm: swapper/40 Tainted: G
> OE     5.12.0-rc3-md-next-pw-fix2 #1
> <4>[ 2086.596365] Hardware name: empty S8030GM2NE/S8030GM2NE, BIOS
> V4.00 03/11/2021
> <4>[ 2086.596366] RIP: 0010:__slab_free+0x26/0x380
> <4>[ 2086.596367] Code: 1f 44 00 00 0f 1f 44 00 00 55 49 89 ca 45 89
> c3 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 49 89 f4 53 48 83 e4 f0
> 48 83 ec 70 <48> 89 54 24 28 0f 1f 44 00 00 41 8b 46 28 4d 8b 6c 24 20
> 49 8b 5c
> <4>[ 2086.596368] RSP: 0018:ffff96fa4d1c8f90 EFLAGS: 00010086
> <4>[ 2086.596369] RAX: ffff892f4fc35300 RBX: ffff890fdca3ff78 RCX:
> ffff890fdca3ff78
> <4>[ 2086.596370] RDX: ffff890fdca3ff78 RSI: fffff393c1728fc0 RDI:
> ffff892fd3a9b300
> <4>[ 2086.596370] RBP: ffff96fa4d1c9030 R08: 0000000000000001 R09:
> ffffffffb66500a7
> <4>[ 2086.596371] R10: ffff890fdca3ff78 R11: 0000000000000001 R12:
> fffff393c1728fc0
> <4>[ 2086.596371] R13: fffff393c1728fc0 R14: ffff892fd3a9b300 R15:
> 0000000000000000
> <4>[ 2086.596372] FS:  0000000000000000(0000)
> GS:ffff892f4fc00000(0000) knlGS:0000000000000000
> <4>[ 2086.596373] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[ 2086.596373] CR2: ffff96fa4d1c8f88 CR3: 00000010c8210000 CR4:
> 0000000000350ee0
> <4>[ 2086.596373] Call Trace:
> <4>[ 2086.596374]  <IRQ>
> <4>[ 2086.596374]  ? kmem_cache_free+0x3d2/0x420
> <4>[ 2086.596374]  ? mempool_free_slab+0x17/0x20
> <4>[ 2086.596375]  ? mempool_free_slab+0x17/0x20
> <4>[ 2086.596375]  ? mempool_free+0x2f/0x80
> <4>[ 2086.596376]  ? md_end_io+0x47/0x60
> <4>[ 2086.596376]  ? bio_endio+0xee/0x140
> <4>[ 2086.596376]  ? bio_chain_endio+0x2d/0x40
> <4>[ 2086.596377]  ? md_end_io+0x59/0x60
> <4>[ 2086.596377]  ? bio_endio+0xee/0x140
> <4>[ 2086.596378]  ? bio_chain_endio+0x2d/0x40
> ...
> <4>[ 2086.596485] Lost 340 message(s)!
>
> Dumping last: `watch -n0.1 ...`:
>
> stat 451 0 13738 76 67823521 0 17162428974 801228700 351 1886784
> 801228776 0 0 0 0 0 0
> inflight 0 354
>
> md_io/aliases 4
> md_io/align 8
> md_io/cache_dma 0
> md_io/cpu_partial 30
> md_io/cpu_slabs 1235 N0=288 N1=239 N2=463 N3=245
> md_io/destroy_by_rcu 0
> md_io/hwcache_align 0
> md_io/min_partial 5
> md_io/objects 126582 N0=26928 N1=28254 N2=44064 N3=27336
> md_io/object_size 40
> md_io/objects_partial 0
> md_io/objs_per_slab 102
> md_io/order 0
> md_io/partial 8 N2=4 N3=4
> md_io/poison 0
> md_io/reclaim_account 0
> md_io/red_zone 0
> md_io/remote_node_defrag_ratio 100
> md_io/sanity_checks 0
> md_io/slabs 1249 N0=264 N1=277 N2=436 N3=272
> md_io/slabs_cpu_partial 1172(1172) C0=14(14) C1=17(17) C2=17(17)
> C3=17(17) C4=18(18) C5=17(17) C6=18(18) C7=17(17) C8=21(21) C9=4(4)
> C10=21(21) C11=22(22) C12=21(21) C13=25(25) C14=15(15) C15
> =19(19) C16=24(24) C17=19(19) C18=22(22) C19=15(15) C20=28(28)
> C21=18(18) C22=18(18) C23=24(24) C24=17(17) C25=27(27) C26=8(8)
> C27=18(18) C28=17(17) C29=18(18) C30=21(21) C31=21(21) C32=9(9)
> C33=9(9) C34=24(24) C35=11(11) C36=19(19) C37=14(14) C38=18(18)
> C39=4(4) C40=23(23) C41=20(20) C42=18(18) C43=22(22) C44=21(21)
> C45=24(24) C46=20(20) C47=15(15) C48=17(17) C49=15(15) C50=16(1
> 6) C51=26(26) C52=21(21) C53=19(19) C54=16(16) C55=18(18) C56=26(26)
> C57=14(14) C58=18(18) C59=23(23) C60=18(18) C61=20(20) C62=19(19)
> C63=17(17)
> md_io/slab_size 40
> md_io/store_user 0
> md_io/total_objects 127398 N0=26928 N1=28254 N2=44472 N3=27744
> md_io/trace 0
> md_io/usersize 0
>
> > I am not able to reproduce the issue [...]
>
> It crashes on two different machines. Next week I'm going to upgrade a
> distro on an older machine (with Intel NVMe disks, different
> motherboard and Xeon instead of EPYC 2 CPU) running currently
> linux-5.4 without this issue. So I will let you know if switching to a
> newer kernel with "improved io stats accounting" makes it unstable or
> not.
>
> Best regards,
> Pawel Wiejacha.
>
> On Thu, 15 Apr 2021 at 08:35, Song Liu <song@kernel.org> wrote:
> >
> > On Wed, Apr 14, 2021 at 5:36 PM Song Liu <song@kernel.org> wrote:
> > >
> > > On Tue, Apr 13, 2021 at 5:05 AM Paweł Wiejacha
> > > <pawel.wiejacha@rtbhouse.com> wrote:
> > > >
> > > > Hello Song,
> > > >
> > > > That code does not compile, but I guess that what you meant was
> > > > something like this:
> > >
> > > Yeah.. I am really sorry for the noise.
> > >
> > > >
> > > > diff --git drivers/md/md.c drivers/md/md.c
> > > > index 04384452a..cbc97a96b 100644
> > > > --- drivers/md/md.c
> > > > +++ drivers/md/md.c
> > > > @@ -78,6 +78,7 @@ static DEFINE_SPINLOCK(pers_lock);
> > > >
> > > >  static struct kobj_type md_ktype;
> > > >
> > > > +struct kmem_cache *md_io_cache;
> > > >  struct md_cluster_operations *md_cluster_ops;
> > > >  EXPORT_SYMBOL(md_cluster_ops);
> > > >  static struct module *md_cluster_mod;
> > > > @@ -5701,8 +5702,8 @@ static int md_alloc(dev_t dev, char *name)
> > > >          */
> > > >         mddev->hold_active = UNTIL_STOP;
> > > >
> > > > -   error = mempool_init_kmalloc_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
> > > > -                     sizeof(struct md_io));
> > > > +   error = mempool_init_slab_pool(&mddev->md_io_pool, BIO_POOL_SIZE,
> > > > +                     md_io_cache);
> > > >     if (error)
> > > >         goto abort;
> > > >
> > > > @@ -9542,6 +9543,10 @@ static int __init md_init(void)
> > > >  {
> > > >     int ret = -ENOMEM;
> > > >
> > > > +   md_io_cache = KMEM_CACHE(md_io, 0);
> > > > +   if (!md_io_cache)
> > > > +       goto err_md_io_cache;
> > > > +
> > > >     md_wq = alloc_workqueue("md", WQ_MEM_RECLAIM, 0);
> > > >     if (!md_wq)
> > > >         goto err_wq;
> > > > @@ -9578,6 +9583,8 @@ static int __init md_init(void)
> > > >  err_misc_wq:
> > > >     destroy_workqueue(md_wq);
> > > >  err_wq:
> > > > +   kmem_cache_destroy(md_io_cache);
> > > > +err_md_io_cache:
> > > >     return ret;
> > > >  }
> > > >
> > > > @@ -9863,6 +9870,7 @@ static __exit void md_exit(void)
> > > >     destroy_workqueue(md_rdev_misc_wq);
> > > >     destroy_workqueue(md_misc_wq);
> > > >     destroy_workqueue(md_wq);
> > > > +   kmem_cache_destroy(md_io_cache);
> > > >  }
> > > >
> > > >  subsys_initcall(md_init);
> > >
> > > [...]
> > >
> > > >
> > > > $ watch -n0.2 'cat /proc/meminfo | paste - - | tee -a ~/meminfo'
> > > > MemTotal:       528235648 kB    MemFree:        20002732 kB
> > > > MemAvailable:   483890268 kB    Buffers:            7356 kB
> > > > Cached:         495416180 kB    SwapCached:            0 kB
> > > > Active:         96396800 kB     Inactive:       399891308 kB
> > > > Active(anon):      10976 kB     Inactive(anon):   890908 kB
> > > > Active(file):   96385824 kB     Inactive(file): 399000400 kB
> > > > Unevictable:       78768 kB     Mlocked:           78768 kB
> > > > SwapTotal:             0 kB     SwapFree:              0 kB
> > > > Dirty:          88422072 kB     Writeback:        948756 kB
> > > > AnonPages:        945772 kB     Mapped:            57300 kB
> > > > Shmem:             26300 kB     KReclaimable:    7248160 kB
> > > > Slab:            7962748 kB     SReclaimable:    7248160 kB
> > > > SUnreclaim:       714588 kB     KernelStack:       18288 kB
> > > > PageTables:        10796 kB     NFS_Unstable:          0 kB
> > > > Bounce:                0 kB     WritebackTmp:          0 kB
> > > > CommitLimit:    264117824 kB    Committed_AS:   21816824 kB
> > > > VmallocTotal:   34359738367 kB  VmallocUsed:      561588 kB
> > > > VmallocChunk:          0 kB     Percpu:            65792 kB
> > > > HardwareCorrupted:     0 kB     AnonHugePages:         0 kB
> > > > ShmemHugePages:        0 kB     ShmemPmdMapped:        0 kB
> > > > FileHugePages:         0 kB     FilePmdMapped:         0 kB
> > > > HugePages_Total:       0        HugePages_Free:        0
> > > > HugePages_Rsvd:        0        HugePages_Surp:        0
> > > > Hugepagesize:       2048 kB     Hugetlb:               0 kB
> > > > DirectMap4k:      541000 kB     DirectMap2M:    11907072 kB
> > > > DirectMap1G:    525336576 kB
> > > >
> > >
> > > And thanks for this information.
> > >
> > > I have set up a system to run the test; the code I am using is the top
> > > of the md-next branch. I will update on the status later tonight.
> >
> > I am not able to reproduce the issue after 6 hours. Maybe it is because
> > I ran the tests on 3 partitions of the same NVMe SSD. I will try on a
> > different host with multiple SSDs.
> >
> > Pawel, have you tried to repro with md-next branch?
> >
> > Thanks,
> > Song

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: PROBLEM: double fault in md_end_io
  2021-04-09 21:40 PROBLEM: double fault in md_end_io Paweł Wiejacha
  2021-04-12  6:48 ` Song Liu
@ 2021-04-23  2:36 ` Guoqing Jiang
  2021-04-23  6:44   ` Song Liu
  1 sibling, 1 reply; 13+ messages in thread
From: Guoqing Jiang @ 2021-04-23  2:36 UTC (permalink / raw)
  To: Paweł Wiejacha, song; +Cc: linux-raid, artur.paszkiewicz



On 2021/4/10 5:40 AM, Paweł Wiejacha wrote:
> Hello,
>
> Two of my machines constantly crash with a double fault like this:
>
> 1146  <0>[33685.629591] traps: PANIC: double fault, error_code: 0x0
> 1147  <4>[33685.629593] double fault: 0000 [#1] SMP NOPTI
> 1148  <4>[33685.629594] CPU: 10 PID: 2118287 Comm: kworker/10:0
> Tainted: P           OE     5.11.8-051108-generic #202103200636
> 1149  <4>[33685.629595] Hardware name: ASUSTeK COMPUTER INC. KRPG-U8
> Series/KRPG-U8 Series, BIOS 4201 09/25/2020
> 1150  <4>[33685.629595] Workqueue: xfs-conv/md12 xfs_end_io [xfs]
> 1151  <4>[33685.629596] RIP: 0010:__slab_free+0x23/0x340
> 1152  <4>[33685.629597] Code: 4c fe ff ff 0f 1f 00 0f 1f 44 00 00 55
> 48 89 e5 41 57 49 89 cf 41 56 49 89 fe 41 55 41 54 49 89 f4 53 48 83
> e4 f0 48 83 ec 70 <48> 89 54 24 28 0f 1f 44 00 00 41 8b 46 28 4d 8b 6c
> 24 20 49 8b 5c
> 1153  <4>[33685.629598] RSP: 0018:ffffa9bc00848fa0 EFLAGS: 00010086
> 1154  <4>[33685.629599] RAX: ffff94c04d8b10a0 RBX: ffff94437a34a880
> RCX: ffff94437a34a880
> 1155  <4>[33685.629599] RDX: ffff94437a34a880 RSI: ffffcec745e8d280
> RDI: ffff944300043b00
> 1156  <4>[33685.629599] RBP: ffffa9bc00849040 R08: 0000000000000001
> R09: ffffffff82a5d6de
> 1157  <4>[33685.629600] R10: 0000000000000001 R11: 000000009c109000
> R12: ffffcec745e8d280
> 1158  <4>[33685.629600] R13: ffff944300043b00 R14: ffff944300043b00
> R15: ffff94437a34a880
> 1159  <4>[33685.629601] FS:  0000000000000000(0000)
> GS:ffff94c04d880000(0000) knlGS:0000000000000000
> 1160  <4>[33685.629601] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 1161  <4>[33685.629602] CR2: ffffa9bc00848f98 CR3: 000000014d04e000
> CR4: 0000000000350ee0
> 1162  <4>[33685.629602] Call Trace:
> 1163  <4>[33685.629603]  <IRQ>
> 1164  <4>[33685.629603]  ? kfree+0x3bc/0x3e0
> 1165  <4>[33685.629603]  ? mempool_kfree+0xe/0x10
> 1166  <4>[33685.629603]  ? mempool_kfree+0xe/0x10
> 1167  <4>[33685.629604]  ? mempool_free+0x2f/0x80
> 1168  <4>[33685.629604]  ? md_end_io+0x4a/0x70
> 1169  <4>[33685.629604]  ? bio_endio+0xdc/0x130
> 1170  <4>[33685.629605]  ? bio_chain_endio+0x2d/0x40
> 1171  <4>[33685.629605]  ? md_end_io+0x5c/0x70
> 1172  <4>[33685.629605]  ? bio_endio+0xdc/0x130
> 1173  <4>[33685.629605]  ? bio_chain_endio+0x2d/0x40
> 1174  <4>[33685.629606]  ? md_end_io+0x5c/0x70
> 1175  <4>[33685.629606]  ? bio_endio+0xdc/0x130
> 1176  <4>[33685.629606]  ? bio_chain_endio+0x2d/0x40
> 1177  <4>[33685.629607]  ? md_end_io+0x5c/0x70
> ... repeated ...
> 1436  <4>[33685.629677]  ? bio_endio+0xdc/0x130
> 1437  <4>[33685.629677]  ? bio_chain_endio+0x2d/0x40
> 1438  <4>[33685.629677]  ? md_end_io+0x5c/0x70
> 1439  <4>[33685.629677]  ? bio_endio+0xdc/0x130
> 1440  <4>[33685.629678]  ? bio_chain_endio+0x2d/0x40
> 1441  <4>[33685.629678]  ? md_
> 1442  <4>[33685.629679] Lost 357 message(s)!
>
> This happens on:
> 5.11.8-051108-generic #202103200636 SMP Sat Mar 20 11:17:32 UTC 2021
> and on 5.8.0-44-generic #50~20.04.1-Ubuntu
> (https://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_5.8.0-44.50/changelog)
> which contains backported
> https://github.com/torvalds/linux/commit/41d2d848e5c09209bdb57ff9c0ca34075e22783d
> ("md: improve io stats accounting").
> The 5.8.18-050818-generic #202011011237 SMP Sun Nov 1 12:40:15 UTC
> 2020 which does not contain above suspected change does not crash.
>
> If there's a better way/place to report this bug just let me know. If
> not, here are steps to reproduce:
>
> 1. Create a RAID 0 device using three Micron_9300_MTFDHAL7T6TDP disks.
> mdadm --create --verbose /dev/md12 --level=stripe --raid-devices=3
> /dev/nvme0n1p1 /dev/nvme1n1p1 /dev/nvme2n1p1
>
> 2. Setup xfs on it:
> mkfs.xfs /dev/md12 and mount it
>
> 3. Write to a file on this filesystem:
> while true; do rm -rf /mnt/md12/crash* ; for i in `seq 8`; do dd
> if=/dev/zero of=/mnt/md12/crash$i bs=32K count=50000000 & done; wait;
> done
> Wait for a crash (usually less than 20 min).
>
> I couldn't reproduce it with a single dd process (maybe I have to wait
> a little longer), but a single cat
> /very/large/file/on/cephfs/over100GbE > /mnt/md12/crash is enough for
> this double fault to occur.

I guess it is related to bio splitting: if raid0_make_request calls
bio_chain for the split bio, then its bi_end_io is changed to
bio_chain_endio. Could you try this?

--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -489,7 +489,7 @@ static blk_qc_t md_submit_bio(struct bio *bio)
                 return BLK_QC_T_NONE;
         }

-       if (bio->bi_end_io != md_end_io) {
+       if (bio->bi_end_io != md_end_io && bio->bi_end_io != bio_chain_endio) {

If the above works, then we could be missing the statistics of split bios
from blk_queue_split, which is called before bi_end_io is hijacked, so we
may need to move blk_queue_split after the bi_end_io hijack.
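
For reference, a simplified sketch of bio_chain() (paraphrased from
block/bio.c; exact code differs between kernel versions) showing where a
split child's bi_end_io is replaced, which is what defeats the
bi_end_io != md_end_io comparison above:

void bio_chain(struct bio *bio, struct bio *parent)
{
	BUG_ON(bio->bi_private || bio->bi_end_io);

	bio->bi_private = parent;
	bio->bi_end_io = bio_chain_endio;  /* child now completes via bio_chain_endio */
	bio_inc_remaining(parent);         /* parent gets BIO_CHAIN and a remaining ref */
}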

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: PROBLEM: double fault in md_end_io
  2021-04-23  2:36 ` Guoqing Jiang
@ 2021-04-23  6:44   ` Song Liu
  2021-05-04 21:17     ` Paweł Wiejacha
  0 siblings, 1 reply; 13+ messages in thread
From: Song Liu @ 2021-04-23  6:44 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: Paweł Wiejacha, linux-raid, Artur Paszkiewicz

On Thu, Apr 22, 2021 at 7:36 PM Guoqing Jiang <jgq516@gmail.com> wrote:
>
>
>
> On 2021/4/10 5:40 AM, Paweł Wiejacha wrote:
> > [...]
>
> I guess it is related to bio splitting: if raid0_make_request calls
> bio_chain for the split bio, then its bi_end_io is changed to
> bio_chain_endio. Could you try this?
>
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -489,7 +489,7 @@ static blk_qc_t md_submit_bio(struct bio *bio)
>                  return BLK_QC_T_NONE;
>          }
>
> -       if (bio->bi_end_io != md_end_io) {
> +       if (bio->bi_end_io != md_end_io && bio->bi_end_io != bio_chain_endio) {
>
> If the above works, then we could be missing the statistics of split bios
> from blk_queue_split, which is called before bi_end_io is hijacked, so we
> may need to move blk_queue_split after the bi_end_io hijack.

Thanks Guoqing! This is likely the problem here.

Pawel, please give this a try.

Thanks,
Song

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: PROBLEM: double fault in md_end_io
  2021-04-23  6:44   ` Song Liu
@ 2021-05-04 21:17     ` Paweł Wiejacha
  2021-05-06  5:48       ` Song Liu
  0 siblings, 1 reply; 13+ messages in thread
From: Paweł Wiejacha @ 2021-05-04 21:17 UTC (permalink / raw)
  To: Song Liu; +Cc: Guoqing Jiang, linux-raid, Artur Paszkiewicz

Guoqing's patch fixes the problem. Here's the actual patch I am using:

-static void bio_chain_endio(struct bio *bio)
+void bio_chain_endio(struct bio *bio)
 {
    bio_endio(__bio_chain_endio(bio));
 }
+EXPORT_SYMBOL(bio_chain_endio);

 /**
  * bio_chain - chain bio completions
diff --git drivers/md/md.c drivers/md/md.c
index 04384452a7ab..f157bd6e0478 100644
--- drivers/md/md.c
+++ drivers/md/md.c
@@ -507,7 +507,8 @@ static blk_qc_t md_submit_bio(struct bio *bio)
        return BLK_QC_T_NONE;
    }

-   if (bio->bi_end_io != md_end_io) {
+   if (bio->bi_end_io != md_end_io && bio->bi_end_io !=
+                bio_chain_endio) {
        struct md_io *md_io;

        md_io = mempool_alloc(&mddev->md_io_pool, GFP_NOIO);
diff --git include/linux/bio.h include/linux/bio.h
index 1edda614f7ce..bfb5bd0be397 100644
--- include/linux/bio.h
+++ include/linux/bio.h
@@ -427,6 +427,7 @@ static inline struct bio *bio_kmalloc(gfp_t
gfp_mask, unsigned int nr_iovecs)
 extern blk_qc_t submit_bio(struct bio *);

 extern void bio_endio(struct bio *);
+extern void bio_chain_endio(struct bio *bio);
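
For context, the first hunk above is against block/bio.c: it exports
bio_chain_endio so md.c can compare against it. With the patch applied,
the hijack in md_submit_bio() looks roughly like this (a sketch, not
verbatim kernel code; the md_io field names follow the io stats
accounting change and are approximate). Split children that already
complete through bio_chain_endio are left alone, so only the original
bio gets an md_io attached:

	if (bio->bi_end_io != md_end_io && bio->bi_end_io != bio_chain_endio) {
		struct md_io *md_io;

		md_io = mempool_alloc(&mddev->md_io_pool, GFP_NOIO);
		md_io->mddev = mddev;
		md_io->orig_bi_end_io = bio->bi_end_io;    /* restored in md_end_io */
		md_io->orig_bi_private = bio->bi_private;

		bio->bi_end_io = md_end_io;
		bio->bi_private = md_io;

		/* ... start the per-mddev I/O accounting for this bio ... */
	}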

Thanks,
Paweł Wiejacha

On Fri, 23 Apr 2021 at 08:44, Song Liu <song@kernel.org> wrote:
>
> On Thu, Apr 22, 2021 at 7:36 PM Guoqing Jiang <jgq516@gmail.com> wrote:
> >
> >
> >
> > On 2021/4/10 5:40 AM, Paweł Wiejacha wrote:
> > > [...]
> >
> > I guess it is related to bio splitting: if raid0_make_request calls
> > bio_chain for the split bio, then its bi_end_io is changed to
> > bio_chain_endio. Could you try this?
> >
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -489,7 +489,7 @@ static blk_qc_t md_submit_bio(struct bio *bio)
> >                  return BLK_QC_T_NONE;
> >          }
> >
> > -       if (bio->bi_end_io != md_end_io) {
> > +       if (bio->bi_end_io != md_end_io && bio->bi_end_io != bio_chain_endio) {
> >
> > If the above works, then we could be missing the statistics of split bios
> > from blk_queue_split, which is called before bi_end_io is hijacked, so we
> > may need to move blk_queue_split after the bi_end_io hijack.
>
> Thanks Guoqing! This is likely the problem here.
>
> Pawel, please give this a try.
>
> Thanks,
> Song

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: PROBLEM: double fault in md_end_io
  2021-05-04 21:17     ` Paweł Wiejacha
@ 2021-05-06  5:48       ` Song Liu
  2021-05-06 23:46         ` Guoqing Jiang
  0 siblings, 1 reply; 13+ messages in thread
From: Song Liu @ 2021-05-06  5:48 UTC (permalink / raw)
  To: Paweł Wiejacha; +Cc: Guoqing Jiang, linux-raid, Artur Paszkiewicz

Hi Paweł,

On Tue, May 4, 2021 at 2:18 PM Paweł Wiejacha
<pawel.wiejacha@rtbhouse.com> wrote:
>
> Guoqing's patch fixes the problem. Here's the actual patch I am using:

Thanks for running the tests.

Hi Guoqing,

Could you please send official patch for this fix?

Thanks,
Song

>
> -static void bio_chain_endio(struct bio *bio)
> +void bio_chain_endio(struct bio *bio)
>  {
>     bio_endio(__bio_chain_endio(bio));
>  }
> +EXPORT_SYMBOL(bio_chain_endio);
>
>  /**
>   * bio_chain - chain bio completions
> diff --git drivers/md/md.c drivers/md/md.c
> index 04384452a7ab..f157bd6e0478 100644
> --- drivers/md/md.c
> +++ drivers/md/md.c
> @@ -507,7 +507,8 @@ static blk_qc_t md_submit_bio(struct bio *bio)
>         return BLK_QC_T_NONE;
>     }
>
> -   if (bio->bi_end_io != md_end_io) {
> +   if (bio->bi_end_io != md_end_io && bio->bi_end_io !=
> +                bio_chain_endio) {
>         struct md_io *md_io;
>
>         md_io = mempool_alloc(&mddev->md_io_pool, GFP_NOIO);
> diff --git include/linux/bio.h include/linux/bio.h
> index 1edda614f7ce..bfb5bd0be397 100644
> --- include/linux/bio.h
> +++ include/linux/bio.h
> @@ -427,6 +427,7 @@ static inline struct bio *bio_kmalloc(gfp_t
> gfp_mask, unsigned int nr_iovecs)
>  extern blk_qc_t submit_bio(struct bio *);
>
>  extern void bio_endio(struct bio *);
> +extern void bio_chain_endio(struct bio *bio);

[...]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: PROBLEM: double fault in md_end_io
  2021-05-06  5:48       ` Song Liu
@ 2021-05-06 23:46         ` Guoqing Jiang
  2021-05-08  1:17           ` Guoqing Jiang
  0 siblings, 1 reply; 13+ messages in thread
From: Guoqing Jiang @ 2021-05-06 23:46 UTC (permalink / raw)
  To: Song Liu, Paweł Wiejacha; +Cc: linux-raid, Artur Paszkiewicz



On 5/6/21 1:48 PM, Song Liu wrote:
> Hi Paweł,
>
> On Tue, May 4, 2021 at 2:18 PM Paweł Wiejacha
> <pawel.wiejacha@rtbhouse.com> wrote:
>> Guoqing's patch fixes the problem. Here's the actual patch I am using:
> Thanks for running the tests.

Thanks for the test. A better way could be to just call
bio_flagged(bio, BIO_CHAIN); then there is no need to export
bio_chain_endio.
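
A sketch of what that alternative check might look like (hypothetical,
not a tested patch; see the follow-up message below for a caveat about
which bio actually carries BIO_CHAIN):

	if (bio->bi_end_io != md_end_io && !bio_flagged(bio, BIO_CHAIN)) {
		/* hijack bi_end_io for io accounting, as before */
	}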

>
> Hi Guoqing,
>
> Could you please send official patch for this fix?

Will do, and I think we then need to figure out a way to account for the
statistics of split bios properly.

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: PROBLEM: double fault in md_end_io
  2021-05-06 23:46         ` Guoqing Jiang
@ 2021-05-08  1:17           ` Guoqing Jiang
  0 siblings, 0 replies; 13+ messages in thread
From: Guoqing Jiang @ 2021-05-08  1:17 UTC (permalink / raw)
  To: Song Liu, Paweł Wiejacha; +Cc: linux-raid, Artur Paszkiewicz



On 5/7/21 7:46 AM, Guoqing Jiang wrote:
>
>
> On 5/6/21 1:48 PM, Song Liu wrote:
>> Hi Paweł,
>>
>> On Tue, May 4, 2021 at 2:18 PM Paweł Wiejacha
>> <pawel.wiejacha@rtbhouse.com> wrote:
>>> Guoqing's patch fixes the problem. Here's the actual patch I am using:
>> Thanks for running the tests.
>
> Thanks for the test. A better way could be to just call
> bio_flagged(bio, BIO_CHAIN); then there is no need to export
> bio_chain_endio.

Err, BIO_CHAIN is set on the parent bio, so we still have to export
bio_chain_endio if there is no other way to detect whether the bio was
split from a parent bio or not.
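
For reference, bio_chain() marks the parent through bio_inc_remaining(),
roughly as below (paraphrased from include/linux/bio.h; details may vary
by kernel version), which is why the flag cannot identify the split
child:

static inline void bio_inc_remaining(struct bio *bio)
{
	bio_set_flag(bio, BIO_CHAIN);      /* flag lands on the parent bio */
	smp_mb__before_atomic();
	atomic_inc(&bio->__bi_remaining);
}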

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-05-08  1:17 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-09 21:40 PROBLEM: double fault in md_end_io Paweł Wiejacha
2021-04-12  6:48 ` Song Liu
2021-04-13 12:05   ` Paweł Wiejacha
2021-04-15  0:36     ` Song Liu
2021-04-15  6:35       ` Song Liu
2021-04-15 15:35         ` Paweł Wiejacha
2021-04-22 15:40           ` Paweł Wiejacha
2021-04-23  2:36 ` Guoqing Jiang
2021-04-23  6:44   ` Song Liu
2021-05-04 21:17     ` Paweł Wiejacha
2021-05-06  5:48       ` Song Liu
2021-05-06 23:46         ` Guoqing Jiang
2021-05-08  1:17           ` Guoqing Jiang
