Hi again,

On 20. 08. 20 0:58, Chris Murphy wrote:
> The problem here I think is that /proc/pid/stack is empty. You might
> have to hammer on it a bunch of times to get a stack. I can't tell if
> the sysrq+w is enough information to conclusively tell if this is
> strictly an md problem or if there's something else going on.
>
> But I do see in the sysrq+w evidence of a Btrfs snapshot happening,
> which will result in a flush of the file system. Since the mdadm raid
> journal is on two SSDs which should be fast enough to accept the
> metadata changes before actually doing the flush.

We were lucky enough to dump some stacks yesterday when the issue
happened again.

In `sysrq.log`, there is a dmesg/kern.log output of the
`echo w > /proc/sysrq-trigger` command, captured when `md1_raid6`
started to consume 100% CPU and all btrfs-snapshot-related commands got
stuck in the *disk sleep* state. We issued the sysrq+w command several
times and I have included all unique tasks that appeared there.

In `stack_md1_reclaim.txt`, `stack_btrfs-transacti.txt` and
`stack_btrfs*.txt`, there are outputs of `cat /proc/<pid>/stack` for
the given processes during the same time. The output stayed the same
even though we issued this command several times.

In `stack_md1_raid6-[0-3].txt`, there are outputs of the same command,
where `<pid>` is the `md1_raid6` process id. We issued it several times
and the output sometimes differs a bit. If I read it correctly, the
dumps differ only in the instruction offsets of the given functions. I
include all the combinations we encountered in case it matters.

We also dumped the stack of this process in a while-true loop just
after we performed the manual "unstuck" workaround (I described this
action in the first e-mail). I can send it to the mailing list as well
if needed.

I hope these outputs include what you, Song, requested on July 30th
(and I hope it's ok to continue in this thread).

On 30. 07. 20 8:45, Song Liu wrote:
>> On Jul 29, 2020, at 2:06 PM, Guoqing Jiang wrote:
>>
>> Hi,
>>
>> On 7/22/20 10:47 PM, Vojtech Myslivec wrote:
>>> 1. What should be the cause of this problem?
>>
>> Just a quick glance based on the stacks which you attached, I guess
>> it could be a deadlock issue of raid5 cache super write.
>>
>> Maybe the commit 8e018c21da3f ("raid5-cache: fix a deadlock in
>> superblock write") didn't fix the problem completely. Cc Song.
>>
>> And I am curious why md thread is not waked if mddev_trylock fails,
>> you can give it a try but I can't promise it helps ...
>
> Thanks Guoqing!
>
> I am not sure whether we hit the mddev_trylock() failure. Looks like
> the md1_raid6 thread is already running at 100%.
>
> A few questions:
>
> 1. I see wbt_wait in the stack trace. Are we using write back
>    throttling here?
> 2. Could you please get the /proc/<pid>/stack for <pid> of md1_raid6?
>    We may want to sample it multiple times.
>
> Thanks,
> Song

Thanks,
Vojtech and Michal
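
P.S. For completeness, the stack sampling after the "unstuck"
workaround was done with a simple shell loop along these lines (a
sketch only; the pid lookup, output file and sleep interval here are
illustrative, not our exact commands):

    # Repeatedly sample the kernel stack of the md1_raid6 thread.
    # pgrep -x matches the thread name exactly; output file is a
    # placeholder path.
    pid=$(pgrep -x md1_raid6)
    while true; do
        date >> /tmp/md1_raid6-stacks.txt
        cat "/proc/$pid/stack" >> /tmp/md1_raid6-stacks.txt
        printf -- '----\n' >> /tmp/md1_raid6-stacks.txt
        sleep 0.1
    done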