On Thu, Apr 28, 2022 at 10:39 AM Andreas Gruenbacher wrote: > > Yes, but note that it's gfs2_file_buffered_write() that fails. When > the pagefault_disable/enable() around iomap_file_buffered_write() is > removed, the corruption goes away. I looked some more at this on and off, and ended up even more confused. For some reason, I'd mostly looked at the read case, because I had mis-read some of your emails and thought it was the buffered reads that caused problems. Then I went back more carefully, and realized you had always said gfs2_file_buffered_write() was where the issues happened, and looked at that path more, and that confused me even *MORE*. Because that case has always done the copy from user space with page faults disabled, because of the traditional deadlock with reading from user space while holding the page lock on the target page cache page. So that is not really about the new deadlock with filesystem locks, that was fixed by 00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for buffered I/O"). So now that I'm looking at the right function (maybe) I'm going "huh", because it's none of the complex cases that would seem to fail, it's literally just the fault_in_iov_iter_readable() that we've always done in iomap_write_iter() that presumably starts failing. But *that* old code seems bogus too. It's doing if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) { status = -EFAULT; break; } which on the face of it is sane: it's saying "if we can't fault in any bytes, then stop trying". And it's good, and correct, but it does leave one case open. Because what if the result is "we can fault things in _partially_"? The code blithely goes on and tries to do the whole 'bytes' range _anyway_. Now, with a bug-free filesystem, this really shouldn't matter, since the later copy_page_from_iter_atomic() thing should then DTRT anyway, but this does mean that one fundamental thing that that commit 00bfe02f4796 changed is that it basically disabled that fault_in_iov_iter_readable() that *used* to fault in the whole range, and now potentially only faults in a small area. That, in turn, means that in practice it *used* to do "write_end()" with a fully successful range, ie when it did that status = a_ops->write_end(file, mapping, pos, bytes, copied, page, fsdata); then "bytes" and "copied" were the same. But now that commit 00bfe02f4796 added the "disable_pagefault()" around the whole thing, fault_in_iov_iter_readable() will easily fail half-way instead of bringing the next page in, and then that ->write_begin() to ->write_end() sequence will see the copy in the middle failing half-way too, and you'll have that write_end() condition with the write _partially_ succeeding. Which is the complex case for write_end() that you practically speaking never saw before (it *could* happen with a race with swap-out or similar, but it was not really something you could trigger in real life. And I suspect this is what bites you with gfs2 To *test* that hypothesis, how about you try this attached patch? The generic_perform_write() function in mm/filemap.c has the same exact pattern, but as mentioned, a filesystem really needs to be able to handle the partial write_end() case, so it's not a *bug* in that code, but it migth be triggering a bug in gfs2. And gfs2 only uses the iomap_write_iter() case, I think. So that's the only case this attached patch changes. Again - I think the unpatched iomap_write_iter() code is fine, but I think it may be what then triggers the real bug in gfs2. So this patch is not wrong per se, but this patch is basically a "hide the problem" patch, and it would be very interesting to hear if it does indeed fix your test-case. Because that would pinpoint exactly what the bug is. I'm adding Christoph and Darrick as iomap maintainers here to the participants (and Dave Chinner in case he's also the temporary maintainer because Darrick is doing reviews) not because they necessarily care, but just because this test-patch obviously involves the iomap code. NOTE! This patch is entirely untested. I also didn't actually yet go look at what gfs2 does when 'bytes' and 'copied' are different. But since I finally think I figured out what might be going on, I decided I'd send this out sooner rather than later. Because this is the first thing that makes me go "Aaahh.. This might explain it". Linus