All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-bcachefs@vger.kernel.org,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [GIT PULL] bcachefs
Date: Tue, 27 Jun 2023 21:16:31 -0600	[thread overview]
Message-ID: <c06a9e0b-8f3e-4e47-53d0-b4854a98cc44@kernel.dk> (raw)
In-Reply-To: <20230627201524.ool73bps2lre2tsz@moria.home.lan>

On 6/27/23 2:15?PM, Kent Overstreet wrote:
>> to ktest/tests/xfstests/ and run it with -bcachefs, otherwise it kept
>> failing because it assumed it was XFS.
>>
>> I suspected this was just a timing issue, and it looks like that's
>> exactly what it is. Looking at the test case, it'll randomly kill -9
>> fsstress, and if that happens while we have io_uring IO pending, then we
>> process completions inline (for a PF_EXITING current). This means they
>> get pushed to fallback work, which runs out of line. If we hit that case
>> AND the timing is such that it hasn't been processed yet, we'll still be
>> holding a file reference under the mount point and umount will -EBUSY
>> fail.
>>
>> As far as I can tell, this can happen with aio as well, it's just harder
>> to hit. If the fput happens while the task is exiting, then fput will
>> end up being delayed through a workqueue as well. The test case assumes
>> that once it's reaped the exit of the killed task that all files are
>> released, which isn't necessarily true if they are done out-of-line.
> 
> Yeah, I traced it through to the delayed fput code as well.
> 
> I'm not sure delayed fput is responsible here; what I learned when I was
> tracking this down has mostly fell out of my brain, so take anything I
> say with a large grain of salt. But I believe I tested with delayed_fput
> completely disabled, and found another thing in io_uring with the same
> effect as delayed_fput that wasn't being flushed.

I'm not saying it's delayed_fput(), I'm saying it's the delayed putting
io_uring can end up doing. But yes, delayed_fput() is another candidate.

>> For io_uring specifically, it may make sense to wait on the fallback
>> work. The below patch does this, and should fix the issue. But I'm not
>> fully convinced that this is really needed, as I do think this can
>> happen without io_uring as well. It just doesn't right now as the test
>> does buffered IO, and aio will be fully sync with buffered IO. That
>> means there's either no gap where aio will hit it without O_DIRECT, or
>> it's just small enough that it hasn't been hit.
> 
> I just tried your patch and I still have generic/388 failing - it
> might've taken a bit longer to pop this time.

Yep see the same here. Didn't have time to look into it after sending
that email today, just took a quick stab at writing a reproducer and
ended up crashing bcachefs:

[ 1122.384909] workqueue: Failed to create a rescuer kthread for wq "bcachefs": -EINTR
[ 1122.384915] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 1122.385814] Mem abort info:
[ 1122.385962]   ESR = 0x0000000096000004
[ 1122.386161]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 1122.386444]   SET = 0, FnV = 0
[ 1122.386612]   EA = 0, S1PTW = 0
[ 1122.386842]   FSC = 0x04: level 0 translation fault
[ 1122.387168] Data abort info:
[ 1122.387321]   ISV = 0, ISS = 0x00000004
[ 1122.387518]   CM = 0, WnR = 0
[ 1122.387676] user pgtable: 4k pages, 48-bit VAs, pgdp=00000001133da000
[ 1122.388014] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[ 1122.388363] Internal error: Oops: 0000000096000004 [#1] SMP
[ 1122.388659] Modules linked in:
[ 1122.388866] CPU: 4 PID: 23129 Comm: mount Not tainted 6.4.0-02556-ge61c7fc22b68-dirty #3647
[ 1122.389389] Hardware name: linux,dummy-virt (DT)
[ 1122.389682] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 1122.390118] pc : bch2_free_pending_node_rewrites+0x40/0x90
[ 1122.390466] lr : bch2_free_pending_node_rewrites+0x28/0x90
[ 1122.390815] sp : ffff80002481b770
[ 1122.391030] x29: ffff80002481b770 x28: ffff0000e1d24000 x27: 00000000fffff7b7
[ 1122.391475] x26: 0000000000000000 x25: ffff0000e1d00040 x24: dead000000000122
[ 1122.391919] x23: dead000000000100 x22: ffff0000e1d031b8 x21: ffff0000e1d00040
[ 1122.392366] x20: 0000000000000000 x19: ffff0000e1d031a8 x18: 0000000000000009
[ 1122.392860] x17: 3a22736665686361 x16: 6362222071772072 x15: 6f66206461657268
[ 1122.393622] x14: 746b207265756373 x13: 52544e49452d203a x12: 0000000000000001
[ 1122.395170] x11: 0000000000000001 x10: 0000000000000000 x9 : 00000000000002d3
[ 1122.396592] x8 : 00000000000003f8 x7 : 0000000000000000 x6 : ffff8000093c2e78
[ 1122.397970] x5 : ffff000209de4240 x4 : ffff0000e1d00030 x3 : dead000000000122
[ 1122.399263] x2 : 00000000000031a8 x1 : 0000000000000000 x0 : 0000000000000000
[ 1122.400473] Call trace:
[ 1122.400908]  bch2_free_pending_node_rewrites+0x40/0x90
[ 1122.401783]  bch2_fs_release+0x48/0x24c
[ 1122.402589]  kobject_put+0x7c/0xe8
[ 1122.403271]  bch2_fs_free+0xa4/0xc8
[ 1122.404033]  bch2_fs_alloc+0x5c8/0xbcc
[ 1122.404888]  bch2_fs_open+0x19c/0x430
[ 1122.405781]  bch2_mount+0x194/0x45c
[ 1122.406643]  legacy_get_tree+0x2c/0x54
[ 1122.407476]  vfs_get_tree+0x28/0xd4
[ 1122.408251]  path_mount+0x5d0/0x6c8
[ 1122.409056]  do_mount+0x80/0xa4
[ 1122.409866]  __arm64_sys_mount+0x150/0x168
[ 1122.410904]  invoke_syscall.constprop.0+0x70/0xb8
[ 1122.411890]  do_el0_svc+0xbc/0xf0
[ 1122.412596]  el0_svc+0x74/0x9c
[ 1122.413343]  el0t_64_sync_handler+0xa8/0x134
[ 1122.414148]  el0t_64_sync+0x168/0x16c
[ 1122.414863] Code: f2fbd5b7 d2863502 91008af8 8b020273 (f85d8695) 
[ 1122.415939] ---[ end trace 0000000000000000 ]---

> I wonder if there might be a better way of solving this though? For aio,
> when a process is exiting we just synchronously tear down the ioctx,
> including waiting for outstanding iocbs.

aio is pretty trivial, because the only async it supports is O_DIRECT
on regular files which always completes in finite time. io_uring has to
cancel etc, so we need to do a lot more.

But the concept of my patch should be fine, but I think we must be
missing a case. Which is why I started writing a small reproducer
instead. I'll pick it up again tomorrow and see what is going on here.

> delayed_fput, even though I believe not responsible here, seems sketchy
> to me because there doesn't seem to be a straightforward way to flush
> delayed fputs for a given _process_ - there's a single global work item,
> and we can only flush globally.

Yep as mentioned I don't think it's delayed_fput at all. And yeah we can
only globally flush that, not per task/files_struct.

-- 
Jens Axboe


  parent reply	other threads:[~2023-06-28  3:16 UTC|newest]

Thread overview: 140+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-26 21:46 [GIT PULL] bcachefs Kent Overstreet
2023-06-26 23:11 ` Jens Axboe
2023-06-27  0:06   ` Kent Overstreet
2023-06-27  1:13     ` Jens Axboe
2023-06-27  2:05       ` Kent Overstreet
2023-06-27  2:59         ` Jens Axboe
2023-06-27  3:10           ` Kent Overstreet
2023-06-27 17:16           ` Jens Axboe
2023-06-27 20:15             ` Kent Overstreet
2023-06-27 22:05               ` Dave Chinner
2023-06-27 22:41                 ` Kent Overstreet
2023-06-28 14:40                 ` Jens Axboe
2023-06-28 14:48                   ` Thomas Weißschuh
2023-06-28 14:58                     ` Jens Axboe
2023-06-28  3:16               ` Jens Axboe [this message]
2023-06-28  4:01                 ` Kent Overstreet
2023-06-28 14:58                   ` Jens Axboe
2023-06-28 15:22                     ` Jens Axboe
2023-06-28 17:56                       ` Kent Overstreet
2023-06-28 20:45                         ` Jens Axboe
2023-06-28 16:57                     ` Jens Axboe
2023-06-28 17:33                       ` Christian Brauner
2023-06-28 17:52                       ` Kent Overstreet
2023-06-28 20:44                         ` Jens Axboe
2023-06-28 21:17                           ` Jens Axboe
2023-06-28 22:13                             ` Kent Overstreet
2023-06-28 22:33                               ` Jens Axboe
2023-06-28 22:55                                 ` Kent Overstreet
2023-06-28 23:14                                   ` Jens Axboe
2023-06-28 23:50                                     ` Kent Overstreet
2023-06-29  1:00                                       ` Dave Chinner
2023-06-29  1:33                                         ` Jens Axboe
2023-06-29 11:18                                           ` Christian Brauner
2023-06-29 14:17                                             ` Kent Overstreet
2023-06-29 15:31                                             ` Kent Overstreet
2023-06-30  9:40                                               ` Christian Brauner
2023-07-06 15:20                                                 ` Kent Overstreet
2023-07-06 16:26                                                   ` Jens Axboe
2023-07-06 16:34                                                     ` Kent Overstreet
2023-06-29  1:29                                       ` Jens Axboe
2023-07-06 20:15                             ` Kent Overstreet
2023-06-28 17:54                     ` Kent Overstreet
2023-06-28 20:54                       ` Jens Axboe
2023-06-28 22:14                         ` Jens Axboe
2023-06-28 23:04                           ` Kent Overstreet
2023-06-28 23:11                             ` Jens Axboe
2023-06-27  2:33       ` Kent Overstreet
2023-06-27  2:59         ` Jens Axboe
2023-06-27  3:19           ` Matthew Wilcox
2023-06-27  3:22             ` Kent Overstreet
2023-06-27  3:52 ` Christoph Hellwig
2023-06-27  4:36   ` Kent Overstreet
2023-07-06 15:56 ` Kent Overstreet
2023-07-06 16:40   ` Josef Bacik
2023-07-06 17:38     ` Kent Overstreet
2023-07-06 19:17       ` Eric Sandeen
2023-07-06 19:31         ` Kent Overstreet
2023-07-06 21:19       ` Darrick J. Wong
2023-07-06 22:43         ` Kent Overstreet
2023-07-07 13:13           ` Jan Kara
2023-07-07 13:52             ` Kent Overstreet
2023-07-07  8:48         ` Christian Brauner
2023-07-07  9:18           ` Kent Overstreet
2023-07-07 16:26             ` James Bottomley
2023-07-07 16:48               ` Kent Overstreet
2023-07-07 17:04                 ` James Bottomley
2023-07-07 17:26                   ` Kent Overstreet
2023-07-08  3:54               ` Matthew Wilcox
2023-07-08  4:10                 ` Kent Overstreet
2023-07-08  4:31                 ` Kent Overstreet
2023-07-08 15:02                   ` Theodore Ts'o
2023-07-08 15:23                     ` Kent Overstreet
2023-07-08 16:42                 ` James Bottomley
2023-07-09  1:16                   ` Kent Overstreet
2023-07-07  9:35           ` Kent Overstreet
2023-07-07  2:04       ` Theodore Ts'o
2023-07-07 12:18       ` Brian Foster
2023-07-07 14:49         ` Kent Overstreet
2023-07-12  2:54   ` Kent Overstreet
2023-07-12 19:48     ` Kees Cook
2023-07-12 19:57       ` Kent Overstreet
2023-07-12 22:10     ` Darrick J. Wong
2023-07-12 23:57       ` Kent Overstreet
2023-08-09  1:27     ` Linus Torvalds
2023-08-10 15:54       ` Kent Overstreet
2023-08-10 16:40         ` Linus Torvalds
2023-08-10 18:02           ` Kent Overstreet
2023-08-10 18:09             ` Linus Torvalds
2023-08-10 17:52         ` Jan Kara
2023-08-11  2:47           ` Kent Overstreet
2023-08-11  8:10             ` Jan Kara
2023-08-11  8:13               ` Christian Brauner
2023-08-10 22:39         ` Darrick J. Wong
2023-08-10 23:47           ` Linus Torvalds
2023-08-11  2:40             ` Jens Axboe
2023-08-11  4:03             ` Kent Overstreet
2023-08-11  5:20               ` Linus Torvalds
2023-08-11  5:29                 ` Kent Overstreet
2023-08-11  5:53                   ` Linus Torvalds
2023-08-11  7:52                     ` Christian Brauner
2023-08-11 14:31                     ` Jens Axboe
2023-08-11  3:45           ` Kent Overstreet
2023-08-21  0:09             ` Dave Chinner
2023-08-10 23:07         ` Matthew Wilcox
2023-08-11 10:54         ` Christian Brauner
2023-08-11 12:58           ` Kent Overstreet
2023-08-14  7:25             ` Christian Brauner
2023-08-14 15:23               ` Kent Overstreet
2023-08-11 13:21           ` Kent Overstreet
2023-08-11 22:56             ` Darrick J. Wong
2023-08-14  7:21             ` Christian Brauner
2023-08-14 15:27               ` Kent Overstreet
2023-09-03  3:25 Kent Overstreet
2023-09-05 13:24 ` Christoph Hellwig
2023-09-06  0:00   ` Kent Overstreet
2023-09-06  0:41     ` Matthew Wilcox
2023-09-06 16:10       ` Kent Overstreet
2023-09-06 17:57         ` Darrick J. Wong
2023-09-08  9:37     ` Christoph Hellwig
2023-09-06 19:36 ` Linus Torvalds
2023-09-06 20:02   ` Linus Torvalds
2023-09-06 20:20     ` Linus Torvalds
2023-09-06 21:55       ` Arnaldo Carvalho de Melo
2023-09-06 23:13         ` David Sterba
2023-09-06 23:34           ` Linus Torvalds
2023-09-06 23:46             ` Arnaldo Carvalho de Melo
2023-09-06 23:53               ` Arnaldo Carvalho de Melo
2023-09-06 23:16         ` Linus Torvalds
2023-09-10  0:53       ` Kent Overstreet
2023-09-07 20:37   ` Kent Overstreet
2023-09-07 20:51     ` Linus Torvalds
2023-09-07 23:40   ` Kent Overstreet
2023-09-08  6:29     ` Martin Steigerwald
2023-09-08  9:11     ` Joshua Ashton
2023-09-06 22:28 ` Nathan Chancellor
2023-09-07  0:03   ` Kees Cook
2023-09-07 14:29     ` Chris Mason
2023-09-07 20:39     ` Kent Overstreet
2023-09-08 10:50       ` Brian Foster
2023-09-08 23:05     ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c06a9e0b-8f3e-4e47-53d0-b4854a98cc44@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-bcachefs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.