From: Jens Axboe <axboe@kernel.dk>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-bcachefs@vger.kernel.org,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [GIT PULL] bcachefs
Date: Wed, 28 Jun 2023 08:58:45 -0600
Message-ID: <b02657af-5bbb-b46b-cea0-ee89f385f3c1@kernel.dk>
In-Reply-To: <20230628040114.oz46icbsjpa4egpp@moria.home.lan>

On 6/27/23 10:01 PM, Kent Overstreet wrote:
> On Tue, Jun 27, 2023 at 09:16:31PM -0600, Jens Axboe wrote:
>> On 6/27/23 2:15 PM, Kent Overstreet wrote:
>>>> to ktest/tests/xfstests/ and run it with -bcachefs, otherwise it kept
>>>> failing because it assumed it was XFS.
>>>>
>>>> I suspected this was just a timing issue, and it looks like that's
>>>> exactly what it is. Looking at the test case, it'll randomly kill -9
>>>> fsstress, and if that happens while we have io_uring IO pending, then we
>>>> process completions inline (for a PF_EXITING current). This means they
>>>> get pushed to fallback work, which runs out of line. If we hit that case
>>>> AND the timing is such that it hasn't been processed yet, we'll still be
>>>> holding a file reference under the mount point and umount will -EBUSY
>>>> fail.
>>>>
>>>> As far as I can tell, this can happen with aio as well; it's just harder
>>>> to hit. If the fput happens while the task is exiting, then fput will
>>>> end up being delayed through a workqueue as well. The test case assumes
>>>> that once it's reaped the exit of the killed task that all files are
>>>> released, which isn't necessarily true if they are done out-of-line.
>>>
>>> Yeah, I traced it through to the delayed fput code as well.
>>>
>>> I'm not sure delayed fput is responsible here; what I learned when I was
>>> tracking this down has mostly fallen out of my brain, so take anything I
>>> say with a large grain of salt. But I believe I tested with delayed_fput
>>> completely disabled, and found another thing in io_uring with the same
>>> effect as delayed_fput that wasn't being flushed.
>>
>> I'm not saying it's delayed_fput(), I'm saying it's the delayed putting
>> io_uring can end up doing. But yes, delayed_fput() is another candidate.
> 
> Sorry - I was just working through my recollections/initial thought
> process out loud.

No worries, it might actually be a combination, and that's why my
io_uring side patch didn't fully resolve it. I wrote a simple reproducer
and it seems to reliably trigger it, but it is fixed with a flush of the
delayed fput list on the umount -EBUSY return. Still digging...
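
For reference, the reproducer is roughly along these lines - a sketch
only, the mount point and file name are placeholders, and hitting the
window is timing dependent:

/*
 * Sketch of the race: a child queues buffered io_uring writes on a file
 * under the mount and is then SIGKILLed while IO may still be pending.
 * If the final fput is deferred past the point where the parent has
 * reaped the child, umount() can transiently fail with EBUSY.
 */
#include <errno.h>
#include <fcntl.h>
#include <liburing.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/wait.h>
#include <unistd.h>

#define MNT	"/mnt/scratch"		/* placeholder mount point */

static void child_io(void)
{
	struct io_uring ring;
	static char buf[4096];
	struct io_uring_sqe *sqe;
	int fd, i;

	fd = open(MNT "/repro-file", O_CREAT | O_WRONLY, 0644);
	if (fd < 0 || io_uring_queue_init(64, &ring, 0) < 0)
		_exit(1);

	memset(buf, 0xaa, sizeof(buf));
	for (i = 0; i < 64; i++) {
		sqe = io_uring_get_sqe(&ring);
		if (!sqe)
			break;
		io_uring_prep_write(sqe, fd, buf, sizeof(buf), i * sizeof(buf));
	}
	io_uring_submit(&ring);
	pause();			/* get SIGKILLed with IO in flight */
}

int main(void)
{
	pid_t pid = fork();

	if (!pid)
		child_io();

	usleep(10000);			/* let the child queue some IO */
	kill(pid, SIGKILL);
	waitpid(pid, NULL, 0);		/* child reaped, files "should" be gone */

	if (umount(MNT) < 0 && errno == EBUSY)
		printf("hit it: umount returned EBUSY after the child was reaped\n");
	return 0;
}

(Build against liburing; in practice the whole kill/umount sequence gets
looped, since the window is narrow.)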

>>>> For io_uring specifically, it may make sense to wait on the fallback
>>>> work. The below patch does this, and should fix the issue. But I'm not
>>>> fully convinced that this is really needed, as I do think this can
>>>> happen without io_uring as well. It just doesn't right now as the test
>>>> does buffered IO, and aio will be fully sync with buffered IO. That
>>>> means there's either no gap where aio will hit it without O_DIRECT, or
>>>> it's just small enough that it hasn't been hit.
>>>
>>> I just tried your patch and I still have generic/388 failing - it
>>> might've taken a bit longer to pop this time.
>>
>> Yep see the same here. Didn't have time to look into it after sending
>> that email today, just took a quick stab at writing a reproducer and
>> ended up crashing bcachefs:
> 
> You must have hit an error before we finished initializing the
> filesystem; the list head never got initialized. A patch for that will be
> in the testing branch momentarily.

I'll pull that in. In testing just now, I hit a few more leaks:

unreferenced object 0xffff0000e55cf200 (size 128):
  comm "mount", pid 723, jiffies 4294899134 (age 85.868s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000001d69062c>] slab_post_alloc_hook.isra.0+0xb4/0xbc
    [<00000000c503def2>] __kmem_cache_alloc_node+0xd0/0x178
    [<00000000cde48528>] __kmalloc+0xac/0xd4
    [<000000006cb9446a>] kmalloc_array.constprop.0+0x18/0x20
    [<000000008341b32c>] bch2_fs_alloc+0x73c/0xbcc
    [<000000003b8339fd>] bch2_fs_open+0x19c/0x430
    [<00000000aef40a23>] bch2_mount+0x194/0x45c
    [<0000000005e49357>] legacy_get_tree+0x2c/0x54
    [<00000000f5813622>] vfs_get_tree+0x28/0xd4
    [<00000000ea6972ec>] path_mount+0x5d0/0x6c8
    [<00000000468ec307>] do_mount+0x80/0xa4
    [<00000000ea5d305d>] __arm64_sys_mount+0x150/0x168
    [<00000000da6d98cb>] invoke_syscall.constprop.0+0x70/0xb8
    [<000000008f20c487>] do_el0_svc+0xbc/0xf0
    [<00000000a1018c2c>] el0_svc+0x74/0x9c
    [<00000000fc46d579>] el0t_64_sync_handler+0xa8/0x134
unreferenced object 0xffff0000e55cf580 (size 128):
  comm "mount", pid 723, jiffies 4294899134 (age 85.868s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000001d69062c>] slab_post_alloc_hook.isra.0+0xb4/0xbc
    [<00000000c503def2>] __kmem_cache_alloc_node+0xd0/0x178
    [<00000000cde48528>] __kmalloc+0xac/0xd4
    [<0000000097f806f1>] __prealloc_shrinker+0x3c/0x60
    [<000000008ff20762>] register_shrinker+0x14/0x34
    [<000000007fa7e36c>] bch2_fs_btree_cache_init+0xf8/0x150
    [<000000005135a635>] bch2_fs_alloc+0x7ac/0xbcc
    [<000000003b8339fd>] bch2_fs_open+0x19c/0x430
    [<00000000aef40a23>] bch2_mount+0x194/0x45c
    [<0000000005e49357>] legacy_get_tree+0x2c/0x54
    [<00000000f5813622>] vfs_get_tree+0x28/0xd4
    [<00000000ea6972ec>] path_mount+0x5d0/0x6c8
    [<00000000468ec307>] do_mount+0x80/0xa4
    [<00000000ea5d305d>] __arm64_sys_mount+0x150/0x168
    [<00000000da6d98cb>] invoke_syscall.constprop.0+0x70/0xb8
    [<000000008f20c487>] do_el0_svc+0xbc/0xf0
unreferenced object 0xffff0000e55cf480 (size 128):
  comm "mount", pid 723, jiffies 4294899134 (age 85.868s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000001d69062c>] slab_post_alloc_hook.isra.0+0xb4/0xbc
    [<00000000c503def2>] __kmem_cache_alloc_node+0xd0/0x178
    [<00000000cde48528>] __kmalloc+0xac/0xd4
    [<0000000097f806f1>] __prealloc_shrinker+0x3c/0x60
    [<000000008ff20762>] register_shrinker+0x14/0x34
    [<000000003d050c32>] bch2_fs_btree_key_cache_init+0x88/0x90
    [<00000000d9f351c0>] bch2_fs_alloc+0x7c0/0xbcc
    [<000000003b8339fd>] bch2_fs_open+0x19c/0x430
    [<00000000aef40a23>] bch2_mount+0x194/0x45c
    [<0000000005e49357>] legacy_get_tree+0x2c/0x54
    [<00000000f5813622>] vfs_get_tree+0x28/0xd4
    [<00000000ea6972ec>] path_mount+0x5d0/0x6c8
    [<00000000468ec307>] do_mount+0x80/0xa4
    [<00000000ea5d305d>] __arm64_sys_mount+0x150/0x168
    [<00000000da6d98cb>] invoke_syscall.constprop.0+0x70/0xb8
    [<000000008f20c487>] do_el0_svc+0xbc/0xf0

>>> I wonder if there might be a better way of solving this though? For aio,
>>> when a process is exiting we just synchronously tear down the ioctx,
>>> including waiting for outstanding iocbs.
>>
>> aio is pretty trivial, because the only async IO it supports is O_DIRECT
>> on regular files, which always completes in finite time. io_uring has to
>> cancel etc., so we need to do a lot more.
> 
> ahh yes, buffered IO would complicate things
> 
>> The concept of my patch should still be fine, but I think we must be
>> missing a case, which is why I started writing a small reproducer
>> instead. I'll pick it up again tomorrow and see what is going on here.
> 
> OK. As soon as you've got a patch I'll throw it at my CI, or I can point
> my CI at your branch if you have one.

I should have something later today; I don't feel like I fully understand
all of it just yet.
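
To illustrate the delayed fput flush idea from above - a sketch only,
not a real patch, and the helper name and call site are made up for
illustration:

/*
 * Sketch: if the umount fails with -EBUSY, push out any struct file
 * frees that were punted to the delayed fput list (e.g. by an exiting
 * task) and retry once.  Where this would actually live in
 * fs/namespace.c is an open question.
 */
static int umount_retry_on_busy(struct mount *mnt, int flags)
{
	int ret = do_umount(mnt, flags);

	if (ret == -EBUSY) {
		flush_delayed_fput();
		ret = do_umount(mnt, flags);
	}
	return ret;
}

That obviously only covers the delayed_fput side; the io_uring fallback
work would still need to be flushed or waited on separately, which is
what the earlier patch was attempting.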

-- 
Jens Axboe


