linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-bcachefs@vger.kernel.org, dchinner@redhat.com,
	sandeen@redhat.com, willy@infradead.org, josef@toxicpanda.com,
	tytso@mit.edu, bfoster@redhat.com, jack@suse.cz,
	andreas.gruenbacher@gmail.com, brauner@kernel.org,
	peterz@infradead.org, akpm@linux-foundation.org,
	dhowells@redhat.com, snitzer@kernel.org, axboe@kernel.dk
Subject: Re: [GIT PULL] bcachefs
Date: Thu, 10 Aug 2023 16:47:22 -0700	[thread overview]
Message-ID: <CAHk-=wj8RuUosugVZk+iqCAq7x6rs=7C-9sUXcO2heu4dCuOVw@mail.gmail.com> (raw)
In-Reply-To: <20230810223942.GG11336@frogsfrogsfrogs>

On Thu, 10 Aug 2023 at 15:39, Darrick J. Wong <djwong@kernel.org> wrote:
>
> FWIW I recently fixed all my stupid debian package dependencies so that
> I could actually install liburing again, and rebuilt fstests.  The very
> next morning I noticed a number of new test failures in /exactly/ the
> way that Kent said to expect:
>
> fsstress -d /mnt & <sleep then simulate fs crash>; \
>         umount /mnt; mount /dev/sda /mnt
>
> Here, umount exits before the filesystem is really torn down, and then
> mount fails because it can't get an exclusive lock on the device.

I agree that that obviously sounds like mount is just returning either
too early. Or too eagerly.

But I suspect any delayed fput() issues (whether from aio or io_uring)
are then just a way to trigger the problem, not the fundamental cause.

Because even if the fput() is delayed, the mntput() part of that
delayed __fput action is the one that *should* have kept the
filesystem mounted until it is no longer busy.

And more importantly, having some of the common paths synchronize
*their* fput() calls only affects those paths.

It doesn't affect the fundamental issue that the last fput() can
happen in odd contexts when the file descriptor was used for something
a bit stranger.

So I do feel like the fput patch I saw looked more like a "hide the
problem" than a real fix.

Put another way: I would not be surprised in the *least* if then
adding more synchronization to fput would basically hide any issue,
particularly from tests that then use those things that you added
synchronization for.

But it really smells like it's purely hiding the symptom to me.

If I were a betting man, I'd look at ->mnt_count.  I'm not saying
that's the problem, but the mnt refcount handling is more than a bit
scary.

It is so hugely performance-critical (ie every single path access)
that we use those percpu counters for it, and I'm not at all sure it's
all race-free.

Just as an example, mnt_add_count() has this comment above it:

 * vfsmount lock must be held for read

but afaik the main way it gets called is through mntget(), and I see
no vfsmount lock held anywhere there (think "path_get() and friends).
Maybe I'm missing something obvious.

So I think that comment is some historical left-over that hasn't been
true for ages.

And all of the counter updates should be consistent even in the
absence of said lock, so it's not an issue.

Except when it is: it does look like it *would* screw up
mnt_get_count() that tries to add up all those percput counters with

        for_each_possible_cpu(cpu) {
                count += per_cpu_ptr(mnt->mnt_pcp, cpu)->mnt_count;
        }

and that one has that

 * vfsmount lock must be held for write

comment, which makes sense as a "that would indeed synchronize if
others held it for read". But...

And where is that sum used? Very much in things like may_umount_tree().

Anyway, I'm absolutely not saying this is the actual problem - we
probably at least partly just have stale or incomplete comments, and
maybe I think the fput() side is good mainly because I'm *much* more
familiar with that side than I am with the actual mount code these
days.

So I might be barking up entirely the wrong tree.

But I do feel like the fput patch I saw looked more like a "hide the
problem" than a real fix. Because the mount counting *should* be
entirely independent of when exactly a fput happens.

So I believe having the test-case then do some common fput's
synchronously pretty much by definition can't fix any issues, but it
*can* make sure that any normal test using just regular system calls
then never triggers the "oh, in other situations the last fput will be
delayed".

So that's why I'm very leery of the fput patch I saw. I don't think it
makes sense.

That does *not* mean that I don't believe that umount can have serious problems.

I suspect we get very little coverage of that in normal situations.

And yes, obviously io_uring does add *way* more asynchronicity, and
I'd not be surprised at all if it uncovers problems.

In most other situations, the main source of file counts are purely
open/close system calls, which are in many ways "simple" (and where
things like process exit obviously does the closing part).

                Linus

  reply	other threads:[~2023-08-10 23:47 UTC|newest]

Thread overview: 140+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-26 21:46 [GIT PULL] bcachefs Kent Overstreet
2023-06-26 23:11 ` Jens Axboe
2023-06-27  0:06   ` Kent Overstreet
2023-06-27  1:13     ` Jens Axboe
2023-06-27  2:05       ` Kent Overstreet
2023-06-27  2:59         ` Jens Axboe
2023-06-27  3:10           ` Kent Overstreet
2023-06-27 17:16           ` Jens Axboe
2023-06-27 20:15             ` Kent Overstreet
2023-06-27 22:05               ` Dave Chinner
2023-06-27 22:41                 ` Kent Overstreet
2023-06-28 14:40                 ` Jens Axboe
2023-06-28 14:48                   ` Thomas Weißschuh
2023-06-28 14:58                     ` Jens Axboe
2023-06-28  3:16               ` Jens Axboe
2023-06-28  4:01                 ` Kent Overstreet
2023-06-28 14:58                   ` Jens Axboe
2023-06-28 15:22                     ` Jens Axboe
2023-06-28 17:56                       ` Kent Overstreet
2023-06-28 20:45                         ` Jens Axboe
2023-06-28 16:57                     ` Jens Axboe
2023-06-28 17:33                       ` Christian Brauner
2023-06-28 17:52                       ` Kent Overstreet
2023-06-28 20:44                         ` Jens Axboe
2023-06-28 21:17                           ` Jens Axboe
2023-06-28 22:13                             ` Kent Overstreet
2023-06-28 22:33                               ` Jens Axboe
2023-06-28 22:55                                 ` Kent Overstreet
2023-06-28 23:14                                   ` Jens Axboe
2023-06-28 23:50                                     ` Kent Overstreet
2023-06-29  1:00                                       ` Dave Chinner
2023-06-29  1:33                                         ` Jens Axboe
2023-06-29 11:18                                           ` Christian Brauner
2023-06-29 14:17                                             ` Kent Overstreet
2023-06-29 15:31                                             ` Kent Overstreet
2023-06-30  9:40                                               ` Christian Brauner
2023-07-06 15:20                                                 ` Kent Overstreet
2023-07-06 16:26                                                   ` Jens Axboe
2023-07-06 16:34                                                     ` Kent Overstreet
2023-06-29  1:29                                       ` Jens Axboe
2023-07-06 20:15                             ` Kent Overstreet
2023-06-28 17:54                     ` Kent Overstreet
2023-06-28 20:54                       ` Jens Axboe
2023-06-28 22:14                         ` Jens Axboe
2023-06-28 23:04                           ` Kent Overstreet
2023-06-28 23:11                             ` Jens Axboe
2023-06-27  2:33       ` Kent Overstreet
2023-06-27  2:59         ` Jens Axboe
2023-06-27  3:19           ` Matthew Wilcox
2023-06-27  3:22             ` Kent Overstreet
2023-06-27  3:52 ` Christoph Hellwig
2023-06-27  4:36   ` Kent Overstreet
2023-07-06 15:56 ` Kent Overstreet
2023-07-06 16:40   ` Josef Bacik
2023-07-06 17:38     ` Kent Overstreet
2023-07-06 19:17       ` Eric Sandeen
2023-07-06 19:31         ` Kent Overstreet
2023-07-06 21:19       ` Darrick J. Wong
2023-07-06 22:43         ` Kent Overstreet
2023-07-07 13:13           ` Jan Kara
2023-07-07 13:52             ` Kent Overstreet
2023-07-07  8:48         ` Christian Brauner
2023-07-07  9:18           ` Kent Overstreet
2023-07-07 16:26             ` James Bottomley
2023-07-07 16:48               ` Kent Overstreet
2023-07-07 17:04                 ` James Bottomley
2023-07-07 17:26                   ` Kent Overstreet
2023-07-08  3:54               ` Matthew Wilcox
2023-07-08  4:10                 ` Kent Overstreet
2023-07-08  4:31                 ` Kent Overstreet
2023-07-08 15:02                   ` Theodore Ts'o
2023-07-08 15:23                     ` Kent Overstreet
2023-07-08 16:42                 ` James Bottomley
2023-07-09  1:16                   ` Kent Overstreet
2023-07-07  9:35           ` Kent Overstreet
2023-07-07  2:04       ` Theodore Ts'o
2023-07-07 12:18       ` Brian Foster
2023-07-07 14:49         ` Kent Overstreet
2023-07-12  2:54   ` Kent Overstreet
2023-07-12 19:48     ` Kees Cook
2023-07-12 19:57       ` Kent Overstreet
2023-07-12 22:10     ` Darrick J. Wong
2023-07-12 23:57       ` Kent Overstreet
2023-08-09  1:27     ` Linus Torvalds
2023-08-10 15:54       ` Kent Overstreet
2023-08-10 16:40         ` Linus Torvalds
2023-08-10 18:02           ` Kent Overstreet
2023-08-10 18:09             ` Linus Torvalds
2023-08-10 17:52         ` Jan Kara
2023-08-11  2:47           ` Kent Overstreet
2023-08-11  8:10             ` Jan Kara
2023-08-11  8:13               ` Christian Brauner
2023-08-10 22:39         ` Darrick J. Wong
2023-08-10 23:47           ` Linus Torvalds [this message]
2023-08-11  2:40             ` Jens Axboe
2023-08-11  4:03             ` Kent Overstreet
2023-08-11  5:20               ` Linus Torvalds
2023-08-11  5:29                 ` Kent Overstreet
2023-08-11  5:53                   ` Linus Torvalds
2023-08-11  7:52                     ` Christian Brauner
2023-08-11 14:31                     ` Jens Axboe
2023-08-11  3:45           ` Kent Overstreet
2023-08-21  0:09             ` Dave Chinner
2023-08-10 23:07         ` Matthew Wilcox
2023-08-11 10:54         ` Christian Brauner
2023-08-11 12:58           ` Kent Overstreet
2023-08-14  7:25             ` Christian Brauner
2023-08-14 15:23               ` Kent Overstreet
2023-08-11 13:21           ` Kent Overstreet
2023-08-11 22:56             ` Darrick J. Wong
2023-08-14  7:21             ` Christian Brauner
2023-08-14 15:27               ` Kent Overstreet
2023-09-03  3:25 Kent Overstreet
2023-09-05 13:24 ` Christoph Hellwig
2023-09-06  0:00   ` Kent Overstreet
2023-09-06  0:41     ` Matthew Wilcox
2023-09-06 16:10       ` Kent Overstreet
2023-09-06 17:57         ` Darrick J. Wong
2023-09-08  9:37     ` Christoph Hellwig
2023-09-06 19:36 ` Linus Torvalds
2023-09-06 20:02   ` Linus Torvalds
2023-09-06 20:20     ` Linus Torvalds
2023-09-06 21:55       ` Arnaldo Carvalho de Melo
2023-09-06 23:13         ` David Sterba
2023-09-06 23:34           ` Linus Torvalds
2023-09-06 23:46             ` Arnaldo Carvalho de Melo
2023-09-06 23:53               ` Arnaldo Carvalho de Melo
2023-09-06 23:16         ` Linus Torvalds
2023-09-10  0:53       ` Kent Overstreet
2023-09-07 20:37   ` Kent Overstreet
2023-09-07 20:51     ` Linus Torvalds
2023-09-07 23:40   ` Kent Overstreet
2023-09-08  6:29     ` Martin Steigerwald
2023-09-08  9:11     ` Joshua Ashton
2023-09-06 22:28 ` Nathan Chancellor
2023-09-07  0:03   ` Kees Cook
2023-09-07 14:29     ` Chris Mason
2023-09-07 20:39     ` Kent Overstreet
2023-09-08 10:50       ` Brian Foster
2023-09-08 23:05     ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wj8RuUosugVZk+iqCAq7x6rs=7C-9sUXcO2heu4dCuOVw@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=andreas.gruenbacher@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=bfoster@redhat.com \
    --cc=brauner@kernel.org \
    --cc=dchinner@redhat.com \
    --cc=dhowells@redhat.com \
    --cc=djwong@kernel.org \
    --cc=jack@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-bcachefs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=sandeen@redhat.com \
    --cc=snitzer@kernel.org \
    --cc=tytso@mit.edu \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).