linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kent Overstreet <kent.overstreet@linux.dev>
To: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-bcachefs@vger.kernel.org, djwong@kernel.org,
	dchinner@redhat.com, sandeen@redhat.com, willy@infradead.org,
	josef@toxicpanda.com, tytso@mit.edu, bfoster@redhat.com,
	andreas.gruenbacher@gmail.com, brauner@kernel.org,
	peterz@infradead.org, akpm@linux-foundation.org,
	dhowells@redhat.com, snitzer@kernel.org, axboe@kernel.dk
Subject: Re: [GIT PULL] bcachefs
Date: Thu, 10 Aug 2023 22:47:03 -0400	[thread overview]
Message-ID: <20230811024703.7dhu5rz5ovph7uop@moria.home.lan> (raw)
In-Reply-To: <20230810175205.gtlkydeis37xdxuk@quack3>

On Thu, Aug 10, 2023 at 07:52:05PM +0200, Jan Kara wrote:
> On Thu 10-08-23 11:54:53, Kent Overstreet wrote:
> > > And there clearly is something very strange going on with superblock
> > > handling
> > 
> > This deserves an explanation because sget() is a bit nutty.
> > 
> > The way sget() is conventionally used for block device filesystems, the
> > block device open _isn't actually exclusive_ - sure, FMODE_EXCL is used,
> > but the holder is the fs type pointer, so it won't exclude with other
> > opens of the same fs type.
> > 
> > That means the only protection from multiple opens scribbling over each
> > other is sget() itself - but if the bdev handle ever outlives the
> > superblock we're completely screwed; that's a silent data corruption bug
> > that we can't easily catch, and if the filesystem teardown path has any
> > asynchronous stuff going on (and of course it does) that's not a hard
> > mistake to make. I've observed at least one bug that looked suspiciously
> > like that, but I don't think I quite pinned it down at the time.
> 
> This is just being changed - check Christian's VFS tree. There are patches
> that make sget() use superblock pointer as a bdev holder so the reuse
> you're speaking about isn't a problem anymore.

So then the question is what do you use for identifying the superblock,
and you're switching to the dev_t - interesting.

Are we 100% sure that will never break, that a dev_t will always
identify a unique block_device? Namespacing has been changing things.

> > It also forces the caller to separate opening of the block devices from
> > the rest of filesystem initialization, which is a bit less than ideal.
> > 
> > Anyways, bcachefs just wants to be able to do real exclusive opens of
> > the block devices, and we do all filesystem bringup with a single
> > bch2_fs_open() call. I think this could be made to work with the way
> > sget() wants to work, but it'd require reworking the locking in
> > sget() - it does everything, including the test() and set() calls, under
> > a single spinlock.
> 
> Yeah. Maybe the current upstream changes aren't enough to make your life
> easier for bcachefs, btrfs does its special thing as well after all because
> mount also involves multiple devices for it. I just wanted to mention that
> the exclusive bdev open thing is changing.

I like the mount_bdev() approach in your patch a _lot_ better than the
old code, I think the approach almost works for multi device
filesystems - at least for bcachefs where we always pass in the full
list of devices we want to open, there's no kernel side probing like in
btrfs.

What changes is we'd have to pass a vector of dev_t's to sget(), and
set() needs to be able to stash them in super_block (not s_fs_info, we
need that for bch_fs later and that doesn't exist yet). But that's a
minor detail.

Yeah, this could work.

  reply	other threads:[~2023-08-11  2:47 UTC|newest]

Thread overview: 140+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-26 21:46 [GIT PULL] bcachefs Kent Overstreet
2023-06-26 23:11 ` Jens Axboe
2023-06-27  0:06   ` Kent Overstreet
2023-06-27  1:13     ` Jens Axboe
2023-06-27  2:05       ` Kent Overstreet
2023-06-27  2:59         ` Jens Axboe
2023-06-27  3:10           ` Kent Overstreet
2023-06-27 17:16           ` Jens Axboe
2023-06-27 20:15             ` Kent Overstreet
2023-06-27 22:05               ` Dave Chinner
2023-06-27 22:41                 ` Kent Overstreet
2023-06-28 14:40                 ` Jens Axboe
2023-06-28 14:48                   ` Thomas Weißschuh
2023-06-28 14:58                     ` Jens Axboe
2023-06-28  3:16               ` Jens Axboe
2023-06-28  4:01                 ` Kent Overstreet
2023-06-28 14:58                   ` Jens Axboe
2023-06-28 15:22                     ` Jens Axboe
2023-06-28 17:56                       ` Kent Overstreet
2023-06-28 20:45                         ` Jens Axboe
2023-06-28 16:57                     ` Jens Axboe
2023-06-28 17:33                       ` Christian Brauner
2023-06-28 17:52                       ` Kent Overstreet
2023-06-28 20:44                         ` Jens Axboe
2023-06-28 21:17                           ` Jens Axboe
2023-06-28 22:13                             ` Kent Overstreet
2023-06-28 22:33                               ` Jens Axboe
2023-06-28 22:55                                 ` Kent Overstreet
2023-06-28 23:14                                   ` Jens Axboe
2023-06-28 23:50                                     ` Kent Overstreet
2023-06-29  1:00                                       ` Dave Chinner
2023-06-29  1:33                                         ` Jens Axboe
2023-06-29 11:18                                           ` Christian Brauner
2023-06-29 14:17                                             ` Kent Overstreet
2023-06-29 15:31                                             ` Kent Overstreet
2023-06-30  9:40                                               ` Christian Brauner
2023-07-06 15:20                                                 ` Kent Overstreet
2023-07-06 16:26                                                   ` Jens Axboe
2023-07-06 16:34                                                     ` Kent Overstreet
2023-06-29  1:29                                       ` Jens Axboe
2023-07-06 20:15                             ` Kent Overstreet
2023-06-28 17:54                     ` Kent Overstreet
2023-06-28 20:54                       ` Jens Axboe
2023-06-28 22:14                         ` Jens Axboe
2023-06-28 23:04                           ` Kent Overstreet
2023-06-28 23:11                             ` Jens Axboe
2023-06-27  2:33       ` Kent Overstreet
2023-06-27  2:59         ` Jens Axboe
2023-06-27  3:19           ` Matthew Wilcox
2023-06-27  3:22             ` Kent Overstreet
2023-06-27  3:52 ` Christoph Hellwig
2023-06-27  4:36   ` Kent Overstreet
2023-07-06 15:56 ` Kent Overstreet
2023-07-06 16:40   ` Josef Bacik
2023-07-06 17:38     ` Kent Overstreet
2023-07-06 19:17       ` Eric Sandeen
2023-07-06 19:31         ` Kent Overstreet
2023-07-06 21:19       ` Darrick J. Wong
2023-07-06 22:43         ` Kent Overstreet
2023-07-07 13:13           ` Jan Kara
2023-07-07 13:52             ` Kent Overstreet
2023-07-07  8:48         ` Christian Brauner
2023-07-07  9:18           ` Kent Overstreet
2023-07-07 16:26             ` James Bottomley
2023-07-07 16:48               ` Kent Overstreet
2023-07-07 17:04                 ` James Bottomley
2023-07-07 17:26                   ` Kent Overstreet
2023-07-08  3:54               ` Matthew Wilcox
2023-07-08  4:10                 ` Kent Overstreet
2023-07-08  4:31                 ` Kent Overstreet
2023-07-08 15:02                   ` Theodore Ts'o
2023-07-08 15:23                     ` Kent Overstreet
2023-07-08 16:42                 ` James Bottomley
2023-07-09  1:16                   ` Kent Overstreet
2023-07-07  9:35           ` Kent Overstreet
2023-07-07  2:04       ` Theodore Ts'o
2023-07-07 12:18       ` Brian Foster
2023-07-07 14:49         ` Kent Overstreet
2023-07-12  2:54   ` Kent Overstreet
2023-07-12 19:48     ` Kees Cook
2023-07-12 19:57       ` Kent Overstreet
2023-07-12 22:10     ` Darrick J. Wong
2023-07-12 23:57       ` Kent Overstreet
2023-08-09  1:27     ` Linus Torvalds
2023-08-10 15:54       ` Kent Overstreet
2023-08-10 16:40         ` Linus Torvalds
2023-08-10 18:02           ` Kent Overstreet
2023-08-10 18:09             ` Linus Torvalds
2023-08-10 17:52         ` Jan Kara
2023-08-11  2:47           ` Kent Overstreet [this message]
2023-08-11  8:10             ` Jan Kara
2023-08-11  8:13               ` Christian Brauner
2023-08-10 22:39         ` Darrick J. Wong
2023-08-10 23:47           ` Linus Torvalds
2023-08-11  2:40             ` Jens Axboe
2023-08-11  4:03             ` Kent Overstreet
2023-08-11  5:20               ` Linus Torvalds
2023-08-11  5:29                 ` Kent Overstreet
2023-08-11  5:53                   ` Linus Torvalds
2023-08-11  7:52                     ` Christian Brauner
2023-08-11 14:31                     ` Jens Axboe
2023-08-11  3:45           ` Kent Overstreet
2023-08-21  0:09             ` Dave Chinner
2023-08-10 23:07         ` Matthew Wilcox
2023-08-11 10:54         ` Christian Brauner
2023-08-11 12:58           ` Kent Overstreet
2023-08-14  7:25             ` Christian Brauner
2023-08-14 15:23               ` Kent Overstreet
2023-08-11 13:21           ` Kent Overstreet
2023-08-11 22:56             ` Darrick J. Wong
2023-08-14  7:21             ` Christian Brauner
2023-08-14 15:27               ` Kent Overstreet
2023-09-03  3:25 Kent Overstreet
2023-09-05 13:24 ` Christoph Hellwig
2023-09-06  0:00   ` Kent Overstreet
2023-09-06  0:41     ` Matthew Wilcox
2023-09-06 16:10       ` Kent Overstreet
2023-09-06 17:57         ` Darrick J. Wong
2023-09-08  9:37     ` Christoph Hellwig
2023-09-06 19:36 ` Linus Torvalds
2023-09-06 20:02   ` Linus Torvalds
2023-09-06 20:20     ` Linus Torvalds
2023-09-06 21:55       ` Arnaldo Carvalho de Melo
2023-09-06 23:13         ` David Sterba
2023-09-06 23:34           ` Linus Torvalds
2023-09-06 23:46             ` Arnaldo Carvalho de Melo
2023-09-06 23:53               ` Arnaldo Carvalho de Melo
2023-09-06 23:16         ` Linus Torvalds
2023-09-10  0:53       ` Kent Overstreet
2023-09-07 20:37   ` Kent Overstreet
2023-09-07 20:51     ` Linus Torvalds
2023-09-07 23:40   ` Kent Overstreet
2023-09-08  6:29     ` Martin Steigerwald
2023-09-08  9:11     ` Joshua Ashton
2023-09-06 22:28 ` Nathan Chancellor
2023-09-07  0:03   ` Kees Cook
2023-09-07 14:29     ` Chris Mason
2023-09-07 20:39     ` Kent Overstreet
2023-09-08 10:50       ` Brian Foster
2023-09-08 23:05     ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230811024703.7dhu5rz5ovph7uop@moria.home.lan \
    --to=kent.overstreet@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=andreas.gruenbacher@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=bfoster@redhat.com \
    --cc=brauner@kernel.org \
    --cc=dchinner@redhat.com \
    --cc=dhowells@redhat.com \
    --cc=djwong@kernel.org \
    --cc=jack@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=linux-bcachefs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=sandeen@redhat.com \
    --cc=snitzer@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).