From: Yu Kuai <yukuai1@huaweicloud.com>
To: Christian Brauner <brauner@kernel.org>,
	Yu Kuai <yukuai1@huaweicloud.com>
Cc: Christoph Hellwig <hch@lst.de>,
	jack@suse.cz, axboe@kernel.dk, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, yi.zhang@huawei.com,
	yangerkun@huawei.com, "yukuai (C)" <yukuai3@huawei.com>
Subject: Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
Date: Mon, 18 Mar 2024 19:57:49 +0800
Message-ID: <06c8cc61-0368-eccc-b781-0eda223a9b07@huaweicloud.com>
In-Reply-To: <20240318-darauf-lachhaft-b7a510575d87@brauner>

Hi,

On 2024/03/18 18:46, Christian Brauner wrote:
> On Mon, Mar 18, 2024 at 11:29:22AM +0100, Christian Brauner wrote:
>> On Mon, Mar 18, 2024 at 11:07:49AM +0100, Christian Brauner wrote:
>>> On Mon, Mar 18, 2024 at 03:19:03PM +0800, Yu Kuai wrote:
>>>> Hi, Christoph!
>>>>
>>>> On 2024/03/18 9:51, Yu Kuai wrote:
>>>>> Hi,
>>>>>
>>>>> On 2024/03/18 9:32, Christoph Hellwig wrote:
>>>>>> On Mon, Mar 18, 2024 at 09:26:48AM +0800, Yu Kuai wrote:
>>>>>>> Because there is a real filesystem (devtmpfs) used for raw block device
>>>>>>> file operations, open syscall to devtmpfs:
>>
>> Don't forget:
>>
>> mknod /my/xfs/file/system b 8 0
>>
>> which means you're not opening it via devtmpfs but via xfs. IOW, the
>> inode for that file is from xfs.

I think this is no different from devtmpfs: no matter which file is
passed in from blkdev_open(), we'll find the single bd_inode and stash
the new bdev_file there.
>>
>>>>>>>
>>>>>>> blkdev_open
>>>>>>>    bdev = blkdev_get_no_open
>>>>>>>    bdev_open -> pass in file is from devtmpfs
>>>>>>>    -> in this case, file inode is from devtmpfs,
>>>>>>
>>>>>> But file->f_mapping->host should still point to the bdevfs inode,
>>>>>> and file->f_mapping->host is what everything in the I/O path should
>>>>>> be using.
>>>
>>> I mentioned this in
>>> https://lore.kernel.org/r/20240118-gemustert-aalen-ee71d0c69826@brauner
>>>
>>> "[...] if we want to have all code pass a file and we have code in
>>> fs/buffer.c like iomap_to_bh():
>>>
>>> iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
>>>          loff_t offset = block << inode->i_blkbits;
>>>
>>>          bh->b_bdev = iomap->bdev;
>>> +       bh->f_b_bdev = iomap->f_bdev;
>>>
>>> While that works for every single filesystem that uses block devices
>>> because they stash them somewhere (like s_bdev_file) it doesn't work for
>>> the bdev filesystem itself. So if the bdev filesystem calls into helpers
>>> that expect e.g., buffer_head->s_f_bdev to have been initialized from
>>> iomap->f_bdev this wouldn't work.
>>>
>>> So if we want to remove b_bdev from struct buffer_head and fully rely on
>>> f_b_bdev - and similar in iomap - then we need a story for the bdev fs
>>> itself. And I wasn't clear on what that would be."
>>>
>>>>>>
>>>>>>> Then later, in blkdev_iomap_begin(), bd_inode is passed in and there is
>>>>>>> no access to the devtmpfs file, we can't use s_bdev_file() as other
>>>>>>> filesystems here.
>>>>>>
>>>>>> We can just pass the file down in iomap_iter.private
>>>>>
>>>>> I can do this for blkdev_read_folio(), however, for other ops like
>>>>> blkdev_writepages(), I can't find a way to pass the file to
>>>>> iomap_iter.private yet.
>>>>>
>>>>> Any suggestions?
>>>>
>>>> I came up with an idea:
>>>>
>>>> While opening the block_device the first time, store the generated new
>>>> file in "bd_inode->i_private". And release it after the last opener
>>>> closes the block_device.
>>>>
>>>> The advantages are:
>>>>   - multiple openers can share the same bdev_file;
>>>
>>> You mean use the file stashed in bdev_inode->i_private only to retrieve
>>> the inode/mapping in the block layer ops.

Yes. I mean that the first bdev_open() allocates a bdev_file and stashes
it, and the last bdev_release() frees it.
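
To make the lifetime rule concrete, here is a minimal userspace sketch of
"allocate on first open, free on last release". It is only a model: every
name in it (model_bdev, model_file, model_bdev_open, model_bdev_release) is
made up for illustration and is not a kernel API, and locking is left out
entirely.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; these are not the kernel's structures. */
struct model_file {
	int unused;
};

struct model_bdev {
	int openers;              /* plays the role of bd_openers */
	struct model_file *stash; /* plays the role of bd_inode->i_private */
};

/* The first opener allocates the shared file; later openers reuse it. */
static struct model_file *model_bdev_open(struct model_bdev *bdev)
{
	if (bdev->openers == 0) {
		bdev->stash = calloc(1, sizeof(*bdev->stash));
		if (!bdev->stash)
			return NULL;
	}
	bdev->openers++;
	return bdev->stash;
}

/* The last opener frees the shared file and clears the stash. */
static void model_bdev_release(struct model_bdev *bdev)
{
	assert(bdev->openers > 0);
	if (--bdev->openers == 0) {
		/* Assumption: any flush of dirty data would go here, before the free. */
		free(bdev->stash);
		bdev->stash = NULL;
	}
}
```

In this model two openers get the same pointer back, and the stash only
becomes NULL after the second release. Anything that still dereferences the
stash after that point is exactly the lifetime problem that needs ruling out.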
>>>
>>>>   - raw block device ops can use the bdev_file as well, and there is no
>>>> need to distinguish iomap/buffer_head for raw block_device;
>>>>
>>>> Please let me know what you think.
>>>
>>> It's equally ugly but probably slightly less error prone than the union
>>> approach. But please make that separate patches on top of the series.
> 
> The other issue with this on-demand inode->i_private allocation will be
> lifetime management. If you're doing some sort of writeback initiated
> from the filesystem then you're guaranteed that the file stashed in
> sb->bdev_file is aligned with the lifetime of the filesystem. All
> writeback related stuff that relies on inode's can rely on the
> superblock being valid while it is doing stuff.

For a raw block device, blkdev_flush_mapping() is called before
bdev_release() is called for the last opener (specifically, when
bd_openers has decreased to zero), hence raw block_device writeback
should always see a valid 'bdev_file', which will be released in the
last bdev_release().

And 'blockdev_superblock' will always be there and is always valid.

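
The ordering argument can be sketched in userspace as follows. This is a toy
model under stated assumptions, not kernel code: sketch_bdev,
sketch_flush_mapping and sketch_release are hypothetical names, the flush is
assumed to be synchronous, and writeback started asynchronously from
elsewhere is not modelled at all.

```c
#include <assert.h>

/* Hypothetical model of a raw block device; not a kernel structure. */
struct sketch_bdev {
	int openers;                 /* plays the role of bd_openers */
	int dirty_pages;             /* pages waiting for writeback */
	int file_valid;              /* 1 while the stashed file may be used */
	int flushed_with_valid_file; /* pages written while the file was valid */
};

/* Write back all dirty pages, recording whether the file was still valid. */
static void sketch_flush_mapping(struct sketch_bdev *b)
{
	if (b->file_valid)
		b->flushed_with_valid_file += b->dirty_pages;
	b->dirty_pages = 0;
}

/* The last opener flushes first and only then invalidates the file. */
static void sketch_release(struct sketch_bdev *b)
{
	assert(b->openers > 0);
	if (--b->openers > 0)
		return;
	sketch_flush_mapping(b);
	b->file_valid = 0;
}
```

With one opener and three dirty pages, sketch_release() writes all three
pages while file_valid is still 1, and only afterwards clears it.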
> In your approach that guarantee can't be given easily. If someone opens
> a block device /dev/sda does some buffered writes and then closes it the
> file might be cleaned up while there's still operations ongoing that
> rely on the file stashed in inode->i_private to be valid.
> 
> If on the other hand you allocate a stub file on-demand during
> bdev_open() and stash it in inode->i_private you need to make sure to
> avoid creating reference count cycles that keep the inode alive.

I'm thinking about using 'bdev_openers' to guarantee the lifetime. I can't
think of any possible problems for now; however, I could be wrong.

Thanks,
Kuai

> .
> 
