From: Zhang Yi <yi.zhang@huaweicloud.com>
To: Luis Chamberlain <mcgrof@kernel.org>, Jan Kara <jack@suse.cz>,
Matthew Wilcox <willy@infradead.org>
Cc: lsf-pc@lists.linux-foundation.org,
Christoph Hellwig <hch@infradead.org>,
David Howells <dhowells@redhat.com>,
"kbus @imap.suse.de>> Keith Busch" <kbusch@kernel.org>,
Pankaj Raghav <p.raghav@samsung.com>,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
yi.zhang@huawei.com, guohanjun@huawei.com
Subject: Re: LSF/MM/BPF 2023 IOMAP conversion status update
Date: Fri, 24 Feb 2023 15:01:37 +0800
Message-ID: <b1dec5c2-0437-de15-b2f4-13609b4378f0@huaweicloud.com>
In-Reply-To: <20230208160422.m4d4rx6kg57xm5xk@quack3>
On 2023/2/9 0:04, Jan Kara wrote:
> On Sun 29-01-23 05:06:47, Matthew Wilcox wrote:
>> On Sat, Jan 28, 2023 at 08:46:45PM -0800, Luis Chamberlain wrote:
>>> I'm hoping this *might* be useful to some, but I fear it may leave quite
>>> a bit of folks with more questions than answers as it did for me. And
>>> hence I figured that *this aspect of this topic* perhaps might be a good
>>> topic for LSF. The end goal would hopefully then be finally enabling us
>>> to document IOMAP API properly and helping with the whole conversion
>>> effort.
>>
>> +1 from me.
>>
>> I've made a couple of abortive efforts to try and convert a "trivial"
>> filesystem like ext2/ufs/sysv/jfs to iomap, and I always get hung up on
>> what the semantics are for get_block_t and iomap_begin().
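For context, the two interfaces being compared look roughly like this (from my reading of the headers; check current fs.h and include/linux/iomap.h for the authoritative definitions):

```c
/* Legacy buffer_head interface: the fs maps one block per call,
 * filling in a caller-provided buffer_head. */
typedef int (get_block_t)(struct inode *inode, sector_t iblock,
			  struct buffer_head *bh_result, int create);

/* iomap: the fs describes a whole extent per call via struct iomap_ops. */
struct iomap_ops {
	int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
			   unsigned flags, struct iomap *iomap,
			   struct iomap *srcmap);
	int (*iomap_end)(struct inode *inode, loff_t pos, loff_t length,
			 ssize_t written, unsigned flags, struct iomap *iomap);
};
```

The semantic questions are mostly about what ->iomap_begin is allowed or required to do (allocation, holes, srcmap for CoW) compared with the create flag of get_block_t.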
>
> Yeah, I'd be also interested in this discussion. In particular as a
> maintainer of part of these legacy filesystems (ext2, udf, isofs).
>
>>> Perhaps fs/buffer.c could be converted to folios only, and be done
>>> with it. But would we be losing out on something? What would that be?
>>
>> buffer_heads are inefficient for multi-page folios because some of the
>> algorithms are O(n^2) for n being the number of buffers in a folio.
>> It's fine for 8x 512b buffers in a 4k page, but for 512x 4kb buffers in
>> a 2MB folio, it's pretty sticky. Things like "Read I/O has completed on
>> this buffer, can I mark the folio as Uptodate now?" For iomap, that's a
>> scan of a 64 byte bitmap up to 512 times; for BHs, it's a loop over 512
>> allocations, looking at one bit in each BH before moving on to the next.
>> Similarly for writeback, iirc.
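To illustrate the asymmetry in a toy user-space model (not the actual kernel code, just the shape of the two data structures): iomap keeps one per-folio sub-block uptodate bitmap, so the "is the whole folio uptodate now?" check is a scan of a 64-byte bitmap, while buffer_heads require chasing a chain of separately allocated per-block structures and testing one bit in each.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCKS_PER_FOLIO 512	/* 2MB folio / 4KB blocks */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS (BLOCKS_PER_FOLIO / BITS_PER_WORD)

/* iomap-style: one bitmap covering the whole folio (64 bytes for 512 blocks). */
struct folio_state {
	unsigned long uptodate[BITMAP_WORDS];
};

/* BH-style: one separately allocated structure per block. */
struct toy_bh {
	bool uptodate;
	struct toy_bh *next;	/* b_this_page analogue */
};

static void folio_mark_block_uptodate(struct folio_state *fs, unsigned int blk)
{
	fs->uptodate[blk / BITS_PER_WORD] |= 1UL << (blk % BITS_PER_WORD);
}

/* One pass over the compact bitmap answers the question. */
static bool folio_fully_uptodate(const struct folio_state *fs)
{
	for (size_t i = 0; i < BITMAP_WORDS; i++)
		if (fs->uptodate[i] != ~0UL)
			return false;
	return true;
}

/* BH-style: walk up to 512 allocations, one bit per structure. */
static bool bh_chain_fully_uptodate(const struct toy_bh *head)
{
	for (const struct toy_bh *bh = head; bh; bh = bh->next)
		if (!bh->uptodate)
			return false;
	return true;
}
```

The quadratic part comes from running the per-block check once per I/O completion: 512 completions, each walking up to 512 buffers.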
>>
>> So +1 from me for a "How do we convert 35-ish block based filesystems
>> from BHs to iomap for their buffered & direct IO paths". There's maybe a
>> separate discussion to be had for "What should the API be for filesystems
>> to access metadata on the block device" because I don't believe the
>> page-cache based APIs are easy for fs authors to use.
>
> Yeah, so the actual data paths should be relatively easy for these old
> filesystems as they usually don't do anything special (those that do - like
> reiserfs - are deprecated and to be removed). But for metadata we do need
> some convenience functions like - give me block of metadata at this block
> number, make it dirty / clean / uptodate (block granularity dirtying &
> uptodate state is absolute must for metadata, otherwise we'll have data
> corruption issues). As for more complex functionality, we need things like:
> lock particular block of metadata (equivalent of buffer lock), track that
> this block is metadata for given inode so that it can be written on
> fsync(2). Then more fancy filesystems like ext4 also need to attach more
> private state to each metadata block but that needs to be dealt with on
> case-by-case basis anyway.
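As a strawman, the wishlist above could be restated as an interface along these lines (every name here is invented; nothing like this exists yet):

```c
/* Hypothetical sketch only: a buffer_head replacement for fs metadata. */
struct meta_buf;	/* one block of metadata, block-granular state */

/* "give me the block of metadata at this block number" */
struct meta_buf *meta_read(struct block_device *bdev, sector_t block,
			   unsigned int size);

/* block-granularity dirty/clean/uptodate state */
void meta_mark_dirty(struct meta_buf *mb);
void meta_mark_clean(struct meta_buf *mb);
bool meta_uptodate(const struct meta_buf *mb);

/* equivalent of buffer lock */
void meta_lock(struct meta_buf *mb);
void meta_unlock(struct meta_buf *mb);

/* track that this block belongs to @inode so fsync(2) writes it */
void meta_attach_inode(struct meta_buf *mb, struct inode *inode);

/* per-fs private state (e.g. ext4's journal heads), case by case */
void meta_set_private(struct meta_buf *mb, void *priv);
```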
>
Hello, all.
I'm also interested in this topic, especially in the iomap conversion of
ext4's buffered I/O paths, and in the discussion of the metadata APIs. The
current buffer_heads can lead to many potential problems and bring a lot of
quality challenges to our products. I look forward to more discussion if I
can attend in person.
Thanks,
Yi.
>> Maybe some related topics are
>> "What testing should we require for some of these ancient filesystems?"
>> "Whose job is it to convert these 35 filesystems anyway, can we just
>> delete some of them?"
>
> I certainly would not miss some of these filesystems - like minix, sysv, ...
> But before really threatening to remove some of these ancient and long
> untouched filesystems, we should convert at least those we do care about.
> Once there is a precedent for what a simple filesystem conversion looks
> like, it is easier to argue about what to do with the ones we don't care
> about so much.
>
>> "Is there a lower-performance but easier-to-implement API than iomap
>> for old filesystems that only exist for compatibility reasons?"
>
> As I wrote above, for metadata there ought to be something, as otherwise it
> will be a real pain (and no gain really). But I guess the concrete API only
> materializes once we attempt a conversion of some filesystem like ext2.
> I'll try to have a look into that, at least the obvious preparatory steps
> like converting the data paths to iomap.
>
> Honza
>