From: Zhang Yi <yi.zhang@huaweicloud.com>
To: Luis Chamberlain <mcgrof@kernel.org>, Jan Kara <jack@suse.cz>,
Matthew Wilcox <willy@infradead.org>
Cc: lsf-pc@lists.linux-foundation.org,
Christoph Hellwig <hch@infradead.org>,
David Howells <dhowells@redhat.com>,
"kbus @imap.suse.de>> Keith Busch" <kbusch@kernel.org>,
Pankaj Raghav <p.raghav@samsung.com>,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
yi.zhang@huawei.com, guohanjun@huawei.com
Subject: Re: LSF/MM/BPF 2023 IOMAP conversion status update
Date: Fri, 24 Feb 2023 15:01:37 +0800
Message-ID: <b1dec5c2-0437-de15-b2f4-13609b4378f0@huaweicloud.com>
In-Reply-To: <20230208160422.m4d4rx6kg57xm5xk@quack3>
On 2023/2/9 0:04, Jan Kara wrote:
> On Sun 29-01-23 05:06:47, Matthew Wilcox wrote:
>> On Sat, Jan 28, 2023 at 08:46:45PM -0800, Luis Chamberlain wrote:
>>> I'm hoping this *might* be useful to some, but I fear it may leave quite
>>> a bit of folks with more questions than answers as it did for me. And
>>> hence I figured that *this aspect of this topic* perhaps might be a good
>>> topic for LSF. The end goal would hopefully then be finally enabling us
>>> to document IOMAP API properly and helping with the whole conversion
>>> effort.
>>
>> +1 from me.
>>
>> I've made a couple of abortive efforts to try and convert a "trivial"
>> filesystem like ext2/ufs/sysv/jfs to iomap, and I always get hung up on
>> what the semantics are for get_block_t and iomap_begin().
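For context, the two interfaces being compared look roughly like this (from my reading of the headers; check current fs.h and include/linux/iomap.h for the authoritative definitions):

```c
/* Legacy buffer_head interface: the fs maps one block per call,
 * filling in a caller-provided buffer_head. */
typedef int (get_block_t)(struct inode *inode, sector_t iblock,
			  struct buffer_head *bh_result, int create);

/* iomap: the fs describes a whole extent per call via struct iomap_ops. */
struct iomap_ops {
	int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
			   unsigned flags, struct iomap *iomap,
			   struct iomap *srcmap);
	int (*iomap_end)(struct inode *inode, loff_t pos, loff_t length,
			 ssize_t written, unsigned flags, struct iomap *iomap);
};
```

The semantic questions are mostly about what ->iomap_begin is allowed or required to do (allocation, holes, srcmap for CoW) compared with the create flag of get_block_t.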
>
> Yeah, I'd be also interested in this discussion. In particular as a
> maintainer of part of these legacy filesystems (ext2, udf, isofs).
>
>>> Perhaps fs/buffer.c could be converted to folios only, and be done
>>> with it. But would we be losing out on something? What would that be?
>>
>> buffer_heads are inefficient for multi-page folios because some of the
>> algorithms are O(n^2) for n being the number of buffers in a folio.
>> It's fine for 8x 512b buffers in a 4k page, but for 512x 4kb buffers in
>> a 2MB folio, it's pretty sticky. Things like "Read I/O has completed on
>> this buffer, can I mark the folio as Uptodate now?" For iomap, that's a
>> scan of a 64 byte bitmap up to 512 times; for BHs, it's a loop over 512
>> allocations, looking at one bit in each BH before moving on to the next.
>> Similarly for writeback, iirc.
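To illustrate the asymmetry in a toy user-space model (not the actual kernel code, just the shape of the two data structures): iomap keeps one per-folio sub-block uptodate bitmap, so the "is the whole folio uptodate now?" check is a scan of a 64-byte bitmap, while buffer_heads require chasing a chain of separately allocated per-block structures and testing one bit in each.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCKS_PER_FOLIO 512	/* 2MB folio / 4KB blocks */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS (BLOCKS_PER_FOLIO / BITS_PER_WORD)

/* iomap-style: one bitmap covering the whole folio (64 bytes for 512 blocks). */
struct folio_state {
	unsigned long uptodate[BITMAP_WORDS];
};

/* BH-style: one separately allocated structure per block. */
struct toy_bh {
	bool uptodate;
	struct toy_bh *next;	/* b_this_page analogue */
};

static void folio_mark_block_uptodate(struct folio_state *fs, unsigned int blk)
{
	fs->uptodate[blk / BITS_PER_WORD] |= 1UL << (blk % BITS_PER_WORD);
}

/* One pass over the compact bitmap answers the question. */
static bool folio_fully_uptodate(const struct folio_state *fs)
{
	for (size_t i = 0; i < BITMAP_WORDS; i++)
		if (fs->uptodate[i] != ~0UL)
			return false;
	return true;
}

/* BH-style: walk up to 512 allocations, one bit per structure. */
static bool bh_chain_fully_uptodate(const struct toy_bh *head)
{
	for (const struct toy_bh *bh = head; bh; bh = bh->next)
		if (!bh->uptodate)
			return false;
	return true;
}
```

The quadratic part comes from running the per-block check once per I/O completion: 512 completions, each walking up to 512 buffers.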
>>
>> So +1 from me for a "How do we convert 35-ish block based filesystems
>> from BHs to iomap for their buffered & direct IO paths". There's maybe a
>> separate discussion to be had for "What should the API be for filesystems
>> to access metadata on the block device" because I don't believe the
>> page-cache based APIs are easy for fs authors to use.
>
> Yeah, so the actual data paths should be relatively easy for these old
> filesystems as they usually don't do anything special (those that do - like
> reiserfs - are deprecated and to be removed). But for metadata we do need
> some convenience functions like - give me block of metadata at this block
> number, make it dirty / clean / uptodate (block granularity dirtying &
> uptodate state is absolute must for metadata, otherwise we'll have data
> corruption issues). As for more complex functionality, we need things like:
> lock particular block of metadata (equivalent of buffer lock), track that
> this block is metadata for given inode so that it can be written on
> fsync(2). Then more fancy filesystems like ext4 also need to attach more
> private state to each metadata block but that needs to be dealt with on
> case-by-case basis anyway.
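As a strawman, the wishlist above could be restated as an interface along these lines (every name here is invented; nothing like this exists yet):

```c
/* Hypothetical sketch only: a buffer_head replacement for fs metadata. */
struct meta_buf;	/* one block of metadata, block-granular state */

/* "give me the block of metadata at this block number" */
struct meta_buf *meta_read(struct block_device *bdev, sector_t block,
			   unsigned int size);

/* block-granularity dirty/clean/uptodate state */
void meta_mark_dirty(struct meta_buf *mb);
void meta_mark_clean(struct meta_buf *mb);
bool meta_uptodate(const struct meta_buf *mb);

/* equivalent of buffer lock */
void meta_lock(struct meta_buf *mb);
void meta_unlock(struct meta_buf *mb);

/* track that this block belongs to @inode so fsync(2) writes it */
void meta_attach_inode(struct meta_buf *mb, struct inode *inode);

/* per-fs private state (e.g. ext4's journal heads), case by case */
void meta_set_private(struct meta_buf *mb, void *priv);
```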
>
Hello, all.
I'm also interested in this topic, especially in the iomap conversion of
ext4's buffered I/O paths, and in the discussion of the metadata APIs. The
current buffer_heads can lead to many potential problems and bring a lot of
quality challenges to our products. I look forward to more discussion if I
can attend in person.
Thanks,
Yi.
>> Maybe some related topics are
>> "What testing should we require for some of these ancient filesystems?"
>> "Whose job is it to convert these 35 filesystems anyway, can we just
>> delete some of them?"
>
> I certainly would not miss some of these filesystems - like minix, sysv, ...
> But before really threatening to remove some of these ancient and long
> untouched filesystems, we should convert at least those we do care about.
> Once there is a precedent for what a simple filesystem conversion looks
> like, it is easier to argue about what to do with the ones we don't care
> about so much.
>
>> "Is there a lower-performance but easier-to-implement API than iomap
>> for old filesystems that only exist for compatibility reasons?"
>
> As I wrote above, for metadata there ought to be something, as otherwise it
> will be a real pain (and no gain really). But I guess the concrete API only
> materializes once we attempt a conversion of some filesystem like ext2.
> I'll try to have a look into that, at least the obvious preparatory steps
> like converting the data paths to iomap.
>
> Honza
>