Date: Tue, 15 May 2018 09:22:52 +0200
From: Christoph Hellwig
To: Dave Chinner
Cc: Andreas Gruenbacher, cluster-devel@redhat.com, Christoph Hellwig,
	linux-fsdevel@vger.kernel.org, Dave Chinner
Subject: Re: [PATCH v4 06/11] iomap: Add write_{begin,end} iomap operations
Message-ID: <20180515072252.GA23202@lst.de>
References: <20180514153624.29598-1-agruenba@redhat.com>
	<20180514153624.29598-7-agruenba@redhat.com>
	<20180515011147.GF23861@dastard>
In-Reply-To: <20180515011147.GF23861@dastard>

On Tue, May 15, 2018 at 11:11:47AM +1000, Dave Chinner wrote:
> I'm not sure adding a per-page filesystem callout abstraction is
> what we want in the iomap code.  The commit message does not explain
> why gfs2 needs per-page callouts, nor why the per-page work cannot
> be done/avoided by running code in the ->iomap_begin callout
> (e.g. by unstuffing the inode so nothing needs to be done per-page).

Agreed.

> My main concern is that the callouts to the filesystem in iomap are
> - by design - per IO, not per page.  The whole point of using iomap
> was to get away from doing per-page filesystem operations on
> multi-page IOs - it's highly inefficient when we only need to call
> into the filesystem on a per-extent basis, and we simplified the
> code a *lot* by avoiding such behaviours.

Yes.  And from a quick look we can easily handle "stuffed", aka
inline, data with the existing architecture.  We just need a new
IOMAP_INLINE flag for the extent type (rough sketch below).  The
details of how to pass that data still need to be worked out, but
either a struct page or a virtual address should do it.

The other thing gfs2 seems to be doing is handling of journaled data.
I'm not quite sure how to fit that into the grand overall scheme of
things yet, but at least that would only happen after the write.

> That is, Christoph's patchset pushes us further away from
> filesystems doing their own per-page processing of page state.  It
> converts the iomap IO path to store its own per-page state tracking
> information in page->private, greatly reducing memory overhead (16
> bytes on a 4k page instead of 104 bytes per bufferhead) and

Or in fact zero bytes for block size == PAGE_SIZE.

> streamlining the code.  This, however, essentially makes the
> buffered IO path through the iomap infrastructure incompatible with
> existing bufferhead based filesystems.
>
> Despite this, we can still provide limited support for buffered
> writes on bufferhead based filesystems through the iomap
> infrastructure, but I only see that as a mechanism for filesystems
> moving away from dependencies on bufferheads.  It would require a
> different refactoring of the iomap code - I'd much prefer to keep
> bufferhead dependent IO paths completely separate (e.g. via a new
> actor function) to minimise impact and dependencies on the internal
> bufferhead free iomap IO path....
>
> Let's see what Christoph thinks first, though....

gfs2 does indeed seem to depend pretty heavily on buffer_heads.  That
is a bit of a bummer, and I'd need to take a deeper look at how to
offer iomap functionality to gfs2 without it first moving away from
buffer_heads, which isn't really going to be a quick project.
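
To make the IOMAP_INLINE idea above a little more concrete: assuming
we go with a kernel virtual address rather than a struct page, the
mapping structure would just grow one pointer.  Completely untested
sketch - the existing fields are from memory and the inline_data name
is made up:

	/* new extent type, next to IOMAP_HOLE, IOMAP_DELALLOC etc: */
	#define IOMAP_INLINE	0x05	/* data stored inline in the inode */

	struct iomap {
		u64			addr;	/* disk offset, in bytes */
		loff_t			offset;	/* file offset, in bytes */
		u64			length;	/* length of mapping, bytes */
		u16			type;	/* IOMAP_HOLE, IOMAP_INLINE, ... */
		u16			flags;
		struct block_device	*bdev;
		struct dax_device	*dax_dev;
		void			*inline_data;	/* set for IOMAP_INLINE */
	};

The write actor would then special-case iomap->type == IOMAP_INLINE
and memcpy between the page and inline_data instead of mapping
blocks, and ->iomap_begin would simply point inline_data at the
in-core copy of the stuffed inode data.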
In a way the write_begin/write_end ops from Andreas facilitate
exactly that kind of incremental move, as the rest of
iomap_file_buffered_write and iomap_write_actor stays untouched.  So
at least temporarily his callouts are what we need - even though the
actual inline data functionality should for sure be moved out of them
eventually, and they should only stay at all as part of a long-term
plan to move gfs2 away from buffer heads.

Btw, does gfs2 support block sizes smaller than PAGE_SIZE?
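
Anyway, for reference, this is roughly the shape I expect the ops
structure to take.  The iomap_begin/iomap_end prototypes are the
existing ones; the write_begin/write_end signatures are only my guess
at Andreas' callouts, not copied from his patch:

	struct iomap_ops {
		int	(*iomap_begin)(struct inode *inode, loff_t pos,
				loff_t length, unsigned flags,
				struct iomap *iomap);
		int	(*iomap_end)(struct inode *inode, loff_t pos,
				loff_t length, ssize_t written,
				unsigned flags, struct iomap *iomap);

		/*
		 * New optional per-page callouts, invoked for each page
		 * from iomap_write_begin/iomap_write_end:
		 */
		int	(*write_begin)(struct inode *inode, loff_t pos,
				unsigned len, struct page *page,
				struct iomap *iomap);
		void	(*write_end)(struct inode *inode, loff_t pos,
				unsigned copied, struct page *page,
				struct iomap *iomap);
	};

The point being that these hang off the ops structure and get called
from inside the per-page helpers, so the main write loop doesn't have
to change at all.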