Date: Tue, 15 May 2018 11:11:47 +1000
From: Dave Chinner <david@fromorbit.com>
To: Andreas Gruenbacher <agruenba@redhat.com>
Cc: cluster-devel@redhat.com, Christoph Hellwig, linux-fsdevel@vger.kernel.org, Dave Chinner
Subject: Re: [PATCH v4 06/11] iomap: Add write_{begin,end} iomap operations
Message-ID: <20180515011147.GF23861@dastard>
References: <20180514153624.29598-1-agruenba@redhat.com> <20180514153624.29598-7-agruenba@redhat.com>
In-Reply-To: <20180514153624.29598-7-agruenba@redhat.com>

On Mon, May 14, 2018 at 05:36:19PM +0200, Andreas Gruenbacher wrote:
> Add write_begin and write_end operations to struct iomap_ops to provide
> a way of overriding the default behavior of iomap_write_begin and
> iomap_write_end. This is needed for implementing data journaling: in
> the data journaling case, pages are written into the journal before
> being written back to their proper on-disk locations.

I'm not sure adding a per-page filesystem callout abstraction is what
we want in the iomap code. The commit message does not explain why
gfs2 needs per-page callouts, nor why the per-page work cannot be
done or avoided by running code in the ->iomap_begin callout (e.g. by
unstuffing the inode so nothing needs to be done per-page).

My main concern is that the callouts to the filesystem in iomap are -
by design - per IO, not per page. The whole point of using iomap was
to get away from doing per-page filesystem operations on multi-page
IOs - it's highly inefficient when we only need to call into the
filesystem on a per-extent basis, and we simplified the code a *lot*
by avoiding such behaviours.

And to that point: this change has serious conflicts with the next
step for the iomap infrastructure: Christoph's recent "remove
bufferheads from iomap" series. Apart from the obvious code
conflicts, there's a fairly major architectural/functional conflict.

https://marc.info/?l=linux-fsdevel&m=152585212000411&w=2

That is, Christoph's patchset pushes us further away from filesystems
doing their own per-page processing of page state. It converts the
iomap IO path to store its own per-page state tracking information in
page->private, greatly reducing memory overhead (16 bytes per 4k page
instead of 104 bytes per bufferhead) and streamlining the code.

This, however, essentially makes the buffered IO path through the
iomap infrastructure incompatible with existing bufferhead-based
filesystems. Despite this, we can still provide limited support for
buffered writes on bufferhead-based filesystems through the iomap
infrastructure, but I only see that as a mechanism for filesystems
moving away from dependencies on bufferheads. It would require a
different refactoring of the iomap code - I'd much prefer to keep
bufferhead-dependent IO paths completely separate (e.g. via a new
actor function) to minimise impact and dependencies on the internal
bufferhead-free iomap IO path....

Let's see what Christoph thinks first, though....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
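
For reference, a rough sketch of the interface being discussed
(approximating include/linux/iomap.h around v4.17; the
write_begin/write_end prototypes are inferred from the commit message
above, not quoted from the patch): the existing callouts are made once
per extent mapping, while the proposed hooks would be called for every
page the buffered write actor touches.

/*
 * Existing callouts: invoked once per extent mapping covering the IO
 * range, regardless of how many pages that mapping spans.
 */
struct iomap_ops {
	int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
			   unsigned flags, struct iomap *iomap);
	int (*iomap_end)(struct inode *inode, loff_t pos, loff_t length,
			 ssize_t written, unsigned flags,
			 struct iomap *iomap);

	/*
	 * Proposed hooks (prototypes approximated): invoked from the
	 * buffered write actor once per page, mirroring the internal
	 * iomap_write_begin()/iomap_write_end() helpers.
	 */
	int (*write_begin)(struct inode *inode, loff_t pos, unsigned len,
			   unsigned flags, struct page **pagep,
			   struct iomap *iomap);
	int (*write_end)(struct inode *inode, loff_t pos, unsigned len,
			 unsigned copied, struct page *page,
			 struct iomap *iomap);
};

With the existing interface, a large buffered write costs one
->iomap_begin/->iomap_end pair per extent; the proposed hooks would add
two filesystem calls for every page on top of that, which is the
per-page overhead the objection above is about.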