All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Andreas Grünbacher" <andreas.gruenbacher@gmail.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	Matthew Wilcox <willy@infradead.org>,
	Huang Jianan <huangjianan@oppo.com>,
	linux-erofs@lists.ozlabs.org,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v7] iomap: make inline data support more flexible
Date: Tue, 27 Jul 2021 00:20:18 +0200	[thread overview]
Message-ID: <CAHpGcM+oarDjgAC30X1eP4qPkpPP6i1s32t6CPXCYgiySOqVrg@mail.gmail.com> (raw)
In-Reply-To: <20210726213629.GF8572@magnolia>

Am Mo., 26. Juli 2021 um 23:36 Uhr schrieb Darrick J. Wong <djwong@kernel.org>:
> On Mon, Jul 26, 2021 at 09:22:41AM +0200, Andreas Gruenbacher wrote:
> > On Mon, Jul 26, 2021 at 4:36 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > > On Mon, Jul 26, 2021 at 12:16:39AM +0200, Andreas Gruenbacher wrote:
> > > > Here's a fixed and cleaned up version that passes fstests on gfs2.
> > > >
> > > > I see no reason why the combination of tail packing + writing should
> > > > cause any issues, so in my opinion, the check that disables that
> > > > combination in iomap_write_begin_inline should still be removed.
> > >
> > > Since there is no such fs for tail-packing write, I just do a wild
> > > guess, for example,
> > >  1) the tail-end block was not inlined, so iomap_write_end() dirtied
> > >     the whole page (or buffer) for the page writeback;
> > >  2) then it was truncated into a tail-packing inline block so the last
> > >     extent(page) became INLINE but dirty instead;
> > >  3) during the late page writeback for dirty pages,
> > >     if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
> > >     would be triggered in iomap_writepage_map() for such dirty page.
> > >
> > > As Matthew pointed out before,
> > > https://lore.kernel.org/r/YPrms0fWPwEZGNAL@casper.infradead.org/
> > > currently tail-packing inline won't interact with page writeback, but
> > > I'm afraid a supported tail-packing write fs needs to reconsider the
> > > whole stuff how page, inode writeback works and what the pattern is
> > > with the tail-packing.
> > >
> > > >
> > > > It turns out that returning the number of bytes copied from
> > > > iomap_read_inline_data is a bit irritating: the function is really used
> > > > for filling the page, but that's not always the "progress" we're looking
> > > > for.  In the iomap_readpage case, we actually need to advance by an
> > > > antire page, but in the iomap_file_buffered_write case, we need to
> > > > advance by the length parameter of iomap_write_actor or less.  So I've
> > > > changed that back.
> > > >
> > > > I've also renamed iomap_inline_buf to iomap_inline_data and I've turned
> > > > iomap_inline_data_size_valid into iomap_within_inline_data, which seems
> > > > more useful to me.
> > > >
> > > > Thanks,
> > > > Andreas
> > > >
> > > > --
> > > >
> > > > Subject: [PATCH] iomap: Support tail packing
> > > >
> > > > The existing inline data support only works for cases where the entire
> > > > file is stored as inline data.  For larger files, EROFS stores the
> > > > initial blocks separately and then can pack a small tail adjacent to the
> > > > inode.  Generalise inline data to allow for tail packing.  Tails may not
> > > > cross a page boundary in memory.
> > > >
> > > > We currently have no filesystems that support tail packing and writing,
> > > > so that case is currently disabled (see iomap_write_begin_inline).  I'm
> > > > not aware of any reason why this code path shouldn't work, however.
> > > >
> > > > Cc: Christoph Hellwig <hch@lst.de>
> > > > Cc: Darrick J. Wong <djwong@kernel.org>
> > > > Cc: Matthew Wilcox <willy@infradead.org>
> > > > Cc: Andreas Gruenbacher <andreas.gruenbacher@gmail.com>
> > > > Tested-by: Huang Jianan <huangjianan@oppo.com> # erofs
> > > > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > > > ---
> > > >  fs/iomap/buffered-io.c | 34 +++++++++++++++++++++++-----------
> > > >  fs/iomap/direct-io.c   | 11 ++++++-----
> > > >  include/linux/iomap.h  | 22 +++++++++++++++++++++-
> > > >  3 files changed, 50 insertions(+), 17 deletions(-)
> > > >
> > > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > > index 87ccb3438bec..334bf98fdd4a 100644
> > > > --- a/fs/iomap/buffered-io.c
> > > > +++ b/fs/iomap/buffered-io.c
> > > > @@ -205,25 +205,29 @@ struct iomap_readpage_ctx {
> > > >       struct readahead_control *rac;
> > > >  };
> > > >
> > > > -static void
> > > > -iomap_read_inline_data(struct inode *inode, struct page *page,
> > > > +static int iomap_read_inline_data(struct inode *inode, struct page *page,
> > > >               struct iomap *iomap)
> > > >  {
> > > > -     size_t size = i_size_read(inode);
> > > > +     size_t size = i_size_read(inode) - iomap->offset;
> > >
> > > I wonder why you use i_size / iomap->offset here,
> >
> > This function is supposed to copy the inline or tail data at
> > iomap->inline_data into the page passed to it. Logically, the inline
> > data starts at iomap->offset and extends until i_size_read(inode).
> > Relative to the page, the inline data starts at offset 0 and extends
> > until i_size_read(inode) - iomap->offset. It's as simple as that.
>
> It's only as simple as that because the inline data read code is overfit
> to the single use case (gfs2) that it supports.  So far in its history,
> iomap has never had to support inline data regions that do not coincide
> or overlap with EOF, nor has it had to support regions that do not start
> at pos==0.  That is why it is appropriate to use the memcpy -> memset ->
> return PAGE_SIZE pattern and short-circuit what we do everywhere else in
> iomap.
>
> For a non-inline readpage call, filesystems are allowed to return
> mappings for blocks beyond EOF.  The call to iomap_adjust_read_range
> sets us up to read data from disk through the EOF block, and for the
> remainder of the page we zero the post-eof blocks within that page.
>
> IOWs, for reads, __gfs2_iomap_get probably ought to set iomap->length to
> gfs2_max_stuffed_size() like it does for writes, and we ought to
> generalize iomap_read_inline_data to stop copying after
> min(iomap->length, i_size_read() - iomap->offset) bytes.  If it then
> discovers that it has indeed reached EOF, then we can zero the rest of
> the page and add that quantity to the number of bytes read.

That sounds like a useful improvement. I'll give it a try.

Thanks,
Andreas

> Right now for gfs2 the two arguments to min are always the same so the
> function omits all the bits that would make the zeroing actually
> conditional on whether we really hit EOF, and pass any copied size other
> than PAGE_SIZE back to iomap_readpage_actor.  Given that we still don't
> have any filesystems that require us to support inline regions entirely
> below EOF I'm fine with omitting the general (and hence untestable)
> solution... for now.
>
> (I now think I understand why someone brought up inline data regions in
> the middle of files last week.)
>
> --D
>
> > > and why you completely ignoring iomap->length field returning by fs.
> >
> > In the iomap_readpage case (iomap_begin with flags == 0),
> > iomap->length will be the amount of data up to the end of the inode.
> > In the iomap_file_buffered_write case (iomap_begin with flags ==
> > IOMAP_WRITE), iomap->length will be the size of iomap->inline_data.
> > (For extending writes, we need to write beyond the current end of
> > inode.) So iomap->length isn't all that useful for
> > iomap_read_inline_data.
> >
> > > Using i_size here instead of iomap->length seems coupling to me in the
> > > beginning (even currently in practice there is some limitation.)
> >
> > And what is that?
> >
> > Thanks,
> > Andreas
> >

WARNING: multiple messages have this Message-ID (diff)
From: "Andreas Grünbacher" <andreas.gruenbacher@gmail.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-erofs@lists.ozlabs.org, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v7] iomap: make inline data support more flexible
Date: Tue, 27 Jul 2021 00:20:18 +0200	[thread overview]
Message-ID: <CAHpGcM+oarDjgAC30X1eP4qPkpPP6i1s32t6CPXCYgiySOqVrg@mail.gmail.com> (raw)
In-Reply-To: <20210726213629.GF8572@magnolia>

Am Mo., 26. Juli 2021 um 23:36 Uhr schrieb Darrick J. Wong <djwong@kernel.org>:
> On Mon, Jul 26, 2021 at 09:22:41AM +0200, Andreas Gruenbacher wrote:
> > On Mon, Jul 26, 2021 at 4:36 AM Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > > On Mon, Jul 26, 2021 at 12:16:39AM +0200, Andreas Gruenbacher wrote:
> > > > Here's a fixed and cleaned up version that passes fstests on gfs2.
> > > >
> > > > I see no reason why the combination of tail packing + writing should
> > > > cause any issues, so in my opinion, the check that disables that
> > > > combination in iomap_write_begin_inline should still be removed.
> > >
> > > Since there is no such fs for tail-packing write, I just do a wild
> > > guess, for example,
> > >  1) the tail-end block was not inlined, so iomap_write_end() dirtied
> > >     the whole page (or buffer) for the page writeback;
> > >  2) then it was truncated into a tail-packing inline block so the last
> > >     extent(page) became INLINE but dirty instead;
> > >  3) during the late page writeback for dirty pages,
> > >     if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
> > >     would be triggered in iomap_writepage_map() for such dirty page.
> > >
> > > As Matthew pointed out before,
> > > https://lore.kernel.org/r/YPrms0fWPwEZGNAL@casper.infradead.org/
> > > currently tail-packing inline won't interact with page writeback, but
> > > I'm afraid a supported tail-packing write fs needs to reconsider the
> > > whole stuff how page, inode writeback works and what the pattern is
> > > with the tail-packing.
> > >
> > > >
> > > > It turns out that returning the number of bytes copied from
> > > > iomap_read_inline_data is a bit irritating: the function is really used
> > > > for filling the page, but that's not always the "progress" we're looking
> > > > for.  In the iomap_readpage case, we actually need to advance by an
> > > > antire page, but in the iomap_file_buffered_write case, we need to
> > > > advance by the length parameter of iomap_write_actor or less.  So I've
> > > > changed that back.
> > > >
> > > > I've also renamed iomap_inline_buf to iomap_inline_data and I've turned
> > > > iomap_inline_data_size_valid into iomap_within_inline_data, which seems
> > > > more useful to me.
> > > >
> > > > Thanks,
> > > > Andreas
> > > >
> > > > --
> > > >
> > > > Subject: [PATCH] iomap: Support tail packing
> > > >
> > > > The existing inline data support only works for cases where the entire
> > > > file is stored as inline data.  For larger files, EROFS stores the
> > > > initial blocks separately and then can pack a small tail adjacent to the
> > > > inode.  Generalise inline data to allow for tail packing.  Tails may not
> > > > cross a page boundary in memory.
> > > >
> > > > We currently have no filesystems that support tail packing and writing,
> > > > so that case is currently disabled (see iomap_write_begin_inline).  I'm
> > > > not aware of any reason why this code path shouldn't work, however.
> > > >
> > > > Cc: Christoph Hellwig <hch@lst.de>
> > > > Cc: Darrick J. Wong <djwong@kernel.org>
> > > > Cc: Matthew Wilcox <willy@infradead.org>
> > > > Cc: Andreas Gruenbacher <andreas.gruenbacher@gmail.com>
> > > > Tested-by: Huang Jianan <huangjianan@oppo.com> # erofs
> > > > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > > > ---
> > > >  fs/iomap/buffered-io.c | 34 +++++++++++++++++++++++-----------
> > > >  fs/iomap/direct-io.c   | 11 ++++++-----
> > > >  include/linux/iomap.h  | 22 +++++++++++++++++++++-
> > > >  3 files changed, 50 insertions(+), 17 deletions(-)
> > > >
> > > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > > > index 87ccb3438bec..334bf98fdd4a 100644
> > > > --- a/fs/iomap/buffered-io.c
> > > > +++ b/fs/iomap/buffered-io.c
> > > > @@ -205,25 +205,29 @@ struct iomap_readpage_ctx {
> > > >       struct readahead_control *rac;
> > > >  };
> > > >
> > > > -static void
> > > > -iomap_read_inline_data(struct inode *inode, struct page *page,
> > > > +static int iomap_read_inline_data(struct inode *inode, struct page *page,
> > > >               struct iomap *iomap)
> > > >  {
> > > > -     size_t size = i_size_read(inode);
> > > > +     size_t size = i_size_read(inode) - iomap->offset;
> > >
> > > I wonder why you use i_size / iomap->offset here,
> >
> > This function is supposed to copy the inline or tail data at
> > iomap->inline_data into the page passed to it. Logically, the inline
> > data starts at iomap->offset and extends until i_size_read(inode).
> > Relative to the page, the inline data starts at offset 0 and extends
> > until i_size_read(inode) - iomap->offset. It's as simple as that.
>
> It's only as simple as that because the inline data read code is overfit
> to the single use case (gfs2) that it supports.  So far in its history,
> iomap has never had to support inline data regions that do not coincide
> or overlap with EOF, nor has it had to support regions that do not start
> at pos==0.  That is why it is appropriate to use the memcpy -> memset ->
> return PAGE_SIZE pattern and short-circuit what we do everywhere else in
> iomap.
>
> For a non-inline readpage call, filesystems are allowed to return
> mappings for blocks beyond EOF.  The call to iomap_adjust_read_range
> sets us up to read data from disk through the EOF block, and for the
> remainder of the page we zero the post-eof blocks within that page.
>
> IOWs, for reads, __gfs2_iomap_get probably ought to set iomap->length to
> gfs2_max_stuffed_size() like it does for writes, and we ought to
> generalize iomap_read_inline_data to stop copying after
> min(iomap->length, i_size_read() - iomap->offset) bytes.  If it then
> discovers that it has indeed reached EOF, then we can zero the rest of
> the page and add that quantity to the number of bytes read.

That sounds like a useful improvement. I'll give it a try.

Thanks,
Andreas

> Right now for gfs2 the two arguments to min are always the same so the
> function omits all the bits that would make the zeroing actually
> conditional on whether we really hit EOF, and pass any copied size other
> than PAGE_SIZE back to iomap_readpage_actor.  Given that we still don't
> have any filesystems that require us to support inline regions entirely
> below EOF I'm fine with omitting the general (and hence untestable)
> solution... for now.
>
> (I now think I understand why someone brought up inline data regions in
> the middle of files last week.)
>
> --D
>
> > > and why you completely ignoring iomap->length field returning by fs.
> >
> > In the iomap_readpage case (iomap_begin with flags == 0),
> > iomap->length will be the amount of data up to the end of the inode.
> > In the iomap_file_buffered_write case (iomap_begin with flags ==
> > IOMAP_WRITE), iomap->length will be the size of iomap->inline_data.
> > (For extending writes, we need to write beyond the current end of
> > inode.) So iomap->length isn't all that useful for
> > iomap_read_inline_data.
> >
> > > Using i_size here instead of iomap->length seems coupling to me in the
> > > beginning (even currently in practice there is some limitation.)
> >
> > And what is that?
> >
> > Thanks,
> > Andreas
> >

  reply	other threads:[~2021-07-26 22:20 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-23 17:41 [PATCH v7] iomap: make inline data support more flexible Gao Xiang
2021-07-23 17:41 ` Gao Xiang
2021-07-23 19:40 ` Matthew Wilcox
2021-07-23 19:40   ` Matthew Wilcox
2021-07-24  0:54   ` Gao Xiang
2021-07-24  0:54     ` Gao Xiang
2021-07-25 21:39 ` Andreas Grünbacher
2021-07-25 21:39   ` Andreas Grünbacher
2021-07-25 22:16 ` Andreas Gruenbacher
2021-07-25 22:16   ` Andreas Gruenbacher
2021-07-26  2:36   ` Gao Xiang
2021-07-26  2:36     ` Gao Xiang
2021-07-26  7:22     ` Andreas Gruenbacher
2021-07-26  7:22       ` Andreas Gruenbacher
2021-07-26  7:38       ` Gao Xiang
2021-07-26  7:38         ` Gao Xiang
2021-07-26 21:36       ` Darrick J. Wong
2021-07-26 21:36         ` Darrick J. Wong
2021-07-26 22:20         ` Andreas Grünbacher [this message]
2021-07-26 22:20           ` Andreas Grünbacher
2021-07-26  3:06   ` Matthew Wilcox
2021-07-26  3:06     ` Matthew Wilcox
2021-07-26  6:56     ` Andreas Gruenbacher
2021-07-26  6:56       ` Andreas Gruenbacher
2021-07-26  4:00   ` Gao Xiang
2021-07-26  4:00     ` Gao Xiang
2021-07-26  8:08     ` Andreas Grünbacher
2021-07-26  8:08       ` Andreas Grünbacher
2021-07-26  8:17       ` Gao Xiang
2021-07-26  8:17         ` Gao Xiang
2021-07-26 11:06     ` Andreas Gruenbacher
2021-07-26 11:06       ` Andreas Gruenbacher
2021-07-26 12:17       ` Christoph Hellwig
2021-07-26 12:17         ` Christoph Hellwig
2021-07-26 12:27         ` Andreas Grünbacher
2021-07-26 12:27           ` Andreas Grünbacher
2021-07-26 12:50           ` Gao Xiang
2021-07-26 12:50             ` Gao Xiang
2021-07-26 13:10             ` Andreas Gruenbacher
2021-07-26 13:27           ` Christoph Hellwig
2021-07-26 13:27             ` Christoph Hellwig
2021-07-27  8:20         ` David Sterba
2021-07-27  8:20           ` David Sterba
2021-07-27 13:35           ` Matthew Wilcox
2021-07-27 15:04             ` Gao Xiang
2021-07-27 15:04               ` Gao Xiang
2021-07-27 16:53             ` David Sterba
2021-07-27 16:53               ` David Sterba
2021-07-26 12:32       ` Matthew Wilcox
2021-07-26 12:32         ` Matthew Wilcox
2021-07-26 13:03         ` Andreas Gruenbacher
2021-07-26 13:03           ` Andreas Gruenbacher
2021-07-26 13:12           ` Gao Xiang
2021-07-26 13:12             ` Gao Xiang
2021-07-26 13:30             ` Christoph Hellwig
2021-07-26  8:08 ` Joseph Qi
2021-07-26  8:08   ` Joseph Qi
2021-08-01 10:29 Andreas Gruenbacher
2021-08-01 10:29 ` Andreas Gruenbacher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHpGcM+oarDjgAC30X1eP4qPkpPP6i1s32t6CPXCYgiySOqVrg@mail.gmail.com \
    --to=andreas.gruenbacher@gmail.com \
    --cc=agruenba@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=huangjianan@oppo.com \
    --cc=linux-erofs@lists.ozlabs.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.