linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Andreas Grünbacher" <andreas.gruenbacher@gmail.com>
To: Chao Yu <yuchao0@huawei.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-xfs@vger.kernel.org,
	Linux FS-devel Mailing List <linux-fsdevel@vger.kernel.org>,
	Gao Xiang <gaoxiang25@huawei.com>,
	chao@kernel.org
Subject: Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
Date: Wed, 10 Jul 2019 23:50:39 +0200	[thread overview]
Message-ID: <CAHpGcMJ_wPJf8KtF3xMP_28pe4Vq4XozFtmd2EuZ+RTqZKQxLA@mail.gmail.com> (raw)
In-Reply-To: <39944e50-5888-f900-1954-91be2b12ea5b@huawei.com>

Am Mi., 10. Juli 2019 um 12:31 Uhr schrieb Chao Yu <yuchao0@huawei.com>:
> Hi Andreas,
>
> Thanks for your review in advance. :)
>
> On 2019/7/10 7:32, Andreas Grünbacher wrote:
> > Hi Chao,
> >
> > Am Mi., 3. Juli 2019 um 09:55 Uhr schrieb Chao Yu <yuchao0@huawei.com>:
> >> Some filesystems like erofs/reiserfs have the ability to pack tail
> >> data into metadata, e.g.:
> >> IOMAP_MAPPED [0, 8192]
> >> IOMAP_INLINE [8192, 8200]
> >>
> >> However current IOMAP_INLINE type has assumption that:
> >> - inline data should be locating at page #0.
> >> - inline size should equal to .i_size
> >> Those restriction fail to convert to use iomap IOMAP_INLINE in erofs,
> >> so this patch tries to relieve above limits to make IOMAP_INLINE more
> >> generic to cover tail-packing case.
> >
> > this patch suffers from the following problems:
> >
> > * Fiemap distinguishes between FIEMAP_EXTENT_DATA_INLINE and
> > FIEMAP_EXTENT_DATA_TAIL. Someone will sooner or later inevitably use
> > iomap_fiemap on a filesystem with tail packing, so if we don't make
> > the same distinction in iomap, iomap_fiemap will silently produce the
> > wrong result. This means that IOMAP_TAIL should become a separate
> > mapping type.
>
> I'm a little confused, IIUC, IOMAP_TAIL and FIEMAP_EXTENT_DATA_TAIL are with
> different semantics:
>
> - IOMAP_TAIL:
>   we may introduce this flag to cover tail-packing case, in where we merge
> small-sized data in the tail of file into free space of its metadata.
> - FIEMAP_EXTENT_DATA_TAIL:
> quoted from Documentation/filesystems/fiemap.txt
> "  This will also set FIEMAP_EXTENT_NOT_ALIGNED
> Data is packed into a block with data from other files."
> It looks like this flag indicates that blocks from different files stores into
> one common block.

Well, maybe FIEMAP_EXTENT_DATA_INLINE is indeed the more appropriate
flag in your scenario. In that case, we should be fine on the fiemap
front.

> I'm not familiar with fiemap flags' history, but maybe FIEMAP_EXTENT_DATA_TAIL
> should be better to rename to FIEMAP_EXTENT_DATA_PACKING to avoid ambiguity. And
> then FIEMAP_EXTENT_DATA_INLINE will be enough to cover normal inline case and
> tail-packing case?

Those definitions are user-visible.

> > * IOMAP_INLINE mappings always start at offset 0 and span an entire
> > page, so they are always page aligned. IOMAP_TAIL mappings only need
>
> Why can't #0 page consist of more than one block/mapping? I didn't get what's
> difference here.

Currently, iomap_write_begin will read the inline data into page #0
and mark that page up-to-date. There's a check for that in
iomap_write_end_inline. If a page contains more than one mapping, we
won't be able to mark the entire page up to date anymore; we'd need to
track which parts of the page are up to date and which are not. Iomap
supports two tracking mechanisms, buffer heads and iomap_page, and
we'd need to implement that tracking code for each of those cases.

> > to be block aligned. If the block size is smaller than the page size,
>
> - reiserfs tries to find last page's content, if the size of that page's valid
> content is smaller than threshold (at least less than one block), reiserfs can
> do the packing.
>
> - erofs' block size is 4kb which is the same as page size.
>
>
> Actually, for IOMAP_TAIL, there is no users who need to map more than one block
> into metadata, so I'm not sure that we should support that, or we just need to
> let code preparing for that to those potential users?

How about architectures with PAGE_SIZE > 4096? I'm assuming that erofs
doesn't require block size == PAGE_SIZE?

> Thanks,
>
> > a tail page can consist of more than one mapping. So
> > iomap_read_inline_data needs to be changed to only copy a single block
> > out of iomap->inline_data, and we need to adjust iomap_write_begin and
> > iomap_readpage_actor accordingly.
> >
> > * Functions iomap_read_inline_data, iomap_write_end_inline, and
> > iomap_dio_inline_actor currently all assume that we are operating on
> > page 0, and that iomap->inline_data points at the data at offset 0.
> > That's no longer the case for IOMAP_TAIL mappings. We need to look
> > only at the offset within the page / block there.
> >
> > * There are some asserts like the following int he code:
> >
> >   BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
> >
> > Those should probably be tightened to check for block boundaries
> > instead of page boundaries, i.e. something like:
> >
> >   BUG_ON(size > i_blocksize(inode) -
> > offset_in_block(iomap->inline_data, inode));
> >
> >> Signed-off-by: Chao Yu <yuchao0@huawei.com>
> >> ---
> >>  fs/iomap.c | 4 +---
> >>  1 file changed, 1 insertion(+), 3 deletions(-)
> >>
> >> diff --git a/fs/iomap.c b/fs/iomap.c
> >> index 12654c2e78f8..d1c16b692d31 100644
> >> --- a/fs/iomap.c
> >> +++ b/fs/iomap.c
> >> @@ -264,13 +264,12 @@ static void
> >>  iomap_read_inline_data(struct inode *inode, struct page *page,
> >>                 struct iomap *iomap)
> >>  {
> >> -       size_t size = i_size_read(inode);
> >> +       size_t size = iomap->length;
> >
> > Function iomap_read_inline_data fills the entire page here, not only
> > the iomap->length part, so this is wrong. But see my comment above
> > about changing iomap_read_inline_data to read a single block above as
> > well.
> >
> >>         void *addr;
> >>
> >>         if (PageUptodate(page))
> >>                 return;
> >>
> >> -       BUG_ON(page->index);
> >>         BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
> >>
> >>         addr = kmap_atomic(page);
> >> @@ -293,7 +292,6 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >>         sector_t sector;
> >>
> >>         if (iomap->type == IOMAP_INLINE) {
> >> -               WARN_ON_ONCE(pos);
> >>                 iomap_read_inline_data(inode, page, iomap);
> >>                 return PAGE_SIZE;
> >>         }
> >
> > Those last two changes look right to me.
> >
> > Thanks,
> > Andreas
> > .
> >

At this point, can I ask how important this packing mechanism is to
you? I can see a point in implementing inline files, which help
because there tends to be a large number of very small files. But for
not-so-small files, is saving an extra block really worth the trouble,
especially given how cheap storage has become?

Thanks,
Andreas

  reply	other threads:[~2019-07-10 21:50 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-03  7:55 [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case Chao Yu
2019-07-08  2:12 ` Chao Yu
2019-07-08 16:03 ` Christoph Hellwig
2019-07-09 13:52   ` Chao Yu
2019-07-09 23:32 ` Andreas Grünbacher
2019-07-10 10:30   ` Chao Yu
2019-07-10 21:50     ` Andreas Grünbacher [this message]
2019-07-10 23:42       ` Gao Xiang
2019-07-11 13:06         ` Matthew Wilcox
2019-07-11 13:54           ` Gao Xiang
2019-07-11 14:15       ` Chao Yu
2019-07-11 12:28     ` Andreas Gruenbacher
2019-07-12  9:31       ` Chao Yu
2019-07-12 11:54         ` Andreas Gruenbacher
2019-07-15  9:26           ` Chao Yu
2019-07-17  2:58             ` Chao Yu
2019-07-18 12:33           ` Christoph Hellwig
2019-07-18 12:31       ` Christoph Hellwig
2019-07-18 13:27         ` Gao Xiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHpGcMJ_wPJf8KtF3xMP_28pe4Vq4XozFtmd2EuZ+RTqZKQxLA@mail.gmail.com \
    --to=andreas.gruenbacher@gmail.com \
    --cc=chao@kernel.org \
    --cc=darrick.wong@oracle.com \
    --cc=gaoxiang25@huawei.com \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=yuchao0@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).