From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Carlos Maiolino <cmaiolino@redhat.com>
Cc: linux-fsdevel@vger.kernel.org, hch@lst.de, adilger@dilger.ca,
sandeen@redhat.com, david@fromorbit.com
Subject: Re: [PATCH 09/10 V2] Use FIEMAP for FIBMAP calls
Date: Mon, 4 Feb 2019 10:27:22 -0800 [thread overview]
Message-ID: <20190204182722.GA32119@magnolia> (raw)
In-Reply-To: <20190204151147.rra4n7k56ec4ndob@hades.usersys.redhat.com>
On Mon, Feb 04, 2019 at 04:11:47PM +0100, Carlos Maiolino wrote:
> Hi, Sorry for the long delay Darrick.
>
> > > + fextent.fe_logical = 0;
> > > + fextent.fe_physical = 0;
> > > + fieinfo.fi_extents_max = 1;
> > > + fieinfo.fi_extents_mapped = 0;
> > > + fieinfo.fi_extents_start = &fextent;
> > > + fieinfo.fi_start = start;
> > > + fieinfo.fi_len = 1 << inode->i_blkbits;
> > > + fieinfo.fi_flags = 0;
> > > + fieinfo.fi_cb = fiemap_fill_kernel_extent;
> > > +
> > > + error = inode->i_op->fiemap(inode, &fieinfo);
> > > +
> > > + if (error)
> > > + return error;
> > > +
> > > + if (fieinfo.fi_flags & (FIEMAP_EXTENT_UNKNOWN |
> > > + FIEMAP_EXTENT_ENCODED |
> > > + FIEMAP_EXTENT_DATA_INLINE |
> > > + FIEMAP_EXTENT_UNWRITTEN))
> > > + return -EINVAL;
> >
> > AFAICT, three of the filesystems that support COW writes (xfs, ocfs2,
> > and btrfs) do not return bmap results for files with shared blocks.
> > This check here should include FIEMAP_EXTENT_SHARED since external
> > overwrites of a COW file block are bad news on btrfs (and ocfs2 and
> > xfs).
>
> ok, np
>
> >
> > > +
> > > + *block = (fextent.fe_physical +
> > > + (start - fextent.fe_logical)) >> inode->i_blkbits;
> >
> > Hmmm, so there's nothing here checking that the physical device fiemap
> > reports is the same device that was passed into the mount. This is
> > trivially true for most of the filesystems that implement bmap and
> > fiemap, but definitely not true for xfs or btrfs. I would bet most
> > userspace callers of bmap (since it's an ext2-era ioctl) make that
> > assumption and don't even know how to find the device.
>
> Makes sense.
>
> >
> > On xfs, the bmap implementation won't return any results for realtime
> > files, but it looks as though we suddenly will start doing that here,
> > because in the new bmap implementation we will use fiemap, and fiemap
> > reports extents without providing any context about which device they're
> > on, and that context-less extent gets passed back to bmap_fiemap.
> >
> > In any case, I think a better solution to the multi-device problem is to
> > start returning device information via struct fiemap_extent, at least
> > inside the kernel. Use one of the reserved fields to declare a new
> > '__u32 fe_device' field in struct fiemap_extent which can be the dev_t
> > device number, and then you can check that against inode->i_sb->s_bdev
> > to avoid returning results for the non-primary device of a multi-device
> > filesystem.
>
> I agree we should address it here, but I don't think fiemap_extent is the right
> place for it, it is linked to the UAPI, and changing it is usually not a good
> idea.
Adding a FIEMAP_EXTENT flag or two to turn one of the fe_reserved fields
into some sort of dev_t/per-device cookie should be fine. Userspace
shouldn't be expecting any meaning in reserved areas.
> I think I got your idea anyway, but, what if, instead returning the bdev in
> fiemap_extent, we instead, send a flag (via fi_flags) to the filesystem, to
> idenfify a FIBMAP or a FIEMAP call, and let the filesystem decide what to do
> with such information?
I don't like the idea of adding a FIEMAP_FLAG to distinguish callers.
--D
> >
> > > +
> > > + return error;
> > > +}
> > > +
> > > /**
> > > * bmap - find a block number in a file
> > > * @inode: inode owning the block number being requested
> > > @@ -1594,10 +1628,14 @@ EXPORT_SYMBOL(iput);
> > > */
> > > int bmap(struct inode *inode, sector_t *block)
> > > {
> > > - if (!inode->i_mapping->a_ops->bmap)
> > > + if (inode->i_op->fiemap)
> > > + return bmap_fiemap(inode, block);
> > > + else if (inode->i_mapping->a_ops->bmap)
> > > + *block = inode->i_mapping->a_ops->bmap(inode->i_mapping,
> > > + *block);
> > > + else
> > > return -EINVAL;
> >
> > Waitaminute. btrfs currently supports fiemap but not bmap, and now
> > suddenly it will support this legacy interface they've never supported
> > before. Are they on board with this?
> >
> > --D
> >
> > >
> > > - *block = inode->i_mapping->a_ops->bmap(inode->i_mapping, *block);
> > > return 0;
> > > }
> > > EXPORT_SYMBOL(bmap);
> > > diff --git a/fs/ioctl.c b/fs/ioctl.c
> > > index 6086978fe01e..bfa59df332bf 100644
> > > --- a/fs/ioctl.c
> > > +++ b/fs/ioctl.c
> > > @@ -116,6 +116,38 @@ int fiemap_fill_user_extent(struct fiemap_extent_info *fieinfo, u64 logical,
> > > return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
> > > }
> > >
> > > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *fieinfo, u64 logical,
> > > + u64 phys, u64 len, u32 flags)
> > > +{
> > > + struct fiemap_extent *extent = fieinfo->fi_extents_start;
> > > +
> > > + /* only count the extents */
> > > + if (fieinfo->fi_extents_max == 0) {
> > > + fieinfo->fi_extents_mapped++;
> > > + return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
> > > + }
> > > +
> > > + if (fieinfo->fi_extents_mapped >= fieinfo->fi_extents_max)
> > > + return 1;
> > > +
> > > + if (flags & SET_UNKNOWN_FLAGS)
> > > + flags |= FIEMAP_EXTENT_UNKNOWN;
> > > + if (flags & SET_NO_UNMOUNTED_IO_FLAGS)
> > > + flags |= FIEMAP_EXTENT_ENCODED;
> > > + if (flags & SET_NOT_ALIGNED_FLAGS)
> > > + flags |= FIEMAP_EXTENT_NOT_ALIGNED;
> > > +
> > > + extent->fe_logical = logical;
> > > + extent->fe_physical = phys;
> > > + extent->fe_length = len;
> > > + extent->fe_flags = flags;
> > > +
> > > + fieinfo->fi_extents_mapped++;
> > > +
> > > + if (fieinfo->fi_extents_mapped == fieinfo->fi_extents_max)
> > > + return 1;
> > > + return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
> > > +}
> > > /**
> > > * fiemap_fill_next_extent - Fiemap helper function
> > > * @fieinfo: Fiemap context passed into ->fiemap
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index 7a434979201c..28bb523d532a 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -1711,6 +1711,8 @@ struct fiemap_extent_info {
> > > fiemap_fill_cb fi_cb;
> > > };
> > >
> > > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *info, u64 logical,
> > > + u64 phys, u64 len, u32 flags);
> > > int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
> > > u64 phys, u64 len, u32 flags);
> > > int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
> > > --
> > > 2.17.2
> > >
>
> --
> Carlos
next prev parent reply other threads:[~2019-02-04 18:27 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-05 9:17 [PATCH 00/10 V2] New ->fiemap infrastructure and ->bmap removal Carlos Maiolino
2018-12-05 9:17 ` [PATCH 01/10] fs: Enable bmap() function to properly return errors Carlos Maiolino
2018-12-05 9:17 ` [PATCH 02/10] cachefiles: drop direct usage of ->bmap method Carlos Maiolino
2018-12-05 9:17 ` [PATCH 03/10] ecryptfs: drop direct calls to ->bmap Carlos Maiolino
2018-12-05 9:17 ` [PATCH 04/10 V2] fibmap: Use bmap instead of ->bmap method in ioctl_fibmap Carlos Maiolino
2019-01-14 16:49 ` Christoph Hellwig
2019-02-04 11:34 ` Carlos Maiolino
2018-12-05 9:17 ` [PATCH 05/10] fs: Move start and length fiemap fields into fiemap_extent_info Carlos Maiolino
2019-01-14 16:50 ` Christoph Hellwig
2018-12-05 9:17 ` [PATCH 06/10] iomap: Remove length and start fields from iomap_fiemap Carlos Maiolino
2019-01-14 16:51 ` Christoph Hellwig
2018-12-05 9:17 ` [PATCH 07/10] fs: Use a void pointer to store fiemap_extent Carlos Maiolino
2019-01-14 16:53 ` Christoph Hellwig
2018-12-05 9:17 ` [PATCH 08/10 V2] fiemap: Use a callback to fill fiemap extents Carlos Maiolino
2019-01-14 16:53 ` Christoph Hellwig
2018-12-05 9:17 ` [PATCH 09/10 V2] Use FIEMAP for FIBMAP calls Carlos Maiolino
2018-12-05 17:36 ` Darrick J. Wong
2018-12-07 9:09 ` Carlos Maiolino
2018-12-07 20:14 ` Andreas Dilger
2019-02-04 15:11 ` Carlos Maiolino
2019-02-04 18:27 ` Darrick J. Wong [this message]
2019-02-06 13:37 ` Carlos Maiolino
2019-02-06 20:44 ` Darrick J. Wong
2019-02-06 21:13 ` Andreas Dilger
2019-02-07 9:52 ` Carlos Maiolino
2019-02-08 8:43 ` Christoph Hellwig
2019-02-11 12:57 ` Christoph Hellwig
2019-02-11 16:21 ` Carlos Maiolino
2019-02-11 16:48 ` Christoph Hellwig
2019-02-07 11:59 ` Carlos Maiolino
2019-02-07 17:02 ` Darrick J. Wong
2019-02-07 21:25 ` Andreas Dilger
2019-02-08 8:46 ` Christoph Hellwig
2019-02-08 10:36 ` Carlos Maiolino
2019-02-08 21:03 ` Andreas Dilger
2019-02-08 9:08 ` Carlos Maiolino
2019-02-08 9:03 ` Carlos Maiolino
2019-02-07 12:36 ` Carlos Maiolino
2019-02-07 18:16 ` Darrick J. Wong
2019-02-08 8:58 ` Carlos Maiolino
2019-02-06 21:04 ` Andreas Dilger
2019-01-14 16:56 ` Christoph Hellwig
2019-02-05 9:56 ` Carlos Maiolino
2019-02-05 18:25 ` Christoph Hellwig
2019-02-06 9:50 ` Carlos Maiolino
2018-12-05 9:17 ` [PATCH 10/10] xfs: Get rid of ->bmap Carlos Maiolino
2018-12-05 17:37 ` Darrick J. Wong
2018-12-06 13:06 ` Carlos Maiolino
2018-12-06 18:56 ` [PATCH 00/10 V2] New ->fiemap infrastructure and ->bmap removal Andreas Grünbacher
2018-12-07 9:34 ` Carlos Maiolino
2019-01-14 16:50 ` Christoph Hellwig
2019-01-14 17:56 ` Andreas Grünbacher
2019-01-14 17:58 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190204182722.GA32119@magnolia \
--to=darrick.wong@oracle.com \
--cc=adilger@dilger.ca \
--cc=cmaiolino@redhat.com \
--cc=david@fromorbit.com \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=sandeen@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).