From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Carlos Maiolino <cmaiolino@redhat.com>
Cc: linux-fsdevel@vger.kernel.org, hch@lst.de, adilger@dilger.ca,
	sandeen@redhat.com, david@fromorbit.com
Subject: Re: [PATCH 09/10 V2] Use FIEMAP for FIBMAP calls
Date: Mon, 4 Feb 2019 10:27:22 -0800
Message-ID: <20190204182722.GA32119@magnolia>
In-Reply-To: <20190204151147.rra4n7k56ec4ndob@hades.usersys.redhat.com>

On Mon, Feb 04, 2019 at 04:11:47PM +0100, Carlos Maiolino wrote:
> Hi, Sorry for the long delay Darrick.
>
> > > +	fextent.fe_logical = 0;
> > > +	fextent.fe_physical = 0;
> > > +	fieinfo.fi_extents_max = 1;
> > > +	fieinfo.fi_extents_mapped = 0;
> > > +	fieinfo.fi_extents_start = &fextent;
> > > +	fieinfo.fi_start = start;
> > > +	fieinfo.fi_len = 1 << inode->i_blkbits;
> > > +	fieinfo.fi_flags = 0;
> > > +	fieinfo.fi_cb = fiemap_fill_kernel_extent;
> > > +
> > > +	error = inode->i_op->fiemap(inode, &fieinfo);
> > > +
> > > +	if (error)
> > > +		return error;
> > > +
> > > +	if (fieinfo.fi_flags & (FIEMAP_EXTENT_UNKNOWN |
> > > +				FIEMAP_EXTENT_ENCODED |
> > > +				FIEMAP_EXTENT_DATA_INLINE |
> > > +				FIEMAP_EXTENT_UNWRITTEN))
> > > +		return -EINVAL;
> >
> > AFAICT, three of the filesystems that support COW writes (xfs, ocfs2,
> > and btrfs) do not return bmap results for files with shared blocks.
> > This check here should include FIEMAP_EXTENT_SHARED since external
> > overwrites of a COW file block are bad news on btrfs (and ocfs2 and
> > xfs).
>
> ok, np
>
> >
> > > +
> > > +	*block = (fextent.fe_physical +
> > > +		  (start - fextent.fe_logical)) >> inode->i_blkbits;
> >
> > Hmmm, so there's nothing here checking that the physical device fiemap
> > reports is the same device that was passed into the mount. This is
> > trivially true for most of the filesystems that implement bmap and
> > fiemap, but definitely not true for xfs or btrfs. I would bet most
> > userspace callers of bmap (since it's an ext2-era ioctl) make that
> > assumption and don't even know how to find the device.
>
> Makes sense.
>
> >
> > On xfs, the bmap implementation won't return any results for realtime
> > files, but it looks as though we suddenly will start doing that here,
> > because in the new bmap implementation we will use fiemap, and fiemap
> > reports extents without providing any context about which device they're
> > on, and that context-less extent gets passed back to bmap_fiemap.
> >
> > In any case, I think a better solution to the multi-device problem is to
> > start returning device information via struct fiemap_extent, at least
> > inside the kernel. Use one of the reserved fields to declare a new
> > '__u32 fe_device' field in struct fiemap_extent which can be the dev_t
> > device number, and then you can check that against inode->i_sb->s_bdev
> > to avoid returning results for the non-primary device of a multi-device
> > filesystem.
>
> I agree we should address it here, but I don't think fiemap_extent is the right
> place for it, it is linked to the UAPI, and changing it is usually not a good
> idea.

Adding a FIEMAP_EXTENT flag or two to turn one of the fe_reserved fields
into some sort of dev_t/per-device cookie should be fine. Userspace
shouldn't be expecting any meaning in reserved areas.

> I think I got your idea anyway, but, what if, instead returning the bdev in
> fiemap_extent, we instead, send a flag (via fi_flags) to the filesystem, to
> idenfify a FIBMAP or a FIEMAP call, and let the filesystem decide what to do
> with such information?

I don't like the idea of adding a FIEMAP_FLAG to distinguish callers.

--D

> >
> > > +
> > > +	return error;
> > > +}
> > > +
> > >  /**
> > >   *	bmap	- find a block number in a file
> > >   *	@inode:  inode owning the block number being requested
> > > @@ -1594,10 +1628,14 @@ EXPORT_SYMBOL(iput);
> > >   */
> > >  int bmap(struct inode *inode, sector_t *block)
> > >  {
> > > -	if (!inode->i_mapping->a_ops->bmap)
> > > +	if (inode->i_op->fiemap)
> > > +		return bmap_fiemap(inode, block);
> > > +	else if (inode->i_mapping->a_ops->bmap)
> > > +		*block = inode->i_mapping->a_ops->bmap(inode->i_mapping,
> > > +						       *block);
> > > +	else
> > >  		return -EINVAL;
> >
> > Waitaminute. btrfs currently supports fiemap but not bmap, and now
> > suddenly it will support this legacy interface they've never supported
> > before. Are they on board with this?
> >
> > --D
> >
> > >
> > > -	*block = inode->i_mapping->a_ops->bmap(inode->i_mapping, *block);
> > >  	return 0;
> > >  }
> > >  EXPORT_SYMBOL(bmap);
> > > diff --git a/fs/ioctl.c b/fs/ioctl.c
> > > index 6086978fe01e..bfa59df332bf 100644
> > > --- a/fs/ioctl.c
> > > +++ b/fs/ioctl.c
> > > @@ -116,6 +116,38 @@ int fiemap_fill_user_extent(struct fiemap_extent_info *fieinfo, u64 logical,
> > >  	return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
> > >  }
> > >
> > > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *fieinfo, u64 logical,
> > > +			      u64 phys, u64 len, u32 flags)
> > > +{
> > > +	struct fiemap_extent *extent = fieinfo->fi_extents_start;
> > > +
> > > +	/* only count the extents */
> > > +	if (fieinfo->fi_extents_max == 0) {
> > > +		fieinfo->fi_extents_mapped++;
> > > +		return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
> > > +	}
> > > +
> > > +	if (fieinfo->fi_extents_mapped >= fieinfo->fi_extents_max)
> > > +		return 1;
> > > +
> > > +	if (flags & SET_UNKNOWN_FLAGS)
> > > +		flags |= FIEMAP_EXTENT_UNKNOWN;
> > > +	if (flags & SET_NO_UNMOUNTED_IO_FLAGS)
> > > +		flags |= FIEMAP_EXTENT_ENCODED;
> > > +	if (flags & SET_NOT_ALIGNED_FLAGS)
> > > +		flags |= FIEMAP_EXTENT_NOT_ALIGNED;
> > > +
> > > +	extent->fe_logical = logical;
> > > +	extent->fe_physical = phys;
> > > +	extent->fe_length = len;
> > > +	extent->fe_flags = flags;
> > > +
> > > +	fieinfo->fi_extents_mapped++;
> > > +
> > > +	if (fieinfo->fi_extents_mapped == fieinfo->fi_extents_max)
> > > +		return 1;
> > > +	return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
> > > +}
> > >  /**
> > >   * fiemap_fill_next_extent - Fiemap helper function
> > >   * @fieinfo:	Fiemap context passed into ->fiemap
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index 7a434979201c..28bb523d532a 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -1711,6 +1711,8 @@ struct fiemap_extent_info {
> > >  	fiemap_fill_cb	fi_cb;
> > >  };
> > >
> > > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *info, u64 logical,
> > > +			      u64 phys, u64 len, u32 flags);
> > >  int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
> > >  			    u64 phys, u64 len, u32 flags);
> > >  int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
> > > --
> > > 2.17.2
> > >
>
> --
> Carlos
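
For readers following the review, here is a rough userspace sketch of the two checks discussed above: rejecting shared (COW) extents alongside the other unsafe extent states, and refusing to report a block that lives on a device other than the primary one. It is not code from the patch series. `struct dev_extent`, its `dev` member, `BMAP_UNSAFE_FLAGS`, and `extent_to_block()` are hypothetical names made up for illustration, and the `dev` member only stands in for the proposed `fe_device` idea, which does not exist in `struct fiemap_extent`. The sketch also assumes the flags to test are the per-extent `fe_flags`, and adds a range check that the patch itself does not have.

```c
#include <errno.h>
#include <stdint.h>
#include <linux/fiemap.h>	/* struct fiemap_extent, FIEMAP_EXTENT_* */

/* Hypothetical wrapper: one fiemap extent plus the device it lives on. */
struct dev_extent {
	struct fiemap_extent fe;
	uint32_t dev;	/* hypothetical stand-in for the proposed fe_device */
};

/*
 * Extent states for which handing out a raw block number is unsafe.
 * FIEMAP_EXTENT_SHARED is the addition suggested in the review.
 */
#define BMAP_UNSAFE_FLAGS (FIEMAP_EXTENT_UNKNOWN |	\
			   FIEMAP_EXTENT_ENCODED |	\
			   FIEMAP_EXTENT_DATA_INLINE |	\
			   FIEMAP_EXTENT_UNWRITTEN |	\
			   FIEMAP_EXTENT_SHARED)

/*
 * Translate the byte offset 'start' into a block number, using a single
 * extent that is expected to cover it. Returns 0 on success, or -EINVAL
 * if the extent must not be exposed through a bmap-style interface.
 */
static int extent_to_block(const struct dev_extent *de, uint32_t primary_dev,
			   uint64_t start, unsigned int blkbits,
			   uint64_t *block)
{
	/* Shared, inline, encoded, unwritten or unknown extents: refuse. */
	if (de->fe.fe_flags & BMAP_UNSAFE_FLAGS)
		return -EINVAL;

	/* The extent is not on the device the caller assumes: refuse. */
	if (de->dev != primary_dev)
		return -EINVAL;

	/* Extra sanity check (an assumption): extent must cover 'start'. */
	if (start < de->fe.fe_logical ||
	    start - de->fe.fe_logical >= de->fe.fe_length)
		return -EINVAL;

	*block = (de->fe.fe_physical +
		  (start - de->fe.fe_logical)) >> blkbits;
	return 0;
}
```

With a 4096-byte block size (`blkbits` of 12), an extent at logical offset 8192 and physical offset 1048576 maps byte offset 12288 to block 257. The same call returns -EINVAL once `FIEMAP_EXTENT_SHARED` is set in `fe_flags` or the extent's `dev` differs from `primary_dev`.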