Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Carlos Maiolino <cmaiolino@redhat.com>
Cc: linux-fsdevel@vger.kernel.org, hch@lst.de, adilger@dilger.ca,
	sandeen@redhat.com, david@fromorbit.com
Subject: Re: [PATCH 09/10 V2] Use FIEMAP for FIBMAP calls
Date: Mon, 4 Feb 2019 10:27:22 -0800
Message-ID: <20190204182722.GA32119@magnolia> (raw)
In-Reply-To: <20190204151147.rra4n7k56ec4ndob@hades.usersys.redhat.com>

On Mon, Feb 04, 2019 at 04:11:47PM +0100, Carlos Maiolino wrote:
> Hi, Sorry for the long delay Darrick.
> 
> > > +	fextent.fe_logical = 0;
> > > +	fextent.fe_physical = 0;
> > > +	fieinfo.fi_extents_max = 1;
> > > +	fieinfo.fi_extents_mapped = 0;
> > > +	fieinfo.fi_extents_start = &fextent;
> > > +	fieinfo.fi_start = start;
> > > +	fieinfo.fi_len = 1 << inode->i_blkbits;
> > > +	fieinfo.fi_flags = 0;
> > > +	fieinfo.fi_cb = fiemap_fill_kernel_extent;
> > > +
> > > +	error = inode->i_op->fiemap(inode, &fieinfo);
> > > +
> > > +	if (error)
> > > +		return error;
> > > +
> > > +	if (fieinfo.fi_flags & (FIEMAP_EXTENT_UNKNOWN |
> > > +				FIEMAP_EXTENT_ENCODED |
> > > +				FIEMAP_EXTENT_DATA_INLINE |
> > > +				FIEMAP_EXTENT_UNWRITTEN))
> > > +		return -EINVAL;
> > 
> > AFAICT, three of the filesystems that support COW writes (xfs, ocfs2,
> > and btrfs) do not return bmap results for files with shared blocks.
> > This check here should include FIEMAP_EXTENT_SHARED since external
> > overwrites of a COW file block are bad news on btrfs (and ocfs2 and
> > xfs).
> 
> ok, np
> 
> > 
> > > +
> > > +	*block = (fextent.fe_physical +
> > > +		  (start - fextent.fe_logical)) >> inode->i_blkbits;
> > 
> > Hmmm, so there's nothing here checking that the physical device fiemap
> > reports is the same device that was passed into the mount.  This is
> > trivially true for most of the filesystems that implement bmap and
> > fiemap, but definitely not true for xfs or btrfs.  I would bet most
> > userspace callers of bmap (since it's an ext2-era ioctl) make that
> > assumption and don't even know how to find the device.
> 
> Makes sense.
> 
> > 
> > On xfs, the bmap implementation won't return any results for realtime
> > files, but it looks as though we suddenly will start doing that here,
> > because in the new bmap implementation we will use fiemap, and fiemap
> > reports extents without providing any context about which device they're
> > on, and that context-less extent gets passed back to bmap_fiemap.
> > 
> > In any case, I think a better solution to the multi-device problem is to
> > start returning device information via struct fiemap_extent, at least
> > inside the kernel.  Use one of the reserved fields to declare a new
> > '__u32 fe_device' field in struct fiemap_extent which can be the dev_t
> > device number, and then you can check that against inode->i_sb->s_bdev
> > to avoid returning results for the non-primary device of a multi-device
> > filesystem.
> 
> I agree we should address it here, but I don't think fiemap_extent is the right
> place for it, it is linked to the UAPI, and changing it is usually not a good
> idea.

Adding a FIEMAP_EXTENT flag or two to turn one of the fe_reserved fields
into some sort of dev_t/per-device cookie should be fine.  Userspace
shouldn't be expecting any meaning in reserved areas.

> I think I got your idea anyway, but, what if, instead returning the bdev in
> fiemap_extent, we instead, send a flag (via fi_flags) to the filesystem, to
> idenfify a FIBMAP or a FIEMAP call, and let the filesystem decide what to do
> with such information?

I don't like the idea of adding a FIEMAP_FLAG to distinguish callers.

--D

> > 
> > > +
> > > +	return error;
> > > +}
> > > +
> > >  /**
> > >   *	bmap	- find a block number in a file
> > >   *	@inode:  inode owning the block number being requested
> > > @@ -1594,10 +1628,14 @@ EXPORT_SYMBOL(iput);
> > >   */
> > >  int bmap(struct inode *inode, sector_t *block)
> > >  {
> > > -	if (!inode->i_mapping->a_ops->bmap)
> > > +	if (inode->i_op->fiemap)
> > > +		return bmap_fiemap(inode, block);
> > > +	else if (inode->i_mapping->a_ops->bmap)
> > > +		*block = inode->i_mapping->a_ops->bmap(inode->i_mapping,
> > > +						       *block);
> > > +	else
> > >  		return -EINVAL;
> > 
> > Waitaminute.  btrfs currently supports fiemap but not bmap, and now
> > suddenly it will support this legacy interface they've never supported
> > before.  Are they on board with this?
> > 
> > --D
> > 
> > >  
> > > -	*block = inode->i_mapping->a_ops->bmap(inode->i_mapping, *block);
> > >  	return 0;
> > >  }
> > >  EXPORT_SYMBOL(bmap);
> > > diff --git a/fs/ioctl.c b/fs/ioctl.c
> > > index 6086978fe01e..bfa59df332bf 100644
> > > --- a/fs/ioctl.c
> > > +++ b/fs/ioctl.c
> > > @@ -116,6 +116,38 @@ int fiemap_fill_user_extent(struct fiemap_extent_info *fieinfo, u64 logical,
> > >  	return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
> > >  }
> > >  
> > > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *fieinfo, u64 logical,
> > > +			    u64 phys, u64 len, u32 flags)
> > > +{
> > > +	struct fiemap_extent *extent = fieinfo->fi_extents_start;
> > > +
> > > +	/* only count the extents */
> > > +	if (fieinfo->fi_extents_max == 0) {
> > > +		fieinfo->fi_extents_mapped++;
> > > +		return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
> > > +	}
> > > +
> > > +	if (fieinfo->fi_extents_mapped >= fieinfo->fi_extents_max)
> > > +		return 1;
> > > +
> > > +	if (flags & SET_UNKNOWN_FLAGS)
> > > +		flags |= FIEMAP_EXTENT_UNKNOWN;
> > > +	if (flags & SET_NO_UNMOUNTED_IO_FLAGS)
> > > +		flags |= FIEMAP_EXTENT_ENCODED;
> > > +	if (flags & SET_NOT_ALIGNED_FLAGS)
> > > +		flags |= FIEMAP_EXTENT_NOT_ALIGNED;
> > > +
> > > +	extent->fe_logical = logical;
> > > +	extent->fe_physical = phys;
> > > +	extent->fe_length = len;
> > > +	extent->fe_flags = flags;
> > > +
> > > +	fieinfo->fi_extents_mapped++;
> > > +
> > > +	if (fieinfo->fi_extents_mapped == fieinfo->fi_extents_max)
> > > +		return 1;
> > > +	return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
> > > +}
> > >  /**
> > >   * fiemap_fill_next_extent - Fiemap helper function
> > >   * @fieinfo:	Fiemap context passed into ->fiemap
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index 7a434979201c..28bb523d532a 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -1711,6 +1711,8 @@ struct fiemap_extent_info {
> > >  	fiemap_fill_cb	fi_cb;
> > >  };
> > >  
> > > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *info, u64 logical,
> > > +			      u64 phys, u64 len, u32 flags);
> > >  int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
> > >  			    u64 phys, u64 len, u32 flags);
> > >  int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
> > > -- 
> > > 2.17.2
> > > 
> 
> -- 
> Carlos

  reply index

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-05  9:17 [PATCH 00/10 V2] New ->fiemap infrastructure and ->bmap removal Carlos Maiolino
2018-12-05  9:17 ` [PATCH 01/10] fs: Enable bmap() function to properly return errors Carlos Maiolino
2018-12-05  9:17 ` [PATCH 02/10] cachefiles: drop direct usage of ->bmap method Carlos Maiolino
2018-12-05  9:17 ` [PATCH 03/10] ecryptfs: drop direct calls to ->bmap Carlos Maiolino
2018-12-05  9:17 ` [PATCH 04/10 V2] fibmap: Use bmap instead of ->bmap method in ioctl_fibmap Carlos Maiolino
2019-01-14 16:49   ` Christoph Hellwig
2019-02-04 11:34     ` Carlos Maiolino
2018-12-05  9:17 ` [PATCH 05/10] fs: Move start and length fiemap fields into fiemap_extent_info Carlos Maiolino
2019-01-14 16:50   ` Christoph Hellwig
2018-12-05  9:17 ` [PATCH 06/10] iomap: Remove length and start fields from iomap_fiemap Carlos Maiolino
2019-01-14 16:51   ` Christoph Hellwig
2018-12-05  9:17 ` [PATCH 07/10] fs: Use a void pointer to store fiemap_extent Carlos Maiolino
2019-01-14 16:53   ` Christoph Hellwig
2018-12-05  9:17 ` [PATCH 08/10 V2] fiemap: Use a callback to fill fiemap extents Carlos Maiolino
2019-01-14 16:53   ` Christoph Hellwig
2018-12-05  9:17 ` [PATCH 09/10 V2] Use FIEMAP for FIBMAP calls Carlos Maiolino
2018-12-05 17:36   ` Darrick J. Wong
2018-12-07  9:09     ` Carlos Maiolino
2018-12-07 20:14       ` Andreas Dilger
2019-02-04 15:11     ` Carlos Maiolino
2019-02-04 18:27       ` Darrick J. Wong [this message]
2019-02-06 13:37         ` Carlos Maiolino
2019-02-06 20:44           ` Darrick J. Wong
2019-02-06 21:13             ` Andreas Dilger
2019-02-07  9:52               ` Carlos Maiolino
2019-02-08  8:43                 ` Christoph Hellwig
2019-02-11 12:57                   ` Christoph Hellwig
2019-02-11 16:21                     ` Carlos Maiolino
2019-02-11 16:48                       ` Christoph Hellwig
2019-02-07 11:59             ` Carlos Maiolino
2019-02-07 17:02               ` Darrick J. Wong
2019-02-07 21:25                 ` Andreas Dilger
2019-02-08  8:46                   ` Christoph Hellwig
2019-02-08 10:36                     ` Carlos Maiolino
2019-02-08 21:03                       ` Andreas Dilger
2019-02-08  9:08                   ` Carlos Maiolino
2019-02-08  9:03                 ` Carlos Maiolino
2019-02-07 12:36             ` Carlos Maiolino
2019-02-07 18:16               ` Darrick J. Wong
2019-02-08  8:58                 ` Carlos Maiolino
2019-02-06 21:04           ` Andreas Dilger
2019-01-14 16:56   ` Christoph Hellwig
2019-02-05  9:56     ` Carlos Maiolino
2019-02-05 18:25       ` Christoph Hellwig
2019-02-06  9:50         ` Carlos Maiolino
2018-12-05  9:17 ` [PATCH 10/10] xfs: Get rid of ->bmap Carlos Maiolino
2018-12-05 17:37   ` Darrick J. Wong
2018-12-06 13:06     ` Carlos Maiolino
2018-12-06 18:56 ` [PATCH 00/10 V2] New ->fiemap infrastructure and ->bmap removal Andreas Grünbacher
2018-12-07  9:34   ` Carlos Maiolino
2019-01-14 16:50     ` Christoph Hellwig
2019-01-14 17:56       ` Andreas Grünbacher
2019-01-14 17:58         ` Christoph Hellwig

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190204182722.GA32119@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=adilger@dilger.ca \
    --cc=cmaiolino@redhat.com \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=sandeen@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org linux-fsdevel@archiver.kernel.org
	public-inbox-index linux-fsdevel


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox