Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
From: Carlos Maiolino <cmaiolino@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-fsdevel@vger.kernel.org, hch@lst.de, adilger@dilger.ca,
	sandeen@redhat.com, david@fromorbit.com
Subject: Re: [PATCH 09/10 V2] Use FIEMAP for FIBMAP calls
Date: Mon, 4 Feb 2019 16:11:47 +0100
Message-ID: <20190204151147.rra4n7k56ec4ndob@hades.usersys.redhat.com> (raw)
In-Reply-To: <20181205173650.GA8112@magnolia>

Hi, Sorry for the long delay Darrick.

> > +	fextent.fe_logical = 0;
> > +	fextent.fe_physical = 0;
> > +	fieinfo.fi_extents_max = 1;
> > +	fieinfo.fi_extents_mapped = 0;
> > +	fieinfo.fi_extents_start = &fextent;
> > +	fieinfo.fi_start = start;
> > +	fieinfo.fi_len = 1 << inode->i_blkbits;
> > +	fieinfo.fi_flags = 0;
> > +	fieinfo.fi_cb = fiemap_fill_kernel_extent;
> > +
> > +	error = inode->i_op->fiemap(inode, &fieinfo);
> > +
> > +	if (error)
> > +		return error;
> > +
> > +	if (fieinfo.fi_flags & (FIEMAP_EXTENT_UNKNOWN |
> > +				FIEMAP_EXTENT_ENCODED |
> > +				FIEMAP_EXTENT_DATA_INLINE |
> > +				FIEMAP_EXTENT_UNWRITTEN))
> > +		return -EINVAL;
> 
> AFAICT, three of the filesystems that support COW writes (xfs, ocfs2,
> and btrfs) do not return bmap results for files with shared blocks.
> This check here should include FIEMAP_EXTENT_SHARED since external
> overwrites of a COW file block are bad news on btrfs (and ocfs2 and
> xfs).

ok, np

> 
> > +
> > +	*block = (fextent.fe_physical +
> > +		  (start - fextent.fe_logical)) >> inode->i_blkbits;
> 
> Hmmm, so there's nothing here checking that the physical device fiemap
> reports is the same device that was passed into the mount.  This is
> trivially true for most of the filesystems that implement bmap and
> fiemap, but definitely not true for xfs or btrfs.  I would bet most
> userspace callers of bmap (since it's an ext2-era ioctl) make that
> assumption and don't even know how to find the device.

Makes sense.

> 
> On xfs, the bmap implementation won't return any results for realtime
> files, but it looks as though we suddenly will start doing that here,
> because in the new bmap implementation we will use fiemap, and fiemap
> reports extents without providing any context about which device they're
> on, and that context-less extent gets passed back to bmap_fiemap.
> 
> In any case, I think a better solution to the multi-device problem is to
> start returning device information via struct fiemap_extent, at least
> inside the kernel.  Use one of the reserved fields to declare a new
> '__u32 fe_device' field in struct fiemap_extent which can be the dev_t
> device number, and then you can check that against inode->i_sb->s_bdev
> to avoid returning results for the non-primary device of a multi-device
> filesystem.

I agree we should address it here, but I don't think fiemap_extent is the right
place for it, it is linked to the UAPI, and changing it is usually not a good
idea.

I think I got your idea anyway, but, what if, instead returning the bdev in
fiemap_extent, we instead, send a flag (via fi_flags) to the filesystem, to
idenfify a FIBMAP or a FIEMAP call, and let the filesystem decide what to do
with such information?

> 
> > +
> > +	return error;
> > +}
> > +
> >  /**
> >   *	bmap	- find a block number in a file
> >   *	@inode:  inode owning the block number being requested
> > @@ -1594,10 +1628,14 @@ EXPORT_SYMBOL(iput);
> >   */
> >  int bmap(struct inode *inode, sector_t *block)
> >  {
> > -	if (!inode->i_mapping->a_ops->bmap)
> > +	if (inode->i_op->fiemap)
> > +		return bmap_fiemap(inode, block);
> > +	else if (inode->i_mapping->a_ops->bmap)
> > +		*block = inode->i_mapping->a_ops->bmap(inode->i_mapping,
> > +						       *block);
> > +	else
> >  		return -EINVAL;
> 
> Waitaminute.  btrfs currently supports fiemap but not bmap, and now
> suddenly it will support this legacy interface they've never supported
> before.  Are they on board with this?
> 
> --D
> 
> >  
> > -	*block = inode->i_mapping->a_ops->bmap(inode->i_mapping, *block);
> >  	return 0;
> >  }
> >  EXPORT_SYMBOL(bmap);
> > diff --git a/fs/ioctl.c b/fs/ioctl.c
> > index 6086978fe01e..bfa59df332bf 100644
> > --- a/fs/ioctl.c
> > +++ b/fs/ioctl.c
> > @@ -116,6 +116,38 @@ int fiemap_fill_user_extent(struct fiemap_extent_info *fieinfo, u64 logical,
> >  	return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
> >  }
> >  
> > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *fieinfo, u64 logical,
> > +			    u64 phys, u64 len, u32 flags)
> > +{
> > +	struct fiemap_extent *extent = fieinfo->fi_extents_start;
> > +
> > +	/* only count the extents */
> > +	if (fieinfo->fi_extents_max == 0) {
> > +		fieinfo->fi_extents_mapped++;
> > +		return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
> > +	}
> > +
> > +	if (fieinfo->fi_extents_mapped >= fieinfo->fi_extents_max)
> > +		return 1;
> > +
> > +	if (flags & SET_UNKNOWN_FLAGS)
> > +		flags |= FIEMAP_EXTENT_UNKNOWN;
> > +	if (flags & SET_NO_UNMOUNTED_IO_FLAGS)
> > +		flags |= FIEMAP_EXTENT_ENCODED;
> > +	if (flags & SET_NOT_ALIGNED_FLAGS)
> > +		flags |= FIEMAP_EXTENT_NOT_ALIGNED;
> > +
> > +	extent->fe_logical = logical;
> > +	extent->fe_physical = phys;
> > +	extent->fe_length = len;
> > +	extent->fe_flags = flags;
> > +
> > +	fieinfo->fi_extents_mapped++;
> > +
> > +	if (fieinfo->fi_extents_mapped == fieinfo->fi_extents_max)
> > +		return 1;
> > +	return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
> > +}
> >  /**
> >   * fiemap_fill_next_extent - Fiemap helper function
> >   * @fieinfo:	Fiemap context passed into ->fiemap
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 7a434979201c..28bb523d532a 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1711,6 +1711,8 @@ struct fiemap_extent_info {
> >  	fiemap_fill_cb	fi_cb;
> >  };
> >  
> > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *info, u64 logical,
> > +			      u64 phys, u64 len, u32 flags);
> >  int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
> >  			    u64 phys, u64 len, u32 flags);
> >  int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
> > -- 
> > 2.17.2
> > 

-- 
Carlos

  parent reply index

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-05  9:17 [PATCH 00/10 V2] New ->fiemap infrastructure and ->bmap removal Carlos Maiolino
2018-12-05  9:17 ` [PATCH 01/10] fs: Enable bmap() function to properly return errors Carlos Maiolino
2018-12-05  9:17 ` [PATCH 02/10] cachefiles: drop direct usage of ->bmap method Carlos Maiolino
2018-12-05  9:17 ` [PATCH 03/10] ecryptfs: drop direct calls to ->bmap Carlos Maiolino
2018-12-05  9:17 ` [PATCH 04/10 V2] fibmap: Use bmap instead of ->bmap method in ioctl_fibmap Carlos Maiolino
2019-01-14 16:49   ` Christoph Hellwig
2019-02-04 11:34     ` Carlos Maiolino
2018-12-05  9:17 ` [PATCH 05/10] fs: Move start and length fiemap fields into fiemap_extent_info Carlos Maiolino
2019-01-14 16:50   ` Christoph Hellwig
2018-12-05  9:17 ` [PATCH 06/10] iomap: Remove length and start fields from iomap_fiemap Carlos Maiolino
2019-01-14 16:51   ` Christoph Hellwig
2018-12-05  9:17 ` [PATCH 07/10] fs: Use a void pointer to store fiemap_extent Carlos Maiolino
2019-01-14 16:53   ` Christoph Hellwig
2018-12-05  9:17 ` [PATCH 08/10 V2] fiemap: Use a callback to fill fiemap extents Carlos Maiolino
2019-01-14 16:53   ` Christoph Hellwig
2018-12-05  9:17 ` [PATCH 09/10 V2] Use FIEMAP for FIBMAP calls Carlos Maiolino
2018-12-05 17:36   ` Darrick J. Wong
2018-12-07  9:09     ` Carlos Maiolino
2018-12-07 20:14       ` Andreas Dilger
2019-02-04 15:11     ` Carlos Maiolino [this message]
2019-02-04 18:27       ` Darrick J. Wong
2019-02-06 13:37         ` Carlos Maiolino
2019-02-06 20:44           ` Darrick J. Wong
2019-02-06 21:13             ` Andreas Dilger
2019-02-07  9:52               ` Carlos Maiolino
2019-02-08  8:43                 ` Christoph Hellwig
2019-02-11 12:57                   ` Christoph Hellwig
2019-02-11 16:21                     ` Carlos Maiolino
2019-02-11 16:48                       ` Christoph Hellwig
2019-02-07 11:59             ` Carlos Maiolino
2019-02-07 17:02               ` Darrick J. Wong
2019-02-07 21:25                 ` Andreas Dilger
2019-02-08  8:46                   ` Christoph Hellwig
2019-02-08 10:36                     ` Carlos Maiolino
2019-02-08 21:03                       ` Andreas Dilger
2019-02-08  9:08                   ` Carlos Maiolino
2019-02-08  9:03                 ` Carlos Maiolino
2019-02-07 12:36             ` Carlos Maiolino
2019-02-07 18:16               ` Darrick J. Wong
2019-02-08  8:58                 ` Carlos Maiolino
2019-02-06 21:04           ` Andreas Dilger
2019-01-14 16:56   ` Christoph Hellwig
2019-02-05  9:56     ` Carlos Maiolino
2019-02-05 18:25       ` Christoph Hellwig
2019-02-06  9:50         ` Carlos Maiolino
2018-12-05  9:17 ` [PATCH 10/10] xfs: Get rid of ->bmap Carlos Maiolino
2018-12-05 17:37   ` Darrick J. Wong
2018-12-06 13:06     ` Carlos Maiolino
2018-12-06 18:56 ` [PATCH 00/10 V2] New ->fiemap infrastructure and ->bmap removal Andreas Grünbacher
2018-12-07  9:34   ` Carlos Maiolino
2019-01-14 16:50     ` Christoph Hellwig
2019-01-14 17:56       ` Andreas Grünbacher
2019-01-14 17:58         ` Christoph Hellwig

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190204151147.rra4n7k56ec4ndob@hades.usersys.redhat.com \
    --to=cmaiolino@redhat.com \
    --cc=adilger@dilger.ca \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=sandeen@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org linux-fsdevel@archiver.kernel.org
	public-inbox-index linux-fsdevel


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox