From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.6 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_PASS,UNPARSEABLE_RELAY,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 47C38C282C4 for ; Mon, 4 Feb 2019 18:27:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0FAF52083B for ; Mon, 4 Feb 2019 18:27:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="Cw8shgFg" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728056AbfBDS1j (ORCPT ); Mon, 4 Feb 2019 13:27:39 -0500 Received: from userp2120.oracle.com ([156.151.31.85]:45526 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726585AbfBDS1j (ORCPT ); Mon, 4 Feb 2019 13:27:39 -0500 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x14INT49067666; Mon, 4 Feb 2019 18:27:30 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=date : from : to : cc : subject : message-id : references : mime-version : content-type : in-reply-to; s=corp-2018-07-02; bh=CtBJuq3xxnrmlub0qKebEl/qW1//gk2xDcmcO7y2G54=; b=Cw8shgFg82fHxewqzMqXXHZDAlqcMAetApnoRtL68pH8bGmSmshf1+yfB0gOCd4zDYir gwglsYSnAZPpTtvwBmZRIpxIsIe94qPP9qLKAcr/kDT2nRXLCTizICMIfoF0wSOcWiCE +J4SOVs9MX3BZcx2xCXVn7NenV48OYVBwfrxx3URJwx3amB2hUneNEXUBOM+NouSUrlN H0OH42luzFNmXsVcH8SFrkUT7C1IzaMVM1nMzOFDRtLgQoDxO+w71rZU2OoLzsoR1rA1 oE4SMKr80FWHFXKKx0EvJZHLu7rmzdMXQ/HvXQkX2ADjhbNpMDwrkPl90ETy3ya41EyN 8w== Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by userp2120.oracle.com with ESMTP id 2qd98mxuj9-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 04 Feb 2019 18:27:30 +0000 Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by aserv0022.oracle.com (8.14.4/8.14.4) with ESMTP id x14IRSTV027711 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 4 Feb 2019 18:27:28 GMT Received: from abhmp0017.oracle.com (abhmp0017.oracle.com [141.146.116.23]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id x14IRROa026910; Mon, 4 Feb 2019 18:27:27 GMT Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 04 Feb 2019 18:27:27 +0000 Date: Mon, 4 Feb 2019 10:27:22 -0800 From: "Darrick J. Wong" To: Carlos Maiolino Cc: linux-fsdevel@vger.kernel.org, hch@lst.de, adilger@dilger.ca, sandeen@redhat.com, david@fromorbit.com Subject: Re: [PATCH 09/10 V2] Use FIEMAP for FIBMAP calls Message-ID: <20190204182722.GA32119@magnolia> References: <20181205091728.29903-1-cmaiolino@redhat.com> <20181205091728.29903-10-cmaiolino@redhat.com> <20181205173650.GA8112@magnolia> <20190204151147.rra4n7k56ec4ndob@hades.usersys.redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190204151147.rra4n7k56ec4ndob@hades.usersys.redhat.com> User-Agent: Mutt/1.9.4 (2018-02-28) X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9157 signatures=668682 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1011 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1902040141 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org On Mon, Feb 04, 2019 at 04:11:47PM +0100, Carlos Maiolino wrote: > Hi, Sorry for the long delay Darrick. > > > > + fextent.fe_logical = 0; > > > + fextent.fe_physical = 0; > > > + fieinfo.fi_extents_max = 1; > > > + fieinfo.fi_extents_mapped = 0; > > > + fieinfo.fi_extents_start = &fextent; > > > + fieinfo.fi_start = start; > > > + fieinfo.fi_len = 1 << inode->i_blkbits; > > > + fieinfo.fi_flags = 0; > > > + fieinfo.fi_cb = fiemap_fill_kernel_extent; > > > + > > > + error = inode->i_op->fiemap(inode, &fieinfo); > > > + > > > + if (error) > > > + return error; > > > + > > > + if (fieinfo.fi_flags & (FIEMAP_EXTENT_UNKNOWN | > > > + FIEMAP_EXTENT_ENCODED | > > > + FIEMAP_EXTENT_DATA_INLINE | > > > + FIEMAP_EXTENT_UNWRITTEN)) > > > + return -EINVAL; > > > > AFAICT, three of the filesystems that support COW writes (xfs, ocfs2, > > and btrfs) do not return bmap results for files with shared blocks. > > This check here should include FIEMAP_EXTENT_SHARED since external > > overwrites of a COW file block are bad news on btrfs (and ocfs2 and > > xfs). > > ok, np > > > > > > + > > > + *block = (fextent.fe_physical + > > > + (start - fextent.fe_logical)) >> inode->i_blkbits; > > > > Hmmm, so there's nothing here checking that the physical device fiemap > > reports is the same device that was passed into the mount. This is > > trivially true for most of the filesystems that implement bmap and > > fiemap, but definitely not true for xfs or btrfs. I would bet most > > userspace callers of bmap (since it's an ext2-era ioctl) make that > > assumption and don't even know how to find the device. > > Makes sense. > > > > > On xfs, the bmap implementation won't return any results for realtime > > files, but it looks as though we suddenly will start doing that here, > > because in the new bmap implementation we will use fiemap, and fiemap > > reports extents without providing any context about which device they're > > on, and that context-less extent gets passed back to bmap_fiemap. > > > > In any case, I think a better solution to the multi-device problem is to > > start returning device information via struct fiemap_extent, at least > > inside the kernel. Use one of the reserved fields to declare a new > > '__u32 fe_device' field in struct fiemap_extent which can be the dev_t > > device number, and then you can check that against inode->i_sb->s_bdev > > to avoid returning results for the non-primary device of a multi-device > > filesystem. > > I agree we should address it here, but I don't think fiemap_extent is the right > place for it, it is linked to the UAPI, and changing it is usually not a good > idea. Adding a FIEMAP_EXTENT flag or two to turn one of the fe_reserved fields into some sort of dev_t/per-device cookie should be fine. Userspace shouldn't be expecting any meaning in reserved areas. > I think I got your idea anyway, but, what if, instead returning the bdev in > fiemap_extent, we instead, send a flag (via fi_flags) to the filesystem, to > idenfify a FIBMAP or a FIEMAP call, and let the filesystem decide what to do > with such information? I don't like the idea of adding a FIEMAP_FLAG to distinguish callers. --D > > > > > + > > > + return error; > > > +} > > > + > > > /** > > > * bmap - find a block number in a file > > > * @inode: inode owning the block number being requested > > > @@ -1594,10 +1628,14 @@ EXPORT_SYMBOL(iput); > > > */ > > > int bmap(struct inode *inode, sector_t *block) > > > { > > > - if (!inode->i_mapping->a_ops->bmap) > > > + if (inode->i_op->fiemap) > > > + return bmap_fiemap(inode, block); > > > + else if (inode->i_mapping->a_ops->bmap) > > > + *block = inode->i_mapping->a_ops->bmap(inode->i_mapping, > > > + *block); > > > + else > > > return -EINVAL; > > > > Waitaminute. btrfs currently supports fiemap but not bmap, and now > > suddenly it will support this legacy interface they've never supported > > before. Are they on board with this? > > > > --D > > > > > > > > - *block = inode->i_mapping->a_ops->bmap(inode->i_mapping, *block); > > > return 0; > > > } > > > EXPORT_SYMBOL(bmap); > > > diff --git a/fs/ioctl.c b/fs/ioctl.c > > > index 6086978fe01e..bfa59df332bf 100644 > > > --- a/fs/ioctl.c > > > +++ b/fs/ioctl.c > > > @@ -116,6 +116,38 @@ int fiemap_fill_user_extent(struct fiemap_extent_info *fieinfo, u64 logical, > > > return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0; > > > } > > > > > > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *fieinfo, u64 logical, > > > + u64 phys, u64 len, u32 flags) > > > +{ > > > + struct fiemap_extent *extent = fieinfo->fi_extents_start; > > > + > > > + /* only count the extents */ > > > + if (fieinfo->fi_extents_max == 0) { > > > + fieinfo->fi_extents_mapped++; > > > + return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0; > > > + } > > > + > > > + if (fieinfo->fi_extents_mapped >= fieinfo->fi_extents_max) > > > + return 1; > > > + > > > + if (flags & SET_UNKNOWN_FLAGS) > > > + flags |= FIEMAP_EXTENT_UNKNOWN; > > > + if (flags & SET_NO_UNMOUNTED_IO_FLAGS) > > > + flags |= FIEMAP_EXTENT_ENCODED; > > > + if (flags & SET_NOT_ALIGNED_FLAGS) > > > + flags |= FIEMAP_EXTENT_NOT_ALIGNED; > > > + > > > + extent->fe_logical = logical; > > > + extent->fe_physical = phys; > > > + extent->fe_length = len; > > > + extent->fe_flags = flags; > > > + > > > + fieinfo->fi_extents_mapped++; > > > + > > > + if (fieinfo->fi_extents_mapped == fieinfo->fi_extents_max) > > > + return 1; > > > + return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0; > > > +} > > > /** > > > * fiemap_fill_next_extent - Fiemap helper function > > > * @fieinfo: Fiemap context passed into ->fiemap > > > diff --git a/include/linux/fs.h b/include/linux/fs.h > > > index 7a434979201c..28bb523d532a 100644 > > > --- a/include/linux/fs.h > > > +++ b/include/linux/fs.h > > > @@ -1711,6 +1711,8 @@ struct fiemap_extent_info { > > > fiemap_fill_cb fi_cb; > > > }; > > > > > > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *info, u64 logical, > > > + u64 phys, u64 len, u32 flags); > > > int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical, > > > u64 phys, u64 len, u32 flags); > > > int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags); > > > -- > > > 2.17.2 > > > > > -- > Carlos