From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_PASS,USER_AGENT_NEOMUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C5E55C282C4 for ; Mon, 4 Feb 2019 15:11:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 92F6D214DA for ; Mon, 4 Feb 2019 15:11:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729962AbfBDPL7 (ORCPT ); Mon, 4 Feb 2019 10:11:59 -0500 Received: from mail-wr1-f68.google.com ([209.85.221.68]:41169 "EHLO mail-wr1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727217AbfBDPL6 (ORCPT ); Mon, 4 Feb 2019 10:11:58 -0500 Received: by mail-wr1-f68.google.com with SMTP id x10so175220wrs.8 for ; Mon, 04 Feb 2019 07:11:57 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=ysMRQ1horKB393BpwBAePO/jdhaqTTF0bWR82jT0FeM=; b=QORP5sQ/DLshmqa4tQQfa3v8Qx7vY/IwZOK+m1ZBCuSaNans4Kyfg5n3jrt5GcaSJW hU9TNir7FlOQ30eu2B07S7VHe5xO7ntAA26U6LAoMnLVYb9mNGxeom0mnzeKdH9nUE2P vZGx9O8Q2j7Z2vhQu2e4P3gGUhYnCZ1/htxtqIW1PG6cDRQYTdcVm1k2CBVIIVdMdxWE cD41dYrQsJEV1Q75jwMOvmzRZr2Qc5UImTaZNoCuhOQdyxlFBW+wJCUgtI6TRKfM29N/ w4S/+ZI4pRvhn6Y3XDxSOfKND3Vya9RXNerL6Xv+nlqoV/FBpMKb4b9K/HgjYSaFKqVQ lztQ== X-Gm-Message-State: AHQUAuaNlFwaoL0aMRunYtWcTyAwOFgmFYOugD3LGx8fXCajuIqRP0W2 +8YU3cyab+9eN21Y/qm+tjeoPsNjk+o= X-Google-Smtp-Source: AHgI3IYOdQQKgyNpEGMEElr6prVGdtgoMF1hkj/XUjJrrWG6TIdjTWMSSa9uAmtbxpRuW6u8ZS2aBA== X-Received: by 2002:adf:f410:: with SMTP id g16mr2592335wro.239.1549293116292; Mon, 04 Feb 2019 07:11:56 -0800 (PST) Received: from hades.usersys.redhat.com (ip-89-103-126-188.net.upcbroadband.cz. [89.103.126.188]) by smtp.gmail.com with ESMTPSA id l19sm7472253wme.21.2019.02.04.07.11.54 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 04 Feb 2019 07:11:55 -0800 (PST) Date: Mon, 4 Feb 2019 16:11:47 +0100 From: Carlos Maiolino To: "Darrick J. Wong" Cc: linux-fsdevel@vger.kernel.org, hch@lst.de, adilger@dilger.ca, sandeen@redhat.com, david@fromorbit.com Subject: Re: [PATCH 09/10 V2] Use FIEMAP for FIBMAP calls Message-ID: <20190204151147.rra4n7k56ec4ndob@hades.usersys.redhat.com> References: <20181205091728.29903-1-cmaiolino@redhat.com> <20181205091728.29903-10-cmaiolino@redhat.com> <20181205173650.GA8112@magnolia> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20181205173650.GA8112@magnolia> User-Agent: NeoMutt/20180716 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Hi, Sorry for the long delay Darrick. > > + fextent.fe_logical = 0; > > + fextent.fe_physical = 0; > > + fieinfo.fi_extents_max = 1; > > + fieinfo.fi_extents_mapped = 0; > > + fieinfo.fi_extents_start = &fextent; > > + fieinfo.fi_start = start; > > + fieinfo.fi_len = 1 << inode->i_blkbits; > > + fieinfo.fi_flags = 0; > > + fieinfo.fi_cb = fiemap_fill_kernel_extent; > > + > > + error = inode->i_op->fiemap(inode, &fieinfo); > > + > > + if (error) > > + return error; > > + > > + if (fieinfo.fi_flags & (FIEMAP_EXTENT_UNKNOWN | > > + FIEMAP_EXTENT_ENCODED | > > + FIEMAP_EXTENT_DATA_INLINE | > > + FIEMAP_EXTENT_UNWRITTEN)) > > + return -EINVAL; > > AFAICT, three of the filesystems that support COW writes (xfs, ocfs2, > and btrfs) do not return bmap results for files with shared blocks. > This check here should include FIEMAP_EXTENT_SHARED since external > overwrites of a COW file block are bad news on btrfs (and ocfs2 and > xfs). ok, np > > > + > > + *block = (fextent.fe_physical + > > + (start - fextent.fe_logical)) >> inode->i_blkbits; > > Hmmm, so there's nothing here checking that the physical device fiemap > reports is the same device that was passed into the mount. This is > trivially true for most of the filesystems that implement bmap and > fiemap, but definitely not true for xfs or btrfs. I would bet most > userspace callers of bmap (since it's an ext2-era ioctl) make that > assumption and don't even know how to find the device. Makes sense. > > On xfs, the bmap implementation won't return any results for realtime > files, but it looks as though we suddenly will start doing that here, > because in the new bmap implementation we will use fiemap, and fiemap > reports extents without providing any context about which device they're > on, and that context-less extent gets passed back to bmap_fiemap. > > In any case, I think a better solution to the multi-device problem is to > start returning device information via struct fiemap_extent, at least > inside the kernel. Use one of the reserved fields to declare a new > '__u32 fe_device' field in struct fiemap_extent which can be the dev_t > device number, and then you can check that against inode->i_sb->s_bdev > to avoid returning results for the non-primary device of a multi-device > filesystem. I agree we should address it here, but I don't think fiemap_extent is the right place for it, it is linked to the UAPI, and changing it is usually not a good idea. I think I got your idea anyway, but, what if, instead returning the bdev in fiemap_extent, we instead, send a flag (via fi_flags) to the filesystem, to idenfify a FIBMAP or a FIEMAP call, and let the filesystem decide what to do with such information? > > > + > > + return error; > > +} > > + > > /** > > * bmap - find a block number in a file > > * @inode: inode owning the block number being requested > > @@ -1594,10 +1628,14 @@ EXPORT_SYMBOL(iput); > > */ > > int bmap(struct inode *inode, sector_t *block) > > { > > - if (!inode->i_mapping->a_ops->bmap) > > + if (inode->i_op->fiemap) > > + return bmap_fiemap(inode, block); > > + else if (inode->i_mapping->a_ops->bmap) > > + *block = inode->i_mapping->a_ops->bmap(inode->i_mapping, > > + *block); > > + else > > return -EINVAL; > > Waitaminute. btrfs currently supports fiemap but not bmap, and now > suddenly it will support this legacy interface they've never supported > before. Are they on board with this? > > --D > > > > > - *block = inode->i_mapping->a_ops->bmap(inode->i_mapping, *block); > > return 0; > > } > > EXPORT_SYMBOL(bmap); > > diff --git a/fs/ioctl.c b/fs/ioctl.c > > index 6086978fe01e..bfa59df332bf 100644 > > --- a/fs/ioctl.c > > +++ b/fs/ioctl.c > > @@ -116,6 +116,38 @@ int fiemap_fill_user_extent(struct fiemap_extent_info *fieinfo, u64 logical, > > return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0; > > } > > > > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *fieinfo, u64 logical, > > + u64 phys, u64 len, u32 flags) > > +{ > > + struct fiemap_extent *extent = fieinfo->fi_extents_start; > > + > > + /* only count the extents */ > > + if (fieinfo->fi_extents_max == 0) { > > + fieinfo->fi_extents_mapped++; > > + return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0; > > + } > > + > > + if (fieinfo->fi_extents_mapped >= fieinfo->fi_extents_max) > > + return 1; > > + > > + if (flags & SET_UNKNOWN_FLAGS) > > + flags |= FIEMAP_EXTENT_UNKNOWN; > > + if (flags & SET_NO_UNMOUNTED_IO_FLAGS) > > + flags |= FIEMAP_EXTENT_ENCODED; > > + if (flags & SET_NOT_ALIGNED_FLAGS) > > + flags |= FIEMAP_EXTENT_NOT_ALIGNED; > > + > > + extent->fe_logical = logical; > > + extent->fe_physical = phys; > > + extent->fe_length = len; > > + extent->fe_flags = flags; > > + > > + fieinfo->fi_extents_mapped++; > > + > > + if (fieinfo->fi_extents_mapped == fieinfo->fi_extents_max) > > + return 1; > > + return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0; > > +} > > /** > > * fiemap_fill_next_extent - Fiemap helper function > > * @fieinfo: Fiemap context passed into ->fiemap > > diff --git a/include/linux/fs.h b/include/linux/fs.h > > index 7a434979201c..28bb523d532a 100644 > > --- a/include/linux/fs.h > > +++ b/include/linux/fs.h > > @@ -1711,6 +1711,8 @@ struct fiemap_extent_info { > > fiemap_fill_cb fi_cb; > > }; > > > > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *info, u64 logical, > > + u64 phys, u64 len, u32 flags); > > int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical, > > u64 phys, u64 len, u32 flags); > > int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags); > > -- > > 2.17.2 > > -- Carlos