From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg1-f194.google.com ([209.85.215.194]:43442 "EHLO mail-pg1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726078AbeLGUO6 (ORCPT ); Fri, 7 Dec 2018 15:14:58 -0500 Received: by mail-pg1-f194.google.com with SMTP id v28so2172943pgk.10 for ; Fri, 07 Dec 2018 12:14:57 -0800 (PST) Content-Type: multipart/signed; boundary="Apple-Mail=_2FD9CB53-D409-4160-9607-8190CE548EC0"; protocol="application/pgp-signature"; micalg=pgp-sha256 Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: [PATCH 09/10 V2] Use FIEMAP for FIBMAP calls From: Andreas Dilger In-Reply-To: <20181207090920.jrnwwb35n6r5546c@hades.usersys.redhat.com> Date: Fri, 7 Dec 2018 13:14:53 -0700 Cc: "Darrick J. Wong" , linux-fsdevel@vger.kernel.org, hch@lst.de, Eric Sandeen , david@fromorbit.com Message-Id: <7C812054-F457-4EDD-A527-71052AFD7C3B@dilger.ca> References: <20181205091728.29903-1-cmaiolino@redhat.com> <20181205091728.29903-10-cmaiolino@redhat.com> <20181205173650.GA8112@magnolia> <20181207090920.jrnwwb35n6r5546c@hades.usersys.redhat.com> To: Carlos Maiolino Sender: linux-fsdevel-owner@vger.kernel.org List-ID: --Apple-Mail=_2FD9CB53-D409-4160-9607-8190CE548EC0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii On Dec 7, 2018, at 2:09 AM, Carlos Maiolino = wrote: > On Wed, Dec 05, 2018 at 09:36:50AM -0800, Darrick J. Wong wrote: >>=20 >>> + >>> + *block =3D (fextent.fe_physical + >>> + (start - fextent.fe_logical)) >> inode->i_blkbits; >>=20 >> Hmmm, so there's nothing here checking that the physical device = fiemap >> reports is the same device that was passed into the mount. This is >> trivially true for most of the filesystems that implement bmap and >> fiemap, but definitely not true for xfs or btrfs. I would bet most >> userspace callers of bmap (since it's an ext2-era ioctl) make that >> assumption and don't even know how to find the device. >>=20 >> On xfs, the bmap implementation won't return any results for realtime >> files, but it looks as though we suddenly will start doing that here, >> because in the new bmap implementation we will use fiemap, and fiemap >> reports extents without providing any context about which device = they're >> on, and that context-less extent gets passed back to bmap_fiemap. >>=20 >> In any case, I think a better solution to the multi-device problem is = to >> start returning device information via struct fiemap_extent, at least >> inside the kernel. Use one of the reserved fields to declare a new >> '__u32 fe_device' field in struct fiemap_extent which can be the = dev_t >> device number, and then you can check that against = inode->i_sb->s_bdev >> to avoid returning results for the non-primary device of a = multi-device >> filesystem. We're using fe_device =3D fe_reserved[0] to return the device number for = Lustre. For Lustre, the "device number" is just a server index, since the = server's block device number is irrelevant on the client. For local filesystems, = it should return the 32-bit st_rdev device number to distinguish devices. I have patches for e2fsprogs filefrag to print the fe_device field. > Yes, you are right, I haven't thought about multi-dev filesystems. I = checked > btrfs code and it doesn't even support fibmap, exactly because of this = problem, > I wonder though, why it does support FIEMAP then, maybe because the = fiemap idea > isn't provide a way to userspace do IO directly to the device?! >=20 > I'm not sure if crossing dev information is enough though, I did a = quick read of > btrfs code, and the assumption that the block/extent location won't = change on > time, could lead to a time bomb in the future. I wonder if it wouldn't = maybe be > better to add a flag, to identify the usage type, and the filesystem = itself > would define if it should return anything, or not. Like, for example, = passing in > fieinfo->fi_flags, something like FIEMAP_FIBMAP, and check inside the = filesystem > for it. The FIEMAP_EXTENT_ENCODED flag is meant to be returned when the extent = cannot be read directly from the block device. For FIBMAP, it should return an error if ENCODED is set, since this file = is not suitable for directly booting a kernel (LILO is the only user of FIBMAP = that I'm aware of). The filefrag utility prefers to use FIEMAP for efficiency, = and only falls back to FIBMAP if FIEMAP fails. > =46rom my shallow understanding of btrfs, looks like the location of = the data, can > be moved inside the same device, so, even if the devices are the same = as you > suggested, there is no guarantee the offset will be the same. On a related note, btrfs also supports compressed extents, which isn't = handled by the current FIEMAP ioctl properly. There was a patch proposed ages = ago to add FIEMAP_EXTENT_DATA_COMPRESSED, but didn't _quite_ make it over the = finish line, https://www.spinics.net/lists/xfs/msg24629.html has the last = discussion. It added EXTENT_DATA_COMPRESSED and used fe_reserved64[0] as = fe_phys_length to return the on-disk extent size, while fe_length would rename to = fe_logi_length (with a compat macro) to still represented the logical extent length. #define FIEMAP_EXTENT_DATA_COMPRESSED 0x00000040 /* Data compressed by = fs. * Sets = EXTENT_DATA_ENCODED */ - __u64 fe_reserved64[2]; + __u64 fe_phys_length; /* physical length in bytes, undefined if + * DATA_COMPRESSED not set */ There was some discussion on whether there should be a second flag like FIEMAP_EXTENT_PHYS_LENGTH that is set when the fe_phys_length field is valid, independent of whether the data is compressed or not. Since you are reworking the FIEMAP code anyway, would you be interested = to revive that patch series? Cheers, Andreas --Apple-Mail=_2FD9CB53-D409-4160-9607-8190CE548EC0 Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP -----BEGIN PGP SIGNATURE----- Comment: GPGTools - http://gpgtools.org iQIzBAEBCAAdFiEEDb73u6ZejP5ZMprvcqXauRfMH+AFAlwK1L0ACgkQcqXauRfM H+BhBA//QEz+uMRd0fZ+ei/lckQOxhL+VD+hOCJkFfvewwFT/gJvn6080VtEepPB eYZEbtE3b0+9jg1ew7WC7He731DEeJHx4M9wKErz6YqdtRdhErloEySXy3dJockV x/npZqPaR8LJmpSCO2TJzXPsH8+s0b0tDssptjX9CwBJFbj97D40BM+dxr3QYKMo iXB+aylu3+ZzABG2xuUUIKCEaoxJ8gES8UKtxCvCzCpreuWX/o58FL5280i/gVj/ 6KKuTsiwVRtq17tQ5RgzkwcLaIS7jz742MzNY2nUnWJBQULJ6itLZi/Ra/mPG8J0 zojNdpOhET2/TdpUqdWm0xzGJIqWKxI75ru40Pz8rUY4LEJIBX9fmKAyOfaOJOrP 4axMjF/qppRorjteJlvYDw6VYkRS0YQ6tl05P7d4guNh3WYMkraen5YmsisIXVgC 6u51U0+IEmCKRDYtz5JRrQzURpgx2FWzNwawINI8Vton01sGnaiY3Pj0MzHkaIov hlD+Wnt6xmr1nTShq9NQLwyPrt4FbujsjhnCDTTyBSurJ2YX9/Th7AbOVrsHTDJQ bfRNXwxezSiHZN5CpIWg1tIP4V+MCzX03tD0l5MEEt4wKptaxJQZPDFOz00eW80B PEaL3++rk4R4npd19ohvX15Zi0ULM5ChaBlRGj5TxHS9uXJoobQ= =Biq/ -----END PGP SIGNATURE----- --Apple-Mail=_2FD9CB53-D409-4160-9607-8190CE548EC0--