* [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap @ 2018-08-30 16:10 Eric Sandeen 2018-08-30 16:25 ` Christoph Hellwig 0 siblings, 1 reply; 22+ messages in thread From: Eric Sandeen @ 2018-08-30 16:10 UTC (permalink / raw) To: linux-xfs; +Cc: Christoph Hellwig We disabled FIBMAP/bmap calls for reflinked files because swap will then writes directly to the blocks, bypassing COW mechanisms, and breaking copy on write. As noted in commit db1327b, this restriction also breaks bootloaders that want to use the FIBMAP ioctl. Rather than disabling the entire mapping interface for everyone just because swapon may abuse the info, teach xfs_iomap_swapfile_activate() to reject activation for reflinked files, and re-enable the FIBMAP interface. Signed-off-by: Eric Sandeen <sandeen@redhat.com> --- yes, other FIBMAP users could also abuse this information, but disallowing mapping because somebody might do something dumb with it seems very heavy handed... thoughts? diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index 8eb3ba3d4d00..b75a7f37bdd9 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -1382,15 +1382,10 @@ xfs_vm_bmap( trace_xfs_vm_bmap(ip); /* - * The swap code (ab-)uses ->bmap to get a block mapping and then - * bypasses the file system for actual I/O. We really can't allow - * that on reflinks inodes, so we have to skip out here. And yes, - * 0 is the magic code for a bmap error. - * * Since we don't pass back blockdev info, we can't return bmap - * information for rt files either. + * information for rt files. */ - if (xfs_is_reflink_inode(ip) || XFS_IS_REALTIME_INODE(ip)) + if (XFS_IS_REALTIME_INODE(ip)) return 0; return iomap_bmap(mapping, block, &xfs_iomap_ops); } @@ -1477,7 +1472,13 @@ xfs_iomap_swapfile_activate( struct file *swap_file, sector_t *span) { - sis->bdev = xfs_find_bdev_for_inode(file_inode(swap_file)); + struct inode *inode = file_inode(swap_file); + + /* Cannot allow swap to write directly to reflinked files/blocks */ + if (xfs_is_reflink_inode(XFS_I(inode))) + return -EOPNOTSUPP; + + sis->bdev = xfs_find_bdev_for_inode(inode); return iomap_swapfile_activate(sis, swap_file, span, &xfs_iomap_ops); } ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-08-30 16:10 [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap Eric Sandeen @ 2018-08-30 16:25 ` Christoph Hellwig 2018-08-30 16:31 ` Eric Sandeen 0 siblings, 1 reply; 22+ messages in thread From: Christoph Hellwig @ 2018-08-30 16:25 UTC (permalink / raw) To: Eric Sandeen; +Cc: linux-xfs, Christoph Hellwig On Thu, Aug 30, 2018 at 11:10:05AM -0500, Eric Sandeen wrote: > We disabled FIBMAP/bmap calls for reflinked files because swap will then > writes directly to the blocks, bypassing COW mechanisms, and breaking > copy on write. As noted in commit db1327b, this restriction also breaks > bootloaders that want to use the FIBMAP ioctl. > > Rather than disabling the entire mapping interface for everyone just > because swapon may abuse the info, teach xfs_iomap_swapfile_activate() > to reject activation for reflinked files, and re-enable the FIBMAP > interface. Every use of the feature is an abuse, and that includes bootloaders. By the time you've obtained the information it can (and often will) be stale. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-08-30 16:25 ` Christoph Hellwig @ 2018-08-30 16:31 ` Eric Sandeen 2018-08-30 16:36 ` Christoph Hellwig 0 siblings, 1 reply; 22+ messages in thread From: Eric Sandeen @ 2018-08-30 16:31 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-xfs On 8/30/18 11:25 AM, Christoph Hellwig wrote: > On Thu, Aug 30, 2018 at 11:10:05AM -0500, Eric Sandeen wrote: >> We disabled FIBMAP/bmap calls for reflinked files because swap will then >> writes directly to the blocks, bypassing COW mechanisms, and breaking >> copy on write. As noted in commit db1327b, this restriction also breaks >> bootloaders that want to use the FIBMAP ioctl. >> >> Rather than disabling the entire mapping interface for everyone just >> because swapon may abuse the info, teach xfs_iomap_swapfile_activate() >> to reject activation for reflinked files, and re-enable the FIBMAP >> interface. > > Every use of the feature is an abuse, and that includes bootloaders. > By the time you've obtained the information it can (and often will) > be stale. > That's no reason to uniquely disallow it for reflinked files, though; the problem is universal. It's true for fiemap as well. So I'm not sure that's an argument against the patch? -Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-08-30 16:31 ` Eric Sandeen @ 2018-08-30 16:36 ` Christoph Hellwig 2018-08-30 16:35 ` Eric Sandeen 0 siblings, 1 reply; 22+ messages in thread From: Christoph Hellwig @ 2018-08-30 16:36 UTC (permalink / raw) To: Eric Sandeen; +Cc: Christoph Hellwig, linux-xfs On Thu, Aug 30, 2018 at 11:31:40AM -0500, Eric Sandeen wrote: > That's no reason to uniquely disallow it for reflinked files, though; > the problem is universal. It's true for fiemap as well. So I'm not sure > that's an argument against the patch? fiemap at least tells you an extent is shared, bmap does not. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-08-30 16:36 ` Christoph Hellwig @ 2018-08-30 16:35 ` Eric Sandeen 2018-08-30 18:02 ` Brian Foster 0 siblings, 1 reply; 22+ messages in thread From: Eric Sandeen @ 2018-08-30 16:35 UTC (permalink / raw) To: Christoph Hellwig, Eric Sandeen; +Cc: linux-xfs On 8/30/18 11:36 AM, Christoph Hellwig wrote: > On Thu, Aug 30, 2018 at 11:31:40AM -0500, Eric Sandeen wrote: >> That's no reason to uniquely disallow it for reflinked files, though; >> the problem is universal. It's true for fiemap as well. So I'm not sure >> that's an argument against the patch? > > fiemap at least tells you an extent is shared, bmap does not. yes, so bmap is clearly the wrong interface to use if you want to write directly to a file's blocks. But if you know enough to check the fiemap shared flag, you know enough to not use fibmap for that purpose... -Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-08-30 16:35 ` Eric Sandeen @ 2018-08-30 18:02 ` Brian Foster 2018-08-30 18:28 ` Darrick J. Wong 0 siblings, 1 reply; 22+ messages in thread From: Brian Foster @ 2018-08-30 18:02 UTC (permalink / raw) To: Eric Sandeen; +Cc: Christoph Hellwig, Eric Sandeen, linux-xfs On Thu, Aug 30, 2018 at 11:35:46AM -0500, Eric Sandeen wrote: > On 8/30/18 11:36 AM, Christoph Hellwig wrote: > > On Thu, Aug 30, 2018 at 11:31:40AM -0500, Eric Sandeen wrote: > >> That's no reason to uniquely disallow it for reflinked files, though; > >> the problem is universal. It's true for fiemap as well. So I'm not sure > >> that's an argument against the patch? > > > > fiemap at least tells you an extent is shared, bmap does not. > > yes, so bmap is clearly the wrong interface to use if you want to > write directly to a file's blocks. But if you know enough to check > the fiemap shared flag, you know enough to not use fibmap for that purpose... > FWIW, this patch seems reasonable to me. To Christoph's point, I don't think either interface really grants license to write to the underlying blocks, so either way it's technically being abused for this purpose. Unless there's a clear way to return an error for a particular type of file, I think it's reasonable behavior for fibmap to expose the data it supports (i.e., block maps) and drop the data it doesn't (reflink state). Brian > -Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-08-30 18:02 ` Brian Foster @ 2018-08-30 18:28 ` Darrick J. Wong 2018-08-30 18:51 ` Eric Sandeen 2018-08-31 6:28 ` Christoph Hellwig 0 siblings, 2 replies; 22+ messages in thread From: Darrick J. Wong @ 2018-08-30 18:28 UTC (permalink / raw) To: Brian Foster; +Cc: Eric Sandeen, Christoph Hellwig, Eric Sandeen, linux-xfs On Thu, Aug 30, 2018 at 02:02:05PM -0400, Brian Foster wrote: > On Thu, Aug 30, 2018 at 11:35:46AM -0500, Eric Sandeen wrote: > > On 8/30/18 11:36 AM, Christoph Hellwig wrote: > > > On Thu, Aug 30, 2018 at 11:31:40AM -0500, Eric Sandeen wrote: > > >> That's no reason to uniquely disallow it for reflinked files, though; > > >> the problem is universal. It's true for fiemap as well. So I'm not sure > > >> that's an argument against the patch? > > > > > > fiemap at least tells you an extent is shared, bmap does not. > > > > yes, so bmap is clearly the wrong interface to use if you want to > > write directly to a file's blocks. But if you know enough to check > > the fiemap shared flag, you know enough to not use fibmap for that purpose... > > > > FWIW, this patch seems reasonable to me. To Christoph's point, I don't > think either interface really grants license to write to the underlying > blocks, so either way it's technically being abused for this purpose. > Unless there's a clear way to return an error for a particular type of > file, I think it's reasonable behavior for fibmap to expose the data it > supports (i.e., block maps) and drop the data it doesn't (reflink > state). But shared block status isn't something that can be dropped lightly. If you write to a shared block without realizing it, you'll corrupt every other file that shares the block. I prefer to have FIBMAP return errors to *cough* encourage people to use FIEMAP. If code are going to abuse the FI[BE]MAP interface they could at least abuse the one that gives it enough context to avoid fs corruption. (A proper fs driver would be preferable, though very difficult). Granted, grub's blocklist code doesn't seem to check for shared blocks when it writes grubenv.... yuck, though TBH I don't have the eye budget to spend on digging through grub2. Frankly I think FIBMAP comes verrry close to "this API is unfixably stupid and shouldn't be enabled for new use cases and should go away some day". --D > Brian > > > -Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-08-30 18:28 ` Darrick J. Wong @ 2018-08-30 18:51 ` Eric Sandeen 2018-08-30 19:39 ` Brian Foster 2018-08-31 0:11 ` Dave Chinner 2018-08-31 6:28 ` Christoph Hellwig 1 sibling, 2 replies; 22+ messages in thread From: Eric Sandeen @ 2018-08-30 18:51 UTC (permalink / raw) To: Darrick J. Wong, Brian Foster; +Cc: Christoph Hellwig, Eric Sandeen, linux-xfs On 8/30/18 1:28 PM, Darrick J. Wong wrote: > On Thu, Aug 30, 2018 at 02:02:05PM -0400, Brian Foster wrote: >> On Thu, Aug 30, 2018 at 11:35:46AM -0500, Eric Sandeen wrote: >>> On 8/30/18 11:36 AM, Christoph Hellwig wrote: >>>> On Thu, Aug 30, 2018 at 11:31:40AM -0500, Eric Sandeen wrote: >>>>> That's no reason to uniquely disallow it for reflinked files, though; >>>>> the problem is universal. It's true for fiemap as well. So I'm not sure >>>>> that's an argument against the patch? >>>> >>>> fiemap at least tells you an extent is shared, bmap does not. >>> >>> yes, so bmap is clearly the wrong interface to use if you want to >>> write directly to a file's blocks. But if you know enough to check >>> the fiemap shared flag, you know enough to not use fibmap for that purpose... >>> >> >> FWIW, this patch seems reasonable to me. To Christoph's point, I don't >> think either interface really grants license to write to the underlying >> blocks, so either way it's technically being abused for this purpose. >> Unless there's a clear way to return an error for a particular type of >> file, I think it's reasonable behavior for fibmap to expose the data it >> supports (i.e., block maps) and drop the data it doesn't (reflink >> state). > > But shared block status isn't something that can be dropped lightly. If > you write to a shared block without realizing it, you'll corrupt every > other file that shares the block. But there is no circumstance under which it is safe to write to a mapped block no matter how you mapped it, tbh. This is just singling out one case of many, and it seems capricious to me. Other than the blast zone being possibly larger for reflinked files ... but I just don't think that's our judgement to make here. > I prefer to have FIBMAP return errors to *cough* encourage people to use > FIEMAP. If code are going to abuse the FI[BE]MAP interface they could > at least abuse the one that gives it enough context to avoid fs > corruption. (A proper fs driver would be preferable, though very > difficult). > > Granted, grub's blocklist code doesn't seem to check for shared blocks > when it writes grubenv.... yuck, though TBH I don't have the eye budget > to spend on digging through grub2. grub2 doesn't even use fiemap or fibmap so I'm not sure it's relevant to this decision... > Frankly I think FIBMAP comes verrry > close to "this API is unfixably stupid and shouldn't be enabled for new > use cases and should go away some day". So instead if anyone asks we'll just give them a successful response which is indistinguishable from a hole. :( I'm pretty strongly of the opinion that we should give the user what they ask for, and not try to intuit their motives for the question. -Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-08-30 18:51 ` Eric Sandeen @ 2018-08-30 19:39 ` Brian Foster 2018-08-30 19:47 ` Eric Sandeen 2018-08-31 0:11 ` Dave Chinner 1 sibling, 1 reply; 22+ messages in thread From: Brian Foster @ 2018-08-30 19:39 UTC (permalink / raw) To: Eric Sandeen; +Cc: Darrick J. Wong, Christoph Hellwig, Eric Sandeen, linux-xfs On Thu, Aug 30, 2018 at 01:51:56PM -0500, Eric Sandeen wrote: > On 8/30/18 1:28 PM, Darrick J. Wong wrote: > > On Thu, Aug 30, 2018 at 02:02:05PM -0400, Brian Foster wrote: > >> On Thu, Aug 30, 2018 at 11:35:46AM -0500, Eric Sandeen wrote: > >>> On 8/30/18 11:36 AM, Christoph Hellwig wrote: > >>>> On Thu, Aug 30, 2018 at 11:31:40AM -0500, Eric Sandeen wrote: > >>>>> That's no reason to uniquely disallow it for reflinked files, though; > >>>>> the problem is universal. It's true for fiemap as well. So I'm not sure > >>>>> that's an argument against the patch? > >>>> > >>>> fiemap at least tells you an extent is shared, bmap does not. > >>> > >>> yes, so bmap is clearly the wrong interface to use if you want to > >>> write directly to a file's blocks. But if you know enough to check > >>> the fiemap shared flag, you know enough to not use fibmap for that purpose... > >>> > >> > >> FWIW, this patch seems reasonable to me. To Christoph's point, I don't > >> think either interface really grants license to write to the underlying > >> blocks, so either way it's technically being abused for this purpose. > >> Unless there's a clear way to return an error for a particular type of > >> file, I think it's reasonable behavior for fibmap to expose the data it > >> supports (i.e., block maps) and drop the data it doesn't (reflink > >> state). > > > > But shared block status isn't something that can be dropped lightly. If > > you write to a shared block without realizing it, you'll corrupt every > > other file that shares the block. > > But there is no circumstance under which it is safe to write to a mapped > block no matter how you mapped it, tbh. This is just singling out one case > of many, and it seems capricious to me. Other than the blast zone being > possibly larger for reflinked files ... but I just don't think that's our > judgement to make here. > > > I prefer to have FIBMAP return errors to *cough* encourage people to use > > FIEMAP. If code are going to abuse the FI[BE]MAP interface they could > > at least abuse the one that gives it enough context to avoid fs > > corruption. (A proper fs driver would be preferable, though very > > difficult). > > I think that's a reasonable option as well.. > > Granted, grub's blocklist code doesn't seem to check for shared blocks > > when it writes grubenv.... yuck, though TBH I don't have the eye budget > > to spend on digging through grub2. > > grub2 doesn't even use fiemap or fibmap so I'm not sure it's relevant > to this decision... > > > Frankly I think FIBMAP comes verrry > > close to "this API is unfixably stupid and shouldn't be enabled for new > > use cases and should go away some day". > So instead if anyone asks we'll just give them a successful response which > is indistinguishable from a hole. :( > ... but this seems to be the crux of the matter (IMO, at least). If we can return -ENOTSUPP or whatever, then it can be made obvious that the user either needs to use fiemap or avoid using reflinked files. ISTM that what we do now is essentially report an incorrect bmap, which leads to these subtle bug reports. I haven't dug into the fibmap code.. does something prevent returning a legitimate error code? Brian > I'm pretty strongly of the opinion that we should give the user what > they ask for, and not try to intuit their motives for the question. > > -Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-08-30 19:39 ` Brian Foster @ 2018-08-30 19:47 ` Eric Sandeen 2018-08-30 19:58 ` Brian Foster 0 siblings, 1 reply; 22+ messages in thread From: Eric Sandeen @ 2018-08-30 19:47 UTC (permalink / raw) To: Brian Foster; +Cc: Darrick J. Wong, Christoph Hellwig, Eric Sandeen, linux-xfs On 8/30/18 2:39 PM, Brian Foster wrote: >>> Frankly I think FIBMAP comes verrry >>> close to "this API is unfixably stupid and shouldn't be enabled for new >>> use cases and should go away some day". Backing up, the interface isn't /that/ dumb, other than being limited to 32 bits. Q: "Where is this logical file block physically located?" A: "It is here." Inefficient, sure. >> So instead if anyone asks we'll just give them a successful response which >> is indistinguishable from a hole. :( >> > ... but this seems to be the crux of the matter (IMO, at least). If we > can return -ENOTSUPP or whatever, then it can be made obvious that the > user either needs to use fiemap or avoid using reflinked files. ISTM > that what we do now is essentially report an incorrect bmap, which leads > to these subtle bug reports. > > I haven't dug into the fibmap code.. does something prevent returning a > legitimate error code? so ->bmap() returns a sector_t, which is a u64 and doesn't really leave room for a negative error return. But the ioctl interface stuffs that return into an int, and returns it to the user. (note to self, wonder why it doesn't return -EOVERFLOW as needed). I suppose ->bmap() could be changed to take a u64 in/out pointer like the user interface does, and return 0 or -errno. *Shrug* this seems like a lot of work to avoid simply giving the user what they asked for. ;) -Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-08-30 19:47 ` Eric Sandeen @ 2018-08-30 19:58 ` Brian Foster 0 siblings, 0 replies; 22+ messages in thread From: Brian Foster @ 2018-08-30 19:58 UTC (permalink / raw) To: Eric Sandeen; +Cc: Darrick J. Wong, Christoph Hellwig, Eric Sandeen, linux-xfs On Thu, Aug 30, 2018 at 02:47:09PM -0500, Eric Sandeen wrote: > On 8/30/18 2:39 PM, Brian Foster wrote: > >>> Frankly I think FIBMAP comes verrry > >>> close to "this API is unfixably stupid and shouldn't be enabled for new > >>> use cases and should go away some day". > > Backing up, the interface isn't /that/ dumb, other than being limited to > 32 bits. Q: "Where is this logical file block physically located?" A: "It is here." > > Inefficient, sure. > > >> So instead if anyone asks we'll just give them a successful response which > >> is indistinguishable from a hole. :( > >> > > ... but this seems to be the crux of the matter (IMO, at least). If we > > can return -ENOTSUPP or whatever, then it can be made obvious that the > > user either needs to use fiemap or avoid using reflinked files. ISTM > > that what we do now is essentially report an incorrect bmap, which leads > > to these subtle bug reports. > > > > I haven't dug into the fibmap code.. does something prevent returning a > > legitimate error code? > > so ->bmap() returns a sector_t, which is a u64 and doesn't really > leave room for a negative error return. But the ioctl interface > stuffs that return into an int, and returns it to the user. > (note to self, wonder why it doesn't return -EOVERFLOW as needed). > Ah, so the syscall can return an error, but the internal interface basically doesn't support error propagation from the fs callback. > I suppose ->bmap() could be changed to take a u64 in/out pointer like the > user interface does, and return 0 or -errno. *Shrug* this seems like > a lot of work to avoid simply giving the user what they asked > for. ;) > FWIW, another option might be to drop the ->bmap() callback on reflinked inodes, which looks like it would trigger an -EINVAL to userspace. I also don't know if that is worth whatever trouble might be required to change function pointers like that, if it can even be done safely. I guess my take boils down to that I think either returning an error (one way or another) or accurate block map info is more sane than the current behavior. Brian > -Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-08-30 18:51 ` Eric Sandeen 2018-08-30 19:39 ` Brian Foster @ 2018-08-31 0:11 ` Dave Chinner 2018-08-31 1:34 ` Eric Sandeen 1 sibling, 1 reply; 22+ messages in thread From: Dave Chinner @ 2018-08-31 0:11 UTC (permalink / raw) To: Eric Sandeen Cc: Darrick J. Wong, Brian Foster, Christoph Hellwig, Eric Sandeen, linux-xfs On Thu, Aug 30, 2018 at 01:51:56PM -0500, Eric Sandeen wrote: > On 8/30/18 1:28 PM, Darrick J. Wong wrote: > > On Thu, Aug 30, 2018 at 02:02:05PM -0400, Brian Foster wrote: > >> On Thu, Aug 30, 2018 at 11:35:46AM -0500, Eric Sandeen wrote: > >>> On 8/30/18 11:36 AM, Christoph Hellwig wrote: > >>>> On Thu, Aug 30, 2018 at 11:31:40AM -0500, Eric Sandeen wrote: > >>>>> That's no reason to uniquely disallow it for reflinked files, though; > >>>>> the problem is universal. It's true for fiemap as well. So I'm not sure > >>>>> that's an argument against the patch? > >>>> > >>>> fiemap at least tells you an extent is shared, bmap does not. > >>> > >>> yes, so bmap is clearly the wrong interface to use if you want to > >>> write directly to a file's blocks. But if you know enough to check > >>> the fiemap shared flag, you know enough to not use fibmap for that purpose... > >>> > >> > >> FWIW, this patch seems reasonable to me. To Christoph's point, I don't > >> think either interface really grants license to write to the underlying > >> blocks, so either way it's technically being abused for this purpose. > >> Unless there's a clear way to return an error for a particular type of > >> file, I think it's reasonable behavior for fibmap to expose the data it > >> supports (i.e., block maps) and drop the data it doesn't (reflink > >> state). > > > > But shared block status isn't something that can be dropped lightly. If > > you write to a shared block without realizing it, you'll corrupt every > > other file that shares the block. > > But there is no circumstance under which it is safe to write to a mapped > block no matter how you mapped it, tbh. <sigh> That's what all the break_layouts() code in XFS provides. It's a mechanism for applications to prevent the block layout from changing unexpected until they - the layout lease owner - give up their exclusive access to the file layout. Seriously, this has been talked about so much in the past year or two in the context of DAX, RDMA, get_user_pages() races in direct IO, etc. it pains me to see this discussion rehashing it all over again. We want applications to do what they need to do safely. FIBMAP is unsafe and, worse, it's unfixable. We need to get apps to move away from it to something is actualayl safe. Adding a file lease interface to block 3rd party changes to the file layout until the app releases the lease is a safe way of allowing userspace apps to use FIEMAP to map and identify file extents they can write directly to if they need to. IOWs, we need to get the FL_LAYOUT flag out into the external file lease interface (IIRC Dan Williams posted patches for this a while back) and get these "FIBMAP + write()" apps to use "FL_LAYOUT, fsync(), FIEMAP, write(), ~FL_LAYOUT". We need to make FIBMAP go away by providing a safer, more robust solution to the problem people are trying to solve. -Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-08-31 0:11 ` Dave Chinner @ 2018-08-31 1:34 ` Eric Sandeen 2018-08-31 3:05 ` Dave Chinner 0 siblings, 1 reply; 22+ messages in thread From: Eric Sandeen @ 2018-08-31 1:34 UTC (permalink / raw) To: Dave Chinner Cc: Darrick J. Wong, Brian Foster, Christoph Hellwig, Eric Sandeen, linux-xfs On 8/30/18 7:11 PM, Dave Chinner wrote: > On Thu, Aug 30, 2018 at 01:51:56PM -0500, Eric Sandeen wrote: >> On 8/30/18 1:28 PM, Darrick J. Wong wrote: >>> On Thu, Aug 30, 2018 at 02:02:05PM -0400, Brian Foster wrote: >>>> On Thu, Aug 30, 2018 at 11:35:46AM -0500, Eric Sandeen wrote: >>>>> On 8/30/18 11:36 AM, Christoph Hellwig wrote: >>>>>> On Thu, Aug 30, 2018 at 11:31:40AM -0500, Eric Sandeen wrote: >>>>>>> That's no reason to uniquely disallow it for reflinked files, though; >>>>>>> the problem is universal. It's true for fiemap as well. So I'm not sure >>>>>>> that's an argument against the patch? >>>>>> >>>>>> fiemap at least tells you an extent is shared, bmap does not. >>>>> >>>>> yes, so bmap is clearly the wrong interface to use if you want to >>>>> write directly to a file's blocks. But if you know enough to check >>>>> the fiemap shared flag, you know enough to not use fibmap for that purpose... >>>>> >>>> >>>> FWIW, this patch seems reasonable to me. To Christoph's point, I don't >>>> think either interface really grants license to write to the underlying >>>> blocks, so either way it's technically being abused for this purpose. >>>> Unless there's a clear way to return an error for a particular type of >>>> file, I think it's reasonable behavior for fibmap to expose the data it >>>> supports (i.e., block maps) and drop the data it doesn't (reflink >>>> state). >>> >>> But shared block status isn't something that can be dropped lightly. If >>> you write to a shared block without realizing it, you'll corrupt every >>> other file that shares the block. >> >> But there is no circumstance under which it is safe to write to a mapped >> block no matter how you mapped it, tbh. > > <sigh> > > That's what all the break_layouts() code in XFS provides. It's a > mechanism for applications to prevent the block layout from changing > unexpected until they - the layout lease owner - give up their > exclusive access to the file layout. > Seriously, this has been talked about so much in the past year or > two in the context of DAX, RDMA, get_user_pages() races in direct > IO, etc. it pains me to see this discussion rehashing it all over > again. > > We want applications to do what they need to do safely. FIBMAP is > unsafe and, worse, it's unfixable. We need to get apps to move away > from it to something is actualayl safe. > Adding a file lease interface to block 3rd party changes to the > file layout until the app releases the lease is a safe way > of allowing userspace apps to use FIEMAP to map and identify > file extents they can write directly to if they need to. > > IOWs, we need to get the FL_LAYOUT flag out into the external file > lease interface (IIRC Dan Williams posted patches for this a while > back) and get these "FIBMAP + write()" apps to use "FL_LAYOUT, > fsync(), FIEMAP, write(), ~FL_LAYOUT". > > We need to make FIBMAP go away by providing a safer, more robust > solution to the problem people are trying to solve. Sure. I get it that it's not a great interface. I get it that there are better ways. When those methods are available, we should explicitly deprecate FIBMAP. But until then I can't understand why we'd intentionally break an otherwise reasonably functional interface in subtle and undetectable ways for certain classes of files. We /could/ FIBMAP a reflinked file exactly as well as we can FIBMAP a non-reflinked file, but we choose not to. This choice creates unnecessary problems for existing apps. Until we deprecate the FIBMAP interface, until there is a better way, I think we should make it as predictable and complete as we can, not cripple it intentionally. But I'm clearly in the minority with that opinion, so I guess I'll withdraw the patch. -Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-08-31 1:34 ` Eric Sandeen @ 2018-08-31 3:05 ` Dave Chinner 2018-08-31 13:08 ` Eric Sandeen 0 siblings, 1 reply; 22+ messages in thread From: Dave Chinner @ 2018-08-31 3:05 UTC (permalink / raw) To: Eric Sandeen Cc: Darrick J. Wong, Brian Foster, Christoph Hellwig, Eric Sandeen, linux-xfs On Thu, Aug 30, 2018 at 08:34:02PM -0500, Eric Sandeen wrote: > On 8/30/18 7:11 PM, Dave Chinner wrote: > > On Thu, Aug 30, 2018 at 01:51:56PM -0500, Eric Sandeen wrote: > >> On 8/30/18 1:28 PM, Darrick J. Wong wrote: > >>> On Thu, Aug 30, 2018 at 02:02:05PM -0400, Brian Foster wrote: > >>>> On Thu, Aug 30, 2018 at 11:35:46AM -0500, Eric Sandeen wrote: > >>>>> On 8/30/18 11:36 AM, Christoph Hellwig wrote: > >>>>>> On Thu, Aug 30, 2018 at 11:31:40AM -0500, Eric Sandeen wrote: > >>>>>>> That's no reason to uniquely disallow it for reflinked files, though; > >>>>>>> the problem is universal. It's true for fiemap as well. So I'm not sure > >>>>>>> that's an argument against the patch? > >>>>>> > >>>>>> fiemap at least tells you an extent is shared, bmap does not. > >>>>> > >>>>> yes, so bmap is clearly the wrong interface to use if you want to > >>>>> write directly to a file's blocks. But if you know enough to check > >>>>> the fiemap shared flag, you know enough to not use fibmap for that purpose... > >>>>> > >>>> > >>>> FWIW, this patch seems reasonable to me. To Christoph's point, I don't > >>>> think either interface really grants license to write to the underlying > >>>> blocks, so either way it's technically being abused for this purpose. > >>>> Unless there's a clear way to return an error for a particular type of > >>>> file, I think it's reasonable behavior for fibmap to expose the data it > >>>> supports (i.e., block maps) and drop the data it doesn't (reflink > >>>> state). > >>> > >>> But shared block status isn't something that can be dropped lightly. If > >>> you write to a shared block without realizing it, you'll corrupt every > >>> other file that shares the block. > >> > >> But there is no circumstance under which it is safe to write to a mapped > >> block no matter how you mapped it, tbh. > > > > <sigh> > > > > That's what all the break_layouts() code in XFS provides. It's a > > mechanism for applications to prevent the block layout from changing > > unexpected until they - the layout lease owner - give up their > > exclusive access to the file layout. > > > Seriously, this has been talked about so much in the past year or > > two in the context of DAX, RDMA, get_user_pages() races in direct > > IO, etc. it pains me to see this discussion rehashing it all over > > again. > > > > We want applications to do what they need to do safely. FIBMAP is > > unsafe and, worse, it's unfixable. We need to get apps to move away > > from it to something is actualayl safe. > > > > Adding a file lease interface to block 3rd party changes to the > > file layout until the app releases the lease is a safe way > > of allowing userspace apps to use FIEMAP to map and identify > > file extents they can write directly to if they need to. > > > > IOWs, we need to get the FL_LAYOUT flag out into the external file > > lease interface (IIRC Dan Williams posted patches for this a while > > back) and get these "FIBMAP + write()" apps to use "FL_LAYOUT, > > fsync(), FIEMAP, write(), ~FL_LAYOUT". > > > > We need to make FIBMAP go away by providing a safer, more robust > > solution to the problem people are trying to solve. > > Sure. I get it that it's not a great interface. I get it that there > are better ways. When those methods are available, we should explicitly > deprecate FIBMAP. > > But until then I can't understand why we'd intentionally break an > otherwise reasonably functional interface in subtle and undetectable > ways for certain classes of files. Because that's better than subtle, silent and undetectable data corruption of certain classes of files in certain common FIBMAP use cases. > We /could/ FIBMAP a reflinked file > exactly as well as we can FIBMAP a non-reflinked file, but we choose > not to. This choice creates unnecessary problems for existing apps. Yes, we /could/. But we don't, because because of the typical use case of FIBMAP which is to map files so that IO (read and/or write) can be done directly to the block device without going through the filesystem. For normal files this is /relatively/ safe if the correct precautions are taken. For a shared extent, writing is *never* safe. We don't know what users of FIBMAP are going to do with the block map they get, but we *know* there are apps that use it to write direct to block devices and we *know* that this will cause _silent_ data corruption and/or exposure of privileged information (shared data can have different owners at different permission levels) when it occurs. That's what we're stopping by not exposing shared extents via FIBMAP - applications that are careful and try to do everything as safely and correctly as possible will still get it wrong and have no indication that they've just screwed something up badly. > Until we deprecate the FIBMAP interface, until there is a better way, > I think we should make it as predictable and complete as > we can, not cripple it intentionally. Crippling it is the lesser of two evils, unfortunately. I don't like it either, but I like the idea of silent data corruption or data exposure vectors even less. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-08-31 3:05 ` Dave Chinner @ 2018-08-31 13:08 ` Eric Sandeen 2018-09-01 8:32 ` Christoph Hellwig 0 siblings, 1 reply; 22+ messages in thread From: Eric Sandeen @ 2018-08-31 13:08 UTC (permalink / raw) To: Dave Chinner Cc: Darrick J. Wong, Brian Foster, Christoph Hellwig, Eric Sandeen, linux-xfs On 8/30/18 10:05 PM, Dave Chinner wrote: > On Thu, Aug 30, 2018 at 08:34:02PM -0500, Eric Sandeen wrote: ... >> We /could/ FIBMAP a reflinked file >> exactly as well as we can FIBMAP a non-reflinked file, but we choose >> not to. This choice creates unnecessary problems for existing apps. > > Yes, we /could/. But we don't, because because of the typical use > case of FIBMAP which is to map files so that IO (read and/or write) > can be done directly to the block device without going through the > filesystem. For normal files this is /relatively/ safe if the > correct precautions are taken. For a shared extent, writing is > *never* safe. > > We don't know what users of FIBMAP are going to do with the block > map they get, but we *know* there are apps that use it to write > direct to block devices and we *know* that this will cause _silent_ > data corruption and/or exposure of privileged information (shared > data can have different owners at different permission levels) when > it occurs. It'd be way better for my blood pressure if people would stop making poor arguments about orthogonal issues. If you can directly read the block device hosting the files, file permissions of shared blocks are completely beside the point. And app writers can make bad/dangerous decisions no matter what mapping mechanisms we provide. (If an app writer knows enough to look at FIEMAP flags to check for shared blocks before writing, they'd know enough to use it instead of FIBMAP in the first place.) But I'm not even aware of any apps that actually /do/ use FI[BE]MAP mapping information to write, in any case. While grub2 does directly write file blocks in one case, it doesn't even /use/ FIBMAP! So it will happily continue doing so, even for reflinked files, despite our best efforts to break the interface for safe users. The "typical use case" is to map a file for a bootloader to /read/ at boot, and our arbitrary restriction has broken systems in the real world, and burned a lot of institutional effort getting to the bottom of the problem and working around it. I know it's our job to be pedantic, secure, and correct, but speculating about what misbehaving privileged apps /could/ do and then plugging a small fraction of the possible misuse vectors while breaking actual, safe, real-world usecases just isn't a compelling story. -Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-08-31 13:08 ` Eric Sandeen @ 2018-09-01 8:32 ` Christoph Hellwig 0 siblings, 0 replies; 22+ messages in thread From: Christoph Hellwig @ 2018-09-01 8:32 UTC (permalink / raw) To: Eric Sandeen Cc: Dave Chinner, Darrick J. Wong, Brian Foster, Christoph Hellwig, Eric Sandeen, linux-xfs On Fri, Aug 31, 2018 at 08:08:11AM -0500, Eric Sandeen wrote: > If you can directly read the block device hosting the files, file > permissions of shared blocks are completely beside the point. The point is that you should never, ever directly read blocks from the block device, nevermind write. And we need to make sure userspace stops doing that instead of catering to it in any way. > The "typical use case" is to map a file for a bootloader to /read/ > at boot, and our arbitrary restriction has broken systems in > the real world, and burned a lot of institutional effort getting > to the bottom of the problem and working around it. And even that 'use case' is utterly broken. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-08-30 18:28 ` Darrick J. Wong 2018-08-30 18:51 ` Eric Sandeen @ 2018-08-31 6:28 ` Christoph Hellwig 2018-08-31 12:36 ` Brian Foster 1 sibling, 1 reply; 22+ messages in thread From: Christoph Hellwig @ 2018-08-31 6:28 UTC (permalink / raw) To: Darrick J. Wong Cc: Brian Foster, Eric Sandeen, Christoph Hellwig, Eric Sandeen, linux-xfs On Thu, Aug 30, 2018 at 11:28:49AM -0700, Darrick J. Wong wrote: > I prefer to have FIBMAP return errors to *cough* encourage people to use > FIEMAP. If code are going to abuse the FI[BE]MAP interface they could > at least abuse the one that gives it enough context to avoid fs > corruption. (A proper fs driver would be preferable, though very > difficult). I think Carlos was looking into implementing the FIBMAP ioctl using ->fiemap. In that case we could return sensible errors, and centralize policy in a single place.. > Granted, grub's blocklist code doesn't seem to check for shared blocks > when it writes grubenv.... yuck, though TBH I don't have the eye budget > to spend on digging through grub2. Frankly I think FIBMAP comes verrry > close to "this API is unfixably stupid and shouldn't be enabled for new > use cases and should go away some day". .. and that policy should be: always return an error for the slightest unusual file layout (shared, encrypted, inline, etc). ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-08-31 6:28 ` Christoph Hellwig @ 2018-08-31 12:36 ` Brian Foster 2018-09-01 8:31 ` Christoph Hellwig 2018-09-02 14:08 ` Carlos Maiolino 0 siblings, 2 replies; 22+ messages in thread From: Brian Foster @ 2018-08-31 12:36 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Darrick J. Wong, Eric Sandeen, Eric Sandeen, linux-xfs On Fri, Aug 31, 2018 at 08:28:13AM +0200, Christoph Hellwig wrote: > On Thu, Aug 30, 2018 at 11:28:49AM -0700, Darrick J. Wong wrote: > > I prefer to have FIBMAP return errors to *cough* encourage people to use > > FIEMAP. If code are going to abuse the FI[BE]MAP interface they could > > at least abuse the one that gives it enough context to avoid fs > > corruption. (A proper fs driver would be preferable, though very > > difficult). > > I think Carlos was looking into implementing the FIBMAP ioctl > using ->fiemap. In that case we could return sensible errors, > and centralize policy in a single place.. > So basically ioctl_fibmap() either prioritizes ->fiemap() or looks for some special combination of (fiemap && !bmap) to translate the call.. > > Granted, grub's blocklist code doesn't seem to check for shared blocks > > when it writes grubenv.... yuck, though TBH I don't have the eye budget > > to spend on digging through grub2. Frankly I think FIBMAP comes verrry > > close to "this API is unfixably stupid and shouldn't be enabled for new > > use cases and should go away some day". > > .. and that policy should be: always return an error for the slightest > unusual file layout (shared, encrypted, inline, etc). ... and then return some error if the associate extent is in some state that cannot be described by fibmap..? That sounds like a nice option to me. Carlos..? Maybe it's too late for this, but I think even dropping ->bmap completely for the time being on XFS reflink=1 filesystems is preferable to the current behavior where we return a perfectly valid result and pretend that somehow represents an error to userspace. The arguments for the current behavior essentially apply the "known fibmap usecase of direct block writes" as justification for implementing this policy in the kernel. In practice, the current behavior just trades off one problem (data corruption) for another where the end result is probably the same for that particular use case: the system doesn't boot. If we dropped bmap, then at least there's an obvious error and the user can decide whether to update to fiemap or disable reflink (as opposed to us having to continue to chase down these odd bootloader issues). Brian ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-08-31 12:36 ` Brian Foster @ 2018-09-01 8:31 ` Christoph Hellwig 2018-09-02 14:08 ` Carlos Maiolino 1 sibling, 0 replies; 22+ messages in thread From: Christoph Hellwig @ 2018-09-01 8:31 UTC (permalink / raw) To: Brian Foster Cc: Christoph Hellwig, Darrick J. Wong, Eric Sandeen, Eric Sandeen, linux-xfs, Carlos Maiolino [adding Carlos who was looking into this if I remember correctly] On Fri, Aug 31, 2018 at 08:36:40AM -0400, Brian Foster wrote: > So basically ioctl_fibmap() either prioritizes ->fiemap() or looks for > some special combination of (fiemap && !bmap) to translate the call.. I'd always use fiemap if present, but if there is any good reason to prefer ->bmap if present that should be ok. Hopefully we can just kill of pure ->bmap implementations in the long run.. > > > Granted, grub's blocklist code doesn't seem to check for shared blocks > > > when it writes grubenv.... yuck, though TBH I don't have the eye budget > > > to spend on digging through grub2. Frankly I think FIBMAP comes verrry > > > close to "this API is unfixably stupid and shouldn't be enabled for new > > > use cases and should go away some day". > > > > .. and that policy should be: always return an error for the slightest > > unusual file layout (shared, encrypted, inline, etc). > > ... and then return some error if the associate extent is in some state > that cannot be described by fibmap..? That sounds like a nice option to > me. Carlos..? Yes. > Maybe it's too late for this, but I think even dropping ->bmap > completely for the time being on XFS reflink=1 filesystems is preferable > to the current behavior where we return a perfectly valid result and > pretend that somehow represents an error to userspace. Note that a 0 return on ->bmap has traditionally been treated as an error, including in userspace as block 0 is usually not a valid block for user data. (this isn't intended as support for this interface, just stating history) ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-08-31 12:36 ` Brian Foster 2018-09-01 8:31 ` Christoph Hellwig @ 2018-09-02 14:08 ` Carlos Maiolino 2018-09-02 17:52 ` Eric Sandeen 1 sibling, 1 reply; 22+ messages in thread From: Carlos Maiolino @ 2018-09-02 14:08 UTC (permalink / raw) To: Brian Foster Cc: Christoph Hellwig, Darrick J. Wong, Eric Sandeen, Eric Sandeen, linux-xfs Hi Folks, On Fri, Aug 31, 2018 at 08:36:40AM -0400, Brian Foster wrote: > On Fri, Aug 31, 2018 at 08:28:13AM +0200, Christoph Hellwig wrote: > > On Thu, Aug 30, 2018 at 11:28:49AM -0700, Darrick J. Wong wrote: > > > I prefer to have FIBMAP return errors to *cough* encourage people to use > > > FIEMAP. If code are going to abuse the FI[BE]MAP interface they could > > > at least abuse the one that gives it enough context to avoid fs > > > corruption. (A proper fs driver would be preferable, though very > > > difficult). > > > > I think Carlos was looking into implementing the FIBMAP ioctl > > using ->fiemap. In that case we could return sensible errors, > > and centralize policy in a single place.. > > > > So basically ioctl_fibmap() either prioritizes ->fiemap() or looks for > some special combination of (fiemap && !bmap) to translate the call.. > > > > Granted, grub's blocklist code doesn't seem to check for shared blocks > > > when it writes grubenv.... yuck, though TBH I don't have the eye budget > > > to spend on digging through grub2. Frankly I think FIBMAP comes verrry > > > close to "this API is unfixably stupid and shouldn't be enabled for new > > > use cases and should go away some day". > > > > .. and that policy should be: always return an error for the slightest > > unusual file layout (shared, encrypted, inline, etc). > > ... and then return some error if the associate extent is in some state > that cannot be described by fibmap..? That sounds like a nice option to > me. Carlos..? > Yes, I've been working on using FIEMAP interface to handle FIBMAP, it was mostly working, although it needed some extra tweaks due the fact different filesystems return different blocks inside an extent, when a single block query is made on FIEMAP. I mean, if you query for a single block which is in the middle of an extent, ext4 returns the address of the specific block inside the extent, while xfs (using iomap fiemap infra), returns the address of the first block in the extent. Or something like that, I needed to context switch to some other tasks, but I'll come back to this during this week, and let you guys know. Cheers > Maybe it's too late for this, but I think even dropping ->bmap > completely for the time being on XFS reflink=1 filesystems is preferable > to the current behavior where we return a perfectly valid result and > pretend that somehow represents an error to userspace. > > The arguments for the current behavior essentially apply the "known > fibmap usecase of direct block writes" as justification for implementing > this policy in the kernel. In practice, the current behavior just trades > off one problem (data corruption) for another where the end result is > probably the same for that particular use case: the system doesn't boot. > If we dropped bmap, then at least there's an obvious error and the user > can decide whether to update to fiemap or disable reflink (as opposed to > us having to continue to chase down these odd bootloader issues). > > Brian -- Carlos ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-09-02 14:08 ` Carlos Maiolino @ 2018-09-02 17:52 ` Eric Sandeen 2018-09-03 10:21 ` Carlos Maiolino 0 siblings, 1 reply; 22+ messages in thread From: Eric Sandeen @ 2018-09-02 17:52 UTC (permalink / raw) To: Brian Foster, Christoph Hellwig, Darrick J. Wong, Eric Sandeen, linux-xfs On 9/2/18 9:08 AM, Carlos Maiolino wrote: > Hi Folks, > > On Fri, Aug 31, 2018 at 08:36:40AM -0400, Brian Foster wrote: >> On Fri, Aug 31, 2018 at 08:28:13AM +0200, Christoph Hellwig wrote: >>> On Thu, Aug 30, 2018 at 11:28:49AM -0700, Darrick J. Wong wrote: >>>> I prefer to have FIBMAP return errors to *cough* encourage people to use >>>> FIEMAP. If code are going to abuse the FI[BE]MAP interface they could >>>> at least abuse the one that gives it enough context to avoid fs >>>> corruption. (A proper fs driver would be preferable, though very >>>> difficult). >>> >>> I think Carlos was looking into implementing the FIBMAP ioctl >>> using ->fiemap. In that case we could return sensible errors, >>> and centralize policy in a single place.. >>> >> >> So basically ioctl_fibmap() either prioritizes ->fiemap() or looks for >> some special combination of (fiemap && !bmap) to translate the call.. >> >>>> Granted, grub's blocklist code doesn't seem to check for shared blocks >>>> when it writes grubenv.... yuck, though TBH I don't have the eye budget >>>> to spend on digging through grub2. Frankly I think FIBMAP comes verrry >>>> close to "this API is unfixably stupid and shouldn't be enabled for new >>>> use cases and should go away some day". >>> >>> .. and that policy should be: always return an error for the slightest >>> unusual file layout (shared, encrypted, inline, etc). >> >> ... and then return some error if the associate extent is in some state >> that cannot be described by fibmap..? That sounds like a nice option to >> me. Carlos..? >> > > Yes, I've been working on using FIEMAP interface to handle FIBMAP, it was mostly > working, although it needed some extra tweaks due the fact different > filesystems return different blocks inside an extent, when a single block query > is made on FIEMAP. > > I mean, if you query for a single block which is in the middle of an extent, > ext4 returns the address of the specific block inside the extent, while xfs > (using iomap fiemap infra), returns the address of the first block in the > extent. IMHO, with hindsight, FIEMAP really kind of sucks. The call is a pain to set up, and the results are a pain to interpret. Preparing a patch for zipl to use FIEMAP, I realized what a truly crappy and cumbersome interface it is, particularly if you just want to map a small handful of blocks. I doubt it'd fly, but I'm half tempted to propose a new block-at-at-time FIBMAP2 that doesn't overflow 32 bits, uses flags similar to FIEMAP to control behavior and return mapped block state, and can return proper errors. FIEMAP is great that it can efficiently map contiguous extents and all, but it's so cumbersome to use. (ISTR that both xfsprogs' and e2fsprogs' use of FIEMAP had /multiple/ followon commits fixing the original implementation in filefrag and xfs_io.) -Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap 2018-09-02 17:52 ` Eric Sandeen @ 2018-09-03 10:21 ` Carlos Maiolino 0 siblings, 0 replies; 22+ messages in thread From: Carlos Maiolino @ 2018-09-03 10:21 UTC (permalink / raw) To: Eric Sandeen Cc: Brian Foster, Christoph Hellwig, Darrick J. Wong, Eric Sandeen, linux-xfs On Sun, Sep 02, 2018 at 12:52:47PM -0500, Eric Sandeen wrote: > On 9/2/18 9:08 AM, Carlos Maiolino wrote: > > Hi Folks, > > > > On Fri, Aug 31, 2018 at 08:36:40AM -0400, Brian Foster wrote: > >> On Fri, Aug 31, 2018 at 08:28:13AM +0200, Christoph Hellwig wrote: > >>> On Thu, Aug 30, 2018 at 11:28:49AM -0700, Darrick J. Wong wrote: > >>>> I prefer to have FIBMAP return errors to *cough* encourage people to use > >>>> FIEMAP. If code are going to abuse the FI[BE]MAP interface they could > >>>> at least abuse the one that gives it enough context to avoid fs > >>>> corruption. (A proper fs driver would be preferable, though very > >>>> difficult). > >>> > >>> I think Carlos was looking into implementing the FIBMAP ioctl > >>> using ->fiemap. In that case we could return sensible errors, > >>> and centralize policy in a single place.. > >>> > >> > >> So basically ioctl_fibmap() either prioritizes ->fiemap() or looks for > >> some special combination of (fiemap && !bmap) to translate the call.. > >> > >>>> Granted, grub's blocklist code doesn't seem to check for shared blocks > >>>> when it writes grubenv.... yuck, though TBH I don't have the eye budget > >>>> to spend on digging through grub2. Frankly I think FIBMAP comes verrry > >>>> close to "this API is unfixably stupid and shouldn't be enabled for new > >>>> use cases and should go away some day". > >>> > >>> .. and that policy should be: always return an error for the slightest > >>> unusual file layout (shared, encrypted, inline, etc). > >> > >> ... and then return some error if the associate extent is in some state > >> that cannot be described by fibmap..? That sounds like a nice option to > >> me. Carlos..? > >> > > > > Yes, I've been working on using FIEMAP interface to handle FIBMAP, it was mostly > > working, although it needed some extra tweaks due the fact different > > filesystems return different blocks inside an extent, when a single block query > > is made on FIEMAP. > > > > I mean, if you query for a single block which is in the middle of an extent, > > ext4 returns the address of the specific block inside the extent, while xfs > > (using iomap fiemap infra), returns the address of the first block in the > > extent. Ops, after re-reading my notes about it today, it's quite the opposite. xfs (with iomap), returns the requested block, while ext4 returns the beginning of the extent. > > IMHO, with hindsight, FIEMAP really kind of sucks. The call is a pain to > set up, and the results are a pain to interpret. > > Preparing a patch for zipl to use FIEMAP, I realized what a truly crappy and > cumbersome interface it is, particularly if you just want to map a small > handful of blocks. > > I doubt it'd fly, but I'm half tempted to propose a new block-at-at-time > FIBMAP2 that doesn't overflow 32 bits, uses flags similar to FIEMAP to control > behavior and return mapped block state, and can return proper errors. > > FIEMAP is great that it can efficiently map contiguous extents and all, but > it's so cumbersome to use. (ISTR that both xfsprogs' and e2fsprogs' use of > FIEMAP had /multiple/ followon commits fixing the original implementation > in filefrag and xfs_io.) > I'm still refreshing my memory at the subject, but, IIRC the past discussions, we will need to support FIBMAP forever, but, internally we can actually use FIEMAP infra-structure to answer FIBMAP calls. From a user perspective, it changes nothing. The idea, is to replace all the ->bmap calls, by calling into bmap directly, and make bmap() internals to use FIEMAP infra-structure to get the required single block. I made it work successfully on systems using only XFS, until I tested on ext4 and the system didn't boot due the difference between iomap and ext4 FIEMAP implementation I explained above. Actually, IIRC I put this project by side because we haven't decided which interface would be better, but looks like the right approach would be to use an fs-independent solution, which is something I'll try to achieve this week. Cheers > -Eric -- Carlos ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2018-09-03 14:41 UTC | newest] Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-08-30 16:10 [PATCH, RFC] xfs: re-enable FIBMAP on reflink; disable for swap Eric Sandeen 2018-08-30 16:25 ` Christoph Hellwig 2018-08-30 16:31 ` Eric Sandeen 2018-08-30 16:36 ` Christoph Hellwig 2018-08-30 16:35 ` Eric Sandeen 2018-08-30 18:02 ` Brian Foster 2018-08-30 18:28 ` Darrick J. Wong 2018-08-30 18:51 ` Eric Sandeen 2018-08-30 19:39 ` Brian Foster 2018-08-30 19:47 ` Eric Sandeen 2018-08-30 19:58 ` Brian Foster 2018-08-31 0:11 ` Dave Chinner 2018-08-31 1:34 ` Eric Sandeen 2018-08-31 3:05 ` Dave Chinner 2018-08-31 13:08 ` Eric Sandeen 2018-09-01 8:32 ` Christoph Hellwig 2018-08-31 6:28 ` Christoph Hellwig 2018-08-31 12:36 ` Brian Foster 2018-09-01 8:31 ` Christoph Hellwig 2018-09-02 14:08 ` Carlos Maiolino 2018-09-02 17:52 ` Eric Sandeen 2018-09-03 10:21 ` Carlos Maiolino
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.