All of lore.kernel.org
 help / color / mirror / Atom feed
* [bug]: fiemap returns zero extents for all files
@ 2023-12-05  5:26 Steve Smith
  2023-12-05 14:54 ` Brian Foster
  0 siblings, 1 reply; 6+ messages in thread
From: Steve Smith @ 2023-12-05  5:26 UTC (permalink / raw)
  To: linux-bcachefs

Hi,

I'm doing some testing of xcp[1] against bcachefs, and having issues
with fiemap. Using fiemap on both sparse and non-sparse files always
returns `mapped_extents` of 0. The same tests on other extent-based
FSs return non-zero extents, with consistent values. This is against
-rc4 and master.

I've broken the relevant tests (in Rust) out into a standalone crate
if you want to reproduce:

    https://github.com/tarka/bcachefs-test

The FS in question was created with a simple `bcachefs -L xcp-test /dev/sdh1`.

Cheers,
Steve

[1]: https://github.com/tarka/xcp

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [bug]: fiemap returns zero extents for all files
  2023-12-05  5:26 [bug]: fiemap returns zero extents for all files Steve Smith
@ 2023-12-05 14:54 ` Brian Foster
  2023-12-05 23:26   ` Steve Smith
  0 siblings, 1 reply; 6+ messages in thread
From: Brian Foster @ 2023-12-05 14:54 UTC (permalink / raw)
  To: Steve Smith; +Cc: linux-bcachefs

On Tue, Dec 05, 2023 at 04:26:40PM +1100, Steve Smith wrote:
> Hi,
> 
> I'm doing some testing of xcp[1] against bcachefs, and having issues
> with fiemap. Using fiemap on both sparse and non-sparse files always
> returns `mapped_extents` of 0. The same tests on other extent-based
> FSs return non-zero extents, with consistent values. This is against
> -rc4 and master.
> 
> I've broken the relevant tests (in Rust) out into a standalone crate
> if you want to reproduce:
> 
>     https://github.com/tarka/bcachefs-test
> 
> The FS in question was created with a simple `bcachefs -L xcp-test /dev/sdh1`.
> 

Hi Steve,

Well I'm not really familiar with xcp or your test harness here, but
have you checked the results you're seeing against known working tools?
For example, one of the subtests just seems to perform a simple write
followed by an fiemap check/assert for a single extent. That is roughly
equivalent to:

# xfs_io -fc "pwrite 0 128k" -c "fiemap -v" ./file
wrote 131072/131072 bytes at offset 0
128 KiB, 32 ops; 0.0011 sec (108.790 MiB/sec and 27850.3046 ops/sec)
./file:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..127]:        339968..340095     128   0x0
   1: [128..255]:      340096..340223     128   0x1

... which mostly seems to DTRT. I have noticed in the past that bcachefs
fiemap seems to break extents into bucket size or some such segments,
regardless of contiguity, but this doesn't seem to be the issue you are
reporting here. I.e. an strace of the above shows the following for the
fiemap ioctl():

ioctl(3, FS_IOC_FIEMAP, {fm_start=0, fm_length=18446744073709551615, fm_flags=FIEMAP_FLAG_SYNC, fm_extent_count=32} => {fm_flags=FIEMAP_FLAG_SYNC, fm_mapped_extents=2, ...}) = 0

... which shows fm_mapped_extents == 2. Hm?

Brian

> Cheers,
> Steve
> 
> [1]: https://github.com/tarka/xcp
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [bug]: fiemap returns zero extents for all files
  2023-12-05 14:54 ` Brian Foster
@ 2023-12-05 23:26   ` Steve Smith
  2023-12-06 19:01     ` Brian Foster
  0 siblings, 1 reply; 6+ messages in thread
From: Steve Smith @ 2023-12-05 23:26 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-bcachefs

Hi Brian,

Thanks for the suggestion. Yes, the ioctl call that xfs_io makes is
basically identical to the one the test makes, except it has the SYNC
flag set. Adding this flag (or calling fsync) results in expected
behaviour (mapped_extents > 0).

I assume that the bcachefs fiemap impl ignores unwritten data and only
responds with flushed extents? I'm not sure if this is correct; for
comparison, XFS, Btfs, zfs, and ext4, all return the extent map
immediately after write without syncing.

You're right that bcachefs seems to split extents that would be
contiguous on some other FSs, but that's fine here.

Thanks,
Steve



On Wed, 6 Dec 2023 at 01:53, Brian Foster <bfoster@redhat.com> wrote:
>
> On Tue, Dec 05, 2023 at 04:26:40PM +1100, Steve Smith wrote:
> > Hi,
> >
> > I'm doing some testing of xcp[1] against bcachefs, and having issues
> > with fiemap. Using fiemap on both sparse and non-sparse files always
> > returns `mapped_extents` of 0. The same tests on other extent-based
> > FSs return non-zero extents, with consistent values. This is against
> > -rc4 and master.
> >
> > I've broken the relevant tests (in Rust) out into a standalone crate
> > if you want to reproduce:
> >
> >     https://github.com/tarka/bcachefs-test
> >
> > The FS in question was created with a simple `bcachefs -L xcp-test /dev/sdh1`.
> >
>
> Hi Steve,
>
> Well I'm not really familiar with xcp or your test harness here, but
> have you checked the results you're seeing against known working tools?
> For example, one of the subtests just seems to perform a simple write
> followed by an fiemap check/assert for a single extent. That is roughly
> equivalent to:
>
> # xfs_io -fc "pwrite 0 128k" -c "fiemap -v" ./file
> wrote 131072/131072 bytes at offset 0
> 128 KiB, 32 ops; 0.0011 sec (108.790 MiB/sec and 27850.3046 ops/sec)
> ./file:
>  EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>    0: [0..127]:        339968..340095     128   0x0
>    1: [128..255]:      340096..340223     128   0x1
>
> ... which mostly seems to DTRT. I have noticed in the past that bcachefs
> fiemap seems to break extents into bucket size or some such segments,
> regardless of contiguity, but this doesn't seem to be the issue you are
> reporting here. I.e. an strace of the above shows the following for the
> fiemap ioctl():
>
> ioctl(3, FS_IOC_FIEMAP, {fm_start=0, fm_length=18446744073709551615, fm_flags=FIEMAP_FLAG_SYNC, fm_extent_count=32} => {fm_flags=FIEMAP_FLAG_SYNC, fm_mapped_extents=2, ...}) = 0
>
> ... which shows fm_mapped_extents == 2. Hm?
>
> Brian
>
> > Cheers,
> > Steve
> >
> > [1]: https://github.com/tarka/xcp
> >
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [bug]: fiemap returns zero extents for all files
  2023-12-05 23:26   ` Steve Smith
@ 2023-12-06 19:01     ` Brian Foster
  2023-12-06 22:42       ` Kent Overstreet
  0 siblings, 1 reply; 6+ messages in thread
From: Brian Foster @ 2023-12-06 19:01 UTC (permalink / raw)
  To: Steve Smith; +Cc: linux-bcachefs

On Wed, Dec 06, 2023 at 10:26:14AM +1100, Steve Smith wrote:
> Hi Brian,
> 
> Thanks for the suggestion. Yes, the ioctl call that xfs_io makes is
> basically identical to the one the test makes, except it has the SYNC
> flag set. Adding this flag (or calling fsync) results in expected
> behaviour (mapped_extents > 0).
> 

Ah..

> I assume that the bcachefs fiemap impl ignores unwritten data and only
> responds with flushed extents? I'm not sure if this is correct; for
> comparison, XFS, Btfs, zfs, and ext4, all return the extent map
> immediately after write without syncing.
> 

Ok, thanks for digging into the difference. I can reproduce what you
observe by just removing the sync flag from the xfs_io command.

I know that XFS basically looks up extents in the in-core extent tree,
which will include things like delayed allocation (i.e. buffered writes
that have not yet been physically allocated and flushed to disk). It
looks like bcachefs fiemap just walks the extents btree for associated
inode keys. I suspect the buffered write path is just not updating the
tree in any way, which means fiemap won't see extents until dirty data
is flushed out. FWIW, this can also be observed by doing buffered
overwrites and observing that the underlying block range does not change
until the file data is flushed.

I'll have to read through the buffered write path to grok it more
clearly and maybe think about if/how this could be improved. For the
time being I'd suggest to consider bcachefs fiemap without
FIEMAP_FLAG_SYNC as unsupported if you expect to see most recent inode
state. The most obvious solution may very well be to flush
unconditionally anyways. Thanks for reporting this behavior.

Brian

> You're right that bcachefs seems to split extents that would be
> contiguous on some other FSs, but that's fine here.
> 
> Thanks,
> Steve
> 
> 
> 
> On Wed, 6 Dec 2023 at 01:53, Brian Foster <bfoster@redhat.com> wrote:
> >
> > On Tue, Dec 05, 2023 at 04:26:40PM +1100, Steve Smith wrote:
> > > Hi,
> > >
> > > I'm doing some testing of xcp[1] against bcachefs, and having issues
> > > with fiemap. Using fiemap on both sparse and non-sparse files always
> > > returns `mapped_extents` of 0. The same tests on other extent-based
> > > FSs return non-zero extents, with consistent values. This is against
> > > -rc4 and master.
> > >
> > > I've broken the relevant tests (in Rust) out into a standalone crate
> > > if you want to reproduce:
> > >
> > >     https://github.com/tarka/bcachefs-test
> > >
> > > The FS in question was created with a simple `bcachefs -L xcp-test /dev/sdh1`.
> > >
> >
> > Hi Steve,
> >
> > Well I'm not really familiar with xcp or your test harness here, but
> > have you checked the results you're seeing against known working tools?
> > For example, one of the subtests just seems to perform a simple write
> > followed by an fiemap check/assert for a single extent. That is roughly
> > equivalent to:
> >
> > # xfs_io -fc "pwrite 0 128k" -c "fiemap -v" ./file
> > wrote 131072/131072 bytes at offset 0
> > 128 KiB, 32 ops; 0.0011 sec (108.790 MiB/sec and 27850.3046 ops/sec)
> > ./file:
> >  EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
> >    0: [0..127]:        339968..340095     128   0x0
> >    1: [128..255]:      340096..340223     128   0x1
> >
> > ... which mostly seems to DTRT. I have noticed in the past that bcachefs
> > fiemap seems to break extents into bucket size or some such segments,
> > regardless of contiguity, but this doesn't seem to be the issue you are
> > reporting here. I.e. an strace of the above shows the following for the
> > fiemap ioctl():
> >
> > ioctl(3, FS_IOC_FIEMAP, {fm_start=0, fm_length=18446744073709551615, fm_flags=FIEMAP_FLAG_SYNC, fm_extent_count=32} => {fm_flags=FIEMAP_FLAG_SYNC, fm_mapped_extents=2, ...}) = 0
> >
> > ... which shows fm_mapped_extents == 2. Hm?
> >
> > Brian
> >
> > > Cheers,
> > > Steve
> > >
> > > [1]: https://github.com/tarka/xcp
> > >
> >
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [bug]: fiemap returns zero extents for all files
  2023-12-06 19:01     ` Brian Foster
@ 2023-12-06 22:42       ` Kent Overstreet
  2023-12-08 12:22         ` Brian Foster
  0 siblings, 1 reply; 6+ messages in thread
From: Kent Overstreet @ 2023-12-06 22:42 UTC (permalink / raw)
  To: Brian Foster; +Cc: Steve Smith, linux-bcachefs

On Wed, Dec 06, 2023 at 02:01:02PM -0500, Brian Foster wrote:
> On Wed, Dec 06, 2023 at 10:26:14AM +1100, Steve Smith wrote:
> > Hi Brian,
> > 
> > Thanks for the suggestion. Yes, the ioctl call that xfs_io makes is
> > basically identical to the one the test makes, except it has the SYNC
> > flag set. Adding this flag (or calling fsync) results in expected
> > behaviour (mapped_extents > 0).
> > 
> 
> Ah..
> 
> > I assume that the bcachefs fiemap impl ignores unwritten data and only
> > responds with flushed extents? I'm not sure if this is correct; for
> > comparison, XFS, Btfs, zfs, and ext4, all return the extent map
> > immediately after write without syncing.
> > 
> 
> Ok, thanks for digging into the difference. I can reproduce what you
> observe by just removing the sync flag from the xfs_io command.

Whoops, yes, this is missing.

> 
> I know that XFS basically looks up extents in the in-core extent tree,
> which will include things like delayed allocation (i.e. buffered writes
> that have not yet been physically allocated and flushed to disk). It
> looks like bcachefs fiemap just walks the extents btree for associated
> inode keys. I suspect the buffered write path is just not updating the
> tree in any way, which means fiemap won't see extents until dirty data
> is flushed out. FWIW, this can also be observed by doing buffered
> overwrites and observing that the underlying block range does not change
> until the file data is flushed.

Correct - the only place dirty data can be found is in the pagecache
(which means the buffered write path doesn't have any other data
structures to update).

Don't know if it'll be useful, but the lseek() (SEEK_DATA, SEEK_HOLE)
also has to do this "check both the extents btree and the pagecache at
the same time" thing - might be worth looking at that to see if any of
that code can be abstracted/reused.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [bug]: fiemap returns zero extents for all files
  2023-12-06 22:42       ` Kent Overstreet
@ 2023-12-08 12:22         ` Brian Foster
  0 siblings, 0 replies; 6+ messages in thread
From: Brian Foster @ 2023-12-08 12:22 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: Steve Smith, linux-bcachefs

On Wed, Dec 06, 2023 at 05:42:09PM -0500, Kent Overstreet wrote:
> On Wed, Dec 06, 2023 at 02:01:02PM -0500, Brian Foster wrote:
> > On Wed, Dec 06, 2023 at 10:26:14AM +1100, Steve Smith wrote:
> > > Hi Brian,
> > > 
> > > Thanks for the suggestion. Yes, the ioctl call that xfs_io makes is
> > > basically identical to the one the test makes, except it has the SYNC
> > > flag set. Adding this flag (or calling fsync) results in expected
> > > behaviour (mapped_extents > 0).
> > > 
> > 
> > Ah..
> > 
> > > I assume that the bcachefs fiemap impl ignores unwritten data and only
> > > responds with flushed extents? I'm not sure if this is correct; for
> > > comparison, XFS, Btfs, zfs, and ext4, all return the extent map
> > > immediately after write without syncing.
> > > 
> > 
> > Ok, thanks for digging into the difference. I can reproduce what you
> > observe by just removing the sync flag from the xfs_io command.
> 
> Whoops, yes, this is missing.
> 
> > 
> > I know that XFS basically looks up extents in the in-core extent tree,
> > which will include things like delayed allocation (i.e. buffered writes
> > that have not yet been physically allocated and flushed to disk). It
> > looks like bcachefs fiemap just walks the extents btree for associated
> > inode keys. I suspect the buffered write path is just not updating the
> > tree in any way, which means fiemap won't see extents until dirty data
> > is flushed out. FWIW, this can also be observed by doing buffered
> > overwrites and observing that the underlying block range does not change
> > until the file data is flushed.
> 
> Correct - the only place dirty data can be found is in the pagecache
> (which means the buffered write path doesn't have any other data
> structures to update).
> 

Ok, thanks.

> Don't know if it'll be useful, but the lseek() (SEEK_DATA, SEEK_HOLE)
> also has to do this "check both the extents btree and the pagecache at
> the same time" thing - might be worth looking at that to see if any of
> that code can be abstracted/reused.
> 

Yeah that sounds like a decent idea. I'd think we'd just need to merge
data seeks over holes in the btree and then we'd at least have accurate
hole vs. allocated ranges without having to sync.

One thing that still isn't clear is if/how to handle the case of
allocated blocks with dirty pagecache where the latter will COW at
writeback time. I ran a quick test on btrfs out of curiosity and I see
it does show delalloc similar to how XFS does, but also seems to handle
the overwrite case exactly how bcachefs does now. Which is to say the
extent data is subtly inaccurate until writeback (assuming cow mode). I
wonder if it would be preferable here to just treat this as delalloc and
as long as there is pending dirty data, report the extent exactly as if
it were delalloc over a hole...

Brian


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-12-08 12:21 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-05  5:26 [bug]: fiemap returns zero extents for all files Steve Smith
2023-12-05 14:54 ` Brian Foster
2023-12-05 23:26   ` Steve Smith
2023-12-06 19:01     ` Brian Foster
2023-12-06 22:42       ` Kent Overstreet
2023-12-08 12:22         ` Brian Foster

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.