linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] btrfs: populate extent_map::generation when reading from disk
@ 2022-02-08  5:31 Qu Wenruo
  2022-02-08 16:39 ` Filipe Manana
  2022-02-08 18:42 ` David Sterba
  0 siblings, 2 replies; 3+ messages in thread
From: Qu Wenruo @ 2022-02-08  5:31 UTC (permalink / raw)
  To: linux-btrfs

[WEIRD BEHAVIOR]

When btrfs_get_extent() tries to get some file extent from disk, it
never populates extent_map::generation , leaving the value to be 0.

On the other hand, for extent map generated by IO, it will get its
generation properly set at finish_ordered_io()

 finish_ordered_io()
 |- unpin_extent_cache(gen = trans->transid)
    |- em->generation = gen;

[CAUSE]
Since extent_map::generation is mostly used by fsync code, and for fsync
they only care about modified extents, which all have their em::generation > 0.

Thus it's fine to not populate em read from disk for fsync.

[CORNER CASE]
However autodefrag also relies on em::geneartion to determine if one extent
needs to be defragged.

This unpopulated extent_map::geneartion can prevent the following autodefrag
case from working:

	mkfs.btrfs -f $dev
	mount $dev $mnt -o autodefrag

	# initial write to queue the inode for autodefrag
	xfs_io -f -c "pwrite 0 4k" $mnt/file
	sync

	# Real fragmented write
	xfs_io -f -s -c "pwrite -b 4096 0 32k" $mnt/file
	sync
	echo "=== before autodefrag ==="
	xfs_io -c "fiemap -v" $mnt/file

	# Drop cache to force em to be read from disk
	echo 3 > /proc/sys/vm/drop_caches
	mount -o remount,commit=1 $mnt
	sleep 3
	sync

	echo "=== After autodefrag ==="
	xfs_io -c "fiemap -v" $mnt/file
	umount $mnt

The result looks like this:

  === before autodefrag ===
  /mnt/btrfs/file:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..15]:         26672..26687        16   0x0
     1: [16..31]:        26656..26671        16   0x0
     2: [32..47]:        26640..26655        16   0x0
     3: [48..63]:        26624..26639        16   0x1
  === After autodefrag ===
  /mnt/btrfs/file:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..15]:         26672..26687        16   0x0
     1: [16..31]:        26656..26671        16   0x0
     2: [32..47]:        26640..26655        16   0x0
     3: [48..63]:        26624..26639        16   0x1

This fragmented 32K will not be defragged by autodefrag.

[FIX]
To make things less weird, just populate extent_map::generation when
reading file extents from disk.

This would make above fragmented extents to be properly defragged:

  == before autodefrag ===
  /mnt/btrfs/file:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..15]:         26672..26687        16   0x0
     1: [16..31]:        26656..26671        16   0x0
     2: [32..47]:        26640..26655        16   0x0
     3: [48..63]:        26624..26639        16   0x1
  === After autodefrag ===
  /mnt/btrfs/file:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..63]:         26688..26751        64   0x1

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
v2:
- Update the commit message to include a reproducer
  Although this is not what we want (to reduce autodefrag IO),
  the behavior still worthy fixing anyway.
---
 fs/btrfs/file-item.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 90c5c38836ab..9a3de652ada8 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -1211,6 +1211,7 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 	extent_start = key.offset;
 	extent_end = btrfs_file_extent_end(path);
 	em->ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
+	em->generation = btrfs_file_extent_generation(leaf, fi);
 	if (type == BTRFS_FILE_EXTENT_REG ||
 	    type == BTRFS_FILE_EXTENT_PREALLOC) {
 		em->start = extent_start;
-- 
2.35.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] btrfs: populate extent_map::generation when reading from disk
  2022-02-08  5:31 [PATCH v2] btrfs: populate extent_map::generation when reading from disk Qu Wenruo
@ 2022-02-08 16:39 ` Filipe Manana
  2022-02-08 18:42 ` David Sterba
  1 sibling, 0 replies; 3+ messages in thread
From: Filipe Manana @ 2022-02-08 16:39 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Feb 08, 2022 at 01:31:19PM +0800, Qu Wenruo wrote:
> [WEIRD BEHAVIOR]
> 
> When btrfs_get_extent() tries to get some file extent from disk, it
> never populates extent_map::generation , leaving the value to be 0.
> 
> On the other hand, for extent map generated by IO, it will get its
> generation properly set at finish_ordered_io()
> 
>  finish_ordered_io()
>  |- unpin_extent_cache(gen = trans->transid)
>     |- em->generation = gen;
> 
> [CAUSE]
> Since extent_map::generation is mostly used by fsync code, and for fsync
> they only care about modified extents, which all have their em::generation > 0.
> 
> Thus it's fine to not populate em read from disk for fsync.
> 
> [CORNER CASE]
> However autodefrag also relies on em::geneartion to determine if one extent
> needs to be defragged.

em::geneartion -> em::generation

> 
> This unpopulated extent_map::geneartion can prevent the following autodefrag
> case from working:

Same here.

> 
> 	mkfs.btrfs -f $dev
> 	mount $dev $mnt -o autodefrag
> 
> 	# initial write to queue the inode for autodefrag
> 	xfs_io -f -c "pwrite 0 4k" $mnt/file
> 	sync
> 
> 	# Real fragmented write
> 	xfs_io -f -s -c "pwrite -b 4096 0 32k" $mnt/file
> 	sync
> 	echo "=== before autodefrag ==="
> 	xfs_io -c "fiemap -v" $mnt/file
> 
> 	# Drop cache to force em to be read from disk
> 	echo 3 > /proc/sys/vm/drop_caches
> 	mount -o remount,commit=1 $mnt
> 	sleep 3
> 	sync
> 
> 	echo "=== After autodefrag ==="
> 	xfs_io -c "fiemap -v" $mnt/file
> 	umount $mnt
> 
> The result looks like this:
> 
>   === before autodefrag ===
>   /mnt/btrfs/file:
>    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>      0: [0..15]:         26672..26687        16   0x0
>      1: [16..31]:        26656..26671        16   0x0
>      2: [32..47]:        26640..26655        16   0x0
>      3: [48..63]:        26624..26639        16   0x1
>   === After autodefrag ===
>   /mnt/btrfs/file:
>    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>      0: [0..15]:         26672..26687        16   0x0
>      1: [16..31]:        26656..26671        16   0x0
>      2: [32..47]:        26640..26655        16   0x0
>      3: [48..63]:        26624..26639        16   0x1
> 
> This fragmented 32K will not be defragged by autodefrag.
> 
> [FIX]
> To make things less weird, just populate extent_map::generation when
> reading file extents from disk.
> 
> This would make above fragmented extents to be properly defragged:
> 
>   == before autodefrag ===
>   /mnt/btrfs/file:
>    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>      0: [0..15]:         26672..26687        16   0x0
>      1: [16..31]:        26656..26671        16   0x0
>      2: [32..47]:        26640..26655        16   0x0
>      3: [48..63]:        26624..26639        16   0x1
>   === After autodefrag ===
>   /mnt/btrfs/file:
>    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>      0: [0..63]:         26688..26751        64   0x1
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

I don't want to make you send yet another version because only of
a typo in the changelog, so:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> ---
> Changelog:
> v2:
> - Update the commit message to include a reproducer
>   Although this is not what we want (to reduce autodefrag IO),
>   the behavior still worthy fixing anyway.
> ---
>  fs/btrfs/file-item.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> index 90c5c38836ab..9a3de652ada8 100644
> --- a/fs/btrfs/file-item.c
> +++ b/fs/btrfs/file-item.c
> @@ -1211,6 +1211,7 @@ void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
>  	extent_start = key.offset;
>  	extent_end = btrfs_file_extent_end(path);
>  	em->ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
> +	em->generation = btrfs_file_extent_generation(leaf, fi);
>  	if (type == BTRFS_FILE_EXTENT_REG ||
>  	    type == BTRFS_FILE_EXTENT_PREALLOC) {
>  		em->start = extent_start;
> -- 
> 2.35.0
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] btrfs: populate extent_map::generation when reading from disk
  2022-02-08  5:31 [PATCH v2] btrfs: populate extent_map::generation when reading from disk Qu Wenruo
  2022-02-08 16:39 ` Filipe Manana
@ 2022-02-08 18:42 ` David Sterba
  1 sibling, 0 replies; 3+ messages in thread
From: David Sterba @ 2022-02-08 18:42 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Feb 08, 2022 at 01:31:19PM +0800, Qu Wenruo wrote:
> [WEIRD BEHAVIOR]
> 
> When btrfs_get_extent() tries to get some file extent from disk, it
> never populates extent_map::generation , leaving the value to be 0.
> 
> On the other hand, for extent map generated by IO, it will get its
> generation properly set at finish_ordered_io()
> 
>  finish_ordered_io()
>  |- unpin_extent_cache(gen = trans->transid)
>     |- em->generation = gen;
> 
> [CAUSE]
> Since extent_map::generation is mostly used by fsync code, and for fsync
> they only care about modified extents, which all have their em::generation > 0.
> 
> Thus it's fine to not populate em read from disk for fsync.
> 
> [CORNER CASE]
> However autodefrag also relies on em::geneartion to determine if one extent
> needs to be defragged.
> 
> This unpopulated extent_map::geneartion can prevent the following autodefrag
> case from working:
> 
> 	mkfs.btrfs -f $dev
> 	mount $dev $mnt -o autodefrag
> 
> 	# initial write to queue the inode for autodefrag
> 	xfs_io -f -c "pwrite 0 4k" $mnt/file
> 	sync
> 
> 	# Real fragmented write
> 	xfs_io -f -s -c "pwrite -b 4096 0 32k" $mnt/file
> 	sync
> 	echo "=== before autodefrag ==="
> 	xfs_io -c "fiemap -v" $mnt/file
> 
> 	# Drop cache to force em to be read from disk
> 	echo 3 > /proc/sys/vm/drop_caches
> 	mount -o remount,commit=1 $mnt
> 	sleep 3
> 	sync
> 
> 	echo "=== After autodefrag ==="
> 	xfs_io -c "fiemap -v" $mnt/file
> 	umount $mnt
> 
> The result looks like this:
> 
>   === before autodefrag ===
>   /mnt/btrfs/file:
>    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>      0: [0..15]:         26672..26687        16   0x0
>      1: [16..31]:        26656..26671        16   0x0
>      2: [32..47]:        26640..26655        16   0x0
>      3: [48..63]:        26624..26639        16   0x1
>   === After autodefrag ===
>   /mnt/btrfs/file:
>    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>      0: [0..15]:         26672..26687        16   0x0
>      1: [16..31]:        26656..26671        16   0x0
>      2: [32..47]:        26640..26655        16   0x0
>      3: [48..63]:        26624..26639        16   0x1
> 
> This fragmented 32K will not be defragged by autodefrag.
> 
> [FIX]
> To make things less weird, just populate extent_map::generation when
> reading file extents from disk.
> 
> This would make above fragmented extents to be properly defragged:
> 
>   == before autodefrag ===
>   /mnt/btrfs/file:
>    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>      0: [0..15]:         26672..26687        16   0x0
>      1: [16..31]:        26656..26671        16   0x0
>      2: [32..47]:        26640..26655        16   0x0
>      3: [48..63]:        26624..26639        16   0x1
>   === After autodefrag ===
>   /mnt/btrfs/file:
>    EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>      0: [0..63]:         26688..26751        64   0x1
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Changelog:
> v2:
> - Update the commit message to include a reproducer
>   Although this is not what we want (to reduce autodefrag IO),
>   the behavior still worthy fixing anyway.

With the typos fixed, added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-02-08 18:46 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-08  5:31 [PATCH v2] btrfs: populate extent_map::generation when reading from disk Qu Wenruo
2022-02-08 16:39 ` Filipe Manana
2022-02-08 18:42 ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).