linux-xfs.vger.kernel.org archive mirror
* [PATCHSET 0/1] fstests: random fixes
@ 2021-09-15 23:42 Darrick J. Wong
  2021-09-15 23:42 ` [PATCH 1/1] common/rc: re-fix detection of device-mapper/persistent memory incompatibility Darrick J. Wong
  2021-09-17  0:48 ` [PATCH 2/1] common/rc: use directio mode for the loop device when possible Darrick J. Wong
  0 siblings, 2 replies; 6+ messages in thread
From: Darrick J. Wong @ 2021-09-15 23:42 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

Hi all,

This week, I only have a single fix to the fsdax support detection code.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=random-fixes

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=random-fixes

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=random-fixes
---
 common/rc |   18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/1] common/rc: re-fix detection of device-mapper/persistent memory incompatibility
  2021-09-15 23:42 [PATCHSET 0/1] fstests: random fixes Darrick J. Wong
@ 2021-09-15 23:42 ` Darrick J. Wong
  2021-09-16  7:46   ` Christoph Hellwig
  2021-09-17  0:48 ` [PATCH 2/1] common/rc: use directio mode for the loop device when possible Darrick J. Wong
  1 sibling, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2021-09-15 23:42 UTC (permalink / raw)
  To: djwong, guaneryu; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

In commit e05491b3, I tried to resolve false test failures that were a
result of device mapper refusing to change access modes on a block
device that supports the FSDAX access mode.  Unfortunately, I did not
realize that there are two ways that fsdax support can be advertised via
sysfs: /sys/block/XXX/queue/dax and /sys/block/XXX/dax/, but I only
added a test for the latter.

As of 5.15-rc1 this no longer seems to work.  I don't know enough about
the byzantine world of pmem device driver initialization to say why,
but fsdax mode actually does work even though the /sys/block/XXX/dax/
path went away.  So clearly we have to detect it via the other sysfs
path as well.
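
For anyone wanting to poke at this by hand, here's a quick sketch that
probes both sysfs locations.  The device name is illustrative; my
understanding (worth verifying on your kernel) is that queue/dax exists
for every block device and holds 0 or 1, while the dax/ directory only
appears on fsdax-capable devices -- which is why existence and contents
have to be tested differently:

```shell
#!/bin/sh
# Probe both sysfs locations for fsdax support.  "pmem0" is illustrative.
dev=${1:-pmem0}
for path in "/sys/block/$dev/dax" "/sys/block/$dev/queue/dax"; do
	if [ -f "$path" ]; then
		printf '%s: present, value %s\n' "$path" "$(cat "$path")"
	elif [ -e "$path" ]; then
		printf '%s: present\n' "$path"
	else
		printf '%s: absent\n' "$path"
	fi
done
```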

Fixes: e05491b3 ("common/rc: fix detection of device-mapper/persistent memory incompatibility")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/rc |   18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)


diff --git a/common/rc b/common/rc
index 154bc2dd..275b1f24 100644
--- a/common/rc
+++ b/common/rc
@@ -1964,6 +1964,20 @@ _require_sane_bdev_flush()
 	fi
 }
 
+# Decide if the scratch filesystem is likely to be mounted in fsdax mode.
+# We assume fsdax if the mount options contain a dax clause (i.e. the
+# test runner wants DAX tested), or if the scratch device itself
+# advertises dax mode in sysfs.
+__detect_scratch_fsdax()
+{
+	_normalize_mount_options | egrep -q "dax(=always| |$)" && return 0
+
+	local sysfs="/sys/block/$(_short_dev $SCRATCH_DEV)"
+	test -e "${sysfs}/dax" && return 0
+	test "$(cat "${sysfs}/queue/dax" 2>/dev/null)" = "1" && return 0
+	return 1
+}
+
 # this test requires a specific device mapper target
 _require_dm_target()
 {
@@ -1975,9 +1989,7 @@ _require_dm_target()
 	_require_sane_bdev_flush $SCRATCH_DEV
 	_require_command "$DMSETUP_PROG" dmsetup
 
-	_normalize_mount_options | egrep -q "dax(=always| |$)" || \
-			test -e "/sys/block/$(_short_dev $SCRATCH_DEV)/dax"
-	if [ $? -eq 0 ]; then
+	if __detect_scratch_fsdax; then
 		case $target in
 		stripe|linear|log-writes)
 			;;

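As an aside, the dax(=always| |$) pattern used by the new helper can be
exercised standalone.  A small sketch with made-up mount-option strings
(the helper name is mine, not part of the patch); as I read the pattern,
only a bare dax or dax=always counts as a request for fsdax, while
dax=never and dax=inode fall through to the sysfs checks:

```shell
#!/bin/sh
# Exercise the patch's dax-detection pattern against sample option strings.
has_dax_option() {
	echo "$1" | egrep -q "dax(=always| |$)"
}

has_dax_option "rw,dax=always" && echo "dax=always -> dax"    # matches =always
has_dax_option "rw,dax"        && echo "bare dax   -> dax"    # matches dax at end
has_dax_option "rw,dax=never"  || echo "dax=never  -> no dax" # =never not matched
```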


* Re: [PATCH 1/1] common/rc: re-fix detection of device-mapper/persistent memory incompatibility
  2021-09-15 23:42 ` [PATCH 1/1] common/rc: re-fix detection of device-mapper/persistent memory incompatibility Darrick J. Wong
@ 2021-09-16  7:46   ` Christoph Hellwig
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2021-09-16  7:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests, guan

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* [PATCH 2/1] common/rc: use directio mode for the loop device when possible
  2021-09-15 23:42 [PATCHSET 0/1] fstests: random fixes Darrick J. Wong
  2021-09-15 23:42 ` [PATCH 1/1] common/rc: re-fix detection of device-mapper/persistent memory incompatibility Darrick J. Wong
@ 2021-09-17  0:48 ` Darrick J. Wong
  2021-09-17 20:14   ` riteshh
  1 sibling, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2021-09-17  0:48 UTC (permalink / raw)
  To: guaneryu; +Cc: linux-xfs, fstests, guan, osandov

From: Darrick J. Wong <djwong@kernel.org>

Recently, I've been observing very high runtimes of tests that format a
filesystem atop a loop device and write enough data to fill memory, such
as generic/590 and generic/361.  Logging into the test VMs, I noticed
that the writes to the file on the upper filesystem started fast, but
soon slowed down to about 500KB/s and stayed that way for nearly 20
minutes.  Looking through the D-state processes on the system revealed:

/proc/4350/comm = xfs_io
/proc/4350/stack : [<0>] balance_dirty_pages+0x332/0xda0
[<0>] balance_dirty_pages_ratelimited+0x304/0x400
[<0>] iomap_file_buffered_write+0x1ab/0x260
[<0>] xfs_file_buffered_write+0xba/0x330 [xfs]
[<0>] new_sync_write+0x119/0x1a0
[<0>] vfs_write+0x274/0x310
[<0>] __x64_sys_pwrite64+0x89/0xc0
[<0>] do_syscall_64+0x35/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Here's the xfs_io process performing a buffered write to the file on the
upper filesystem, which at this point has dirtied enough pages to be
ratelimited.

/proc/28/comm = u10:0+flush-8:80
/proc/28/stack : [<0>] blk_mq_get_tag+0x11c/0x280
[<0>] __blk_mq_alloc_request+0xce/0xf0
[<0>] blk_mq_submit_bio+0x139/0x5b0
[<0>] submit_bio_noacct+0x3ba/0x430
[<0>] iomap_submit_ioend+0x4b/0x70
[<0>] xfs_vm_writepages+0x86/0x170 [xfs]
[<0>] do_writepages+0xcc/0x200
[<0>] __writeback_single_inode+0x3d/0x300
[<0>] writeback_sb_inodes+0x207/0x4a0
[<0>] __writeback_inodes_wb+0x4c/0xe0
[<0>] wb_writeback+0x1da/0x2c0
[<0>] wb_workfn+0x2ad/0x4f0
[<0>] process_one_work+0x1e2/0x3d0
[<0>] worker_thread+0x53/0x3c0
[<0>] kthread+0x149/0x170
[<0>] ret_from_fork+0x1f/0x30

This is a flusher thread that has invoked writeback on the upper
filesystem to try to clean memory pages.

/proc/89/comm = u10:7+loop0
/proc/89/stack : [<0>] balance_dirty_pages+0x332/0xda0
[<0>] balance_dirty_pages_ratelimited+0x304/0x400
[<0>] iomap_file_buffered_write+0x1ab/0x260
[<0>] xfs_file_buffered_write+0xba/0x330 [xfs]
[<0>] do_iter_readv_writev+0x14f/0x1a0
[<0>] do_iter_write+0x7b/0x1c0
[<0>] lo_write_bvec+0x62/0x1c0
[<0>] loop_process_work+0x3a4/0xba0
[<0>] process_one_work+0x1e2/0x3d0
[<0>] worker_thread+0x53/0x3c0
[<0>] kthread+0x149/0x170
[<0>] ret_from_fork+0x1f/0x30

Here's the loop device worker handling the writeback IO submitted by the
flusher thread.  Unfortunately, the loop device is using buffered write
mode, which means that /writeback/ is dirtying pages and being throttled
for that.  This is stupid.

Fix this by trying to enable "directio" mode on the loop device, which
delivers two performance benefits: setting directio mode also enables
async io mode, which will allow multiple IOs at once; and using directio
nearly eliminates the chance that writeback will get throttled.

On the author's system with fast storage, this reduces the runtime of
g/590 from 20 minutes to 12 seconds, and g/361 from ~30s to ~3s.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/rc |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/common/rc b/common/rc
index 275b1f24..a174b695 100644
--- a/common/rc
+++ b/common/rc
@@ -3849,6 +3849,14 @@ _create_loop_device()
 {
 	local file=$1 dev
 	dev=`losetup -f --show $file` || _fail "Cannot assign $file to a loop device"
+
+	# Try to enable asynchronous directio mode on the loopback device so
+	# that writeback started by a filesystem mounted on the loop device
+	# won't be throttled by buffered writes to the lower filesystem.  This
+	# is a performance optimization for tests that want to write a lot of
+	# data, so it isn't required to work.
+	test -b "$dev" && losetup --direct-io=on $dev 2> /dev/null
+
 	echo $dev
 }
 
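If you want to confirm out-of-band whether the direct-io switch took
effect, the loop driver exposes a dio attribute in sysfs.  A rough
sketch, not part of the patch (requires root and a losetup with
--direct-io support; paths are illustrative, and the switch is
best-effort since the backing filesystem may not support O_DIRECT):

```shell
#!/bin/sh
# Create a loop device over a scratch image and try to enable direct-io,
# then read back the loop driver's dio flag (1 if direct-io is active).
img=$(mktemp /tmp/loop-img.XXXXXX)
truncate -s 64M "$img"

dev=$(losetup -f --show "$img") || { rm -f "$img"; exit 1; }
losetup --direct-io=on "$dev" 2>/dev/null	# best-effort, as in the patch

cat "/sys/block/${dev#/dev/}/loop/dio"

losetup -d "$dev"
rm -f "$img"
```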


* Re: [PATCH 2/1] common/rc: use directio mode for the loop device when possible
  2021-09-17  0:48 ` [PATCH 2/1] common/rc: use directio mode for the loop device when possible Darrick J. Wong
@ 2021-09-17 20:14   ` riteshh
  2021-09-18  0:49     ` Darrick J. Wong
  0 siblings, 1 reply; 6+ messages in thread
From: riteshh @ 2021-09-17 20:14 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: guaneryu, linux-xfs, fstests, guan, osandov

On 21/09/16 05:48PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Recently, I've been observing very high runtimes of tests that format a
> filesystem atop a loop device and write enough data to fill memory, such
> as generic/590 and generic/361.  Logging into the test VMs, I noticed
> that the writes to the file on the upper filesystem started fast, but
> soon slowed down to about 500KB/s and stayed that way for nearly 20
> minutes.  Looking through the D-state processes on the system revealed:
>
> /proc/4350/comm = xfs_io
> /proc/4350/stack : [<0>] balance_dirty_pages+0x332/0xda0
> [<0>] balance_dirty_pages_ratelimited+0x304/0x400
> [<0>] iomap_file_buffered_write+0x1ab/0x260
> [<0>] xfs_file_buffered_write+0xba/0x330 [xfs]
> [<0>] new_sync_write+0x119/0x1a0
> [<0>] vfs_write+0x274/0x310
> [<0>] __x64_sys_pwrite64+0x89/0xc0
> [<0>] do_syscall_64+0x35/0x80
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> Here's the xfs_io process performing a buffered write to the file on the
> upper filesystem, which at this point has dirtied enough pages to be
> ratelimited.
>
> /proc/28/comm = u10:0+flush-8:80
> /proc/28/stack : [<0>] blk_mq_get_tag+0x11c/0x280
> [<0>] __blk_mq_alloc_request+0xce/0xf0
> [<0>] blk_mq_submit_bio+0x139/0x5b0
> [<0>] submit_bio_noacct+0x3ba/0x430
> [<0>] iomap_submit_ioend+0x4b/0x70
> [<0>] xfs_vm_writepages+0x86/0x170 [xfs]
> [<0>] do_writepages+0xcc/0x200
> [<0>] __writeback_single_inode+0x3d/0x300
> [<0>] writeback_sb_inodes+0x207/0x4a0
> [<0>] __writeback_inodes_wb+0x4c/0xe0
> [<0>] wb_writeback+0x1da/0x2c0
> [<0>] wb_workfn+0x2ad/0x4f0
> [<0>] process_one_work+0x1e2/0x3d0
> [<0>] worker_thread+0x53/0x3c0
> [<0>] kthread+0x149/0x170
> [<0>] ret_from_fork+0x1f/0x30
>
> This is a flusher thread that has invoked writeback on the upper
> filesystem to try to clean memory pages.
>
> /proc/89/comm = u10:7+loop0
> /proc/89/stack : [<0>] balance_dirty_pages+0x332/0xda0
> [<0>] balance_dirty_pages_ratelimited+0x304/0x400
> [<0>] iomap_file_buffered_write+0x1ab/0x260
> [<0>] xfs_file_buffered_write+0xba/0x330 [xfs]
> [<0>] do_iter_readv_writev+0x14f/0x1a0
> [<0>] do_iter_write+0x7b/0x1c0
> [<0>] lo_write_bvec+0x62/0x1c0
> [<0>] loop_process_work+0x3a4/0xba0
> [<0>] process_one_work+0x1e2/0x3d0
> [<0>] worker_thread+0x53/0x3c0
> [<0>] kthread+0x149/0x170
> [<0>] ret_from_fork+0x1f/0x30
>
> Here's the loop device worker handling the writeback IO submitted by the
> flusher thread.  Unfortunately, the loop device is using buffered write
> mode, which means that /writeback/ is dirtying pages and being throttled
> for that.  This is stupid.
>
> Fix this by trying to enable "directio" mode on the loop device, which
> delivers two performance benefits: setting directio mode also enables
> async io mode, which will allow multiple IOs at once; and using directio
> nearly eliminates the chance that writeback will get throttled.
>
> On the author's system with fast storage, this reduces the runtime of
> g/590 from 20 minutes to 12 seconds, and g/361 from ~30s to ~3s.
>

These observations looked interesting to me, so I tried this on my
setup and observed the same behavior.  However, I don't really see the
timing improvement: both before and after the patch, g/590 took
90-95 seconds.  I did verify that with this patch applied, writes to
the ext4 FS were mostly taking the DIO path.

Note: ext4 was my root filesystem, holding the backing images for the
test and scratch filesystems.

<pasting call stack observed>
PID: 6010   TASK: c000000030833380  CPU: 18  COMMAND: "kworker/u64:10"
 #0 [c00000002a787480] __schedule at c00000000139ba50
 #1 [c00000002a787550] schedule at c00000000139bcc4
 #2 [c00000002a787580] schedule_timeout at c0000000013a42bc
 #3 [c00000002a787650] io_schedule_timeout at c00000000139abe8
 #4 [c00000002a787680] balance_dirty_pages_ratelimited at c0000000004b01c0
 #5 [c00000002a787810] generic_perform_write at c0000000004965bc
 #6 [c00000002a787900] ext4_buffered_write_iter at c00000000077f3e4
 #7 [c00000002a787950] do_iter_readv_writev at c0000000005ef3a0
 #8 [c00000002a7879c0] do_iter_write at c0000000005f2568
 #9 [c00000002a787a10] lo_write_bvec at c000000000e506e8
#10 [c00000002a787ad0] loop_process_work at c000000000e523c4
#11 [c00000002a787c40] process_one_work at c00000000020e8d0
#12 [c00000002a787d10] worker_thread at c00000000020ef7c
#13 [c00000002a787da0] kthread at c00000000021cb74
#14 [c00000002a787e10] ret_from_kernel_thread at c00000000000cfd4

PID: 20175  TASK: c000000031de9980  CPU: 23  COMMAND: "kworker/u64:8"
 #0 [c000000023967360] __schedule at c00000000139ba50
 #1 [c000000023967430] schedule at c00000000139bcc4
 #2 [c000000023967460] schedule_timeout at c0000000013a42bc
 #3 [c000000023967530] io_schedule_timeout at c00000000139abe8
 #4 [c000000023967560] balance_dirty_pages_ratelimited at c0000000004b01c0
 #5 [c0000000239676f0] iomap_file_buffered_write at c0000000006e3018
 #6 [c0000000239678a0] xfs_file_buffered_write at c0000000009d9d20
 #7 [c000000023967950] do_iter_readv_writev at c0000000005ef3a0
 #8 [c0000000239679c0] do_iter_write at c0000000005f2568
 #9 [c000000023967a10] lo_write_bvec at c000000000e506e8
#10 [c000000023967ad0] loop_process_work at c000000000e523c4
#11 [c000000023967c40] process_one_work at c00000000020e8d0
#12 [c000000023967d10] worker_thread at c00000000020ef7c
#13 [c000000023967da0] kthread at c00000000021cb74
#14 [c000000023967e10] ret_from_kernel_thread at c00000000000cfd4

PID: 23482  TASK: c000000008e3cc80  CPU: 1   COMMAND: "kworker/u64:3"
 #0 [c0000000297c7160] __schedule at c00000000139ba50
 #1 [c0000000297c7230] schedule at c00000000139bcc4
 #2 [c0000000297c7260] io_schedule at c00000000139bdd4
 #3 [c0000000297c7290] blk_mq_get_tag at c000000000c4483c
 #4 [c0000000297c7320] __blk_mq_alloc_request at c000000000c3b038
 #5 [c0000000297c7360] blk_mq_submit_bio at c000000000c403dc
 #6 [c0000000297c7410] submit_bio_noacct at c000000000c2cc38
 #7 [c0000000297c74c0] iomap_do_writepage at c0000000006e2308
 #8 [c0000000297c75a0] write_cache_pages at c0000000004b0fb4
 #9 [c0000000297c76e0] iomap_writepages at c0000000006df2bc
#10 [c0000000297c7710] xfs_vm_writepages at c0000000009c2f2c
#11 [c0000000297c77d0] do_writepages at c0000000004b28e0
#12 [c0000000297c7890] __writeback_single_inode at c00000000064ea4c
#13 [c0000000297c7900] writeback_sb_inodes at c00000000064f954
#14 [c0000000297c79f0] __writeback_inodes_wb at c00000000064fdc4
#15 [c0000000297c7a50] wb_writeback at c0000000006500fc
#16 [c0000000297c7b00] wb_workfn at c000000000651908
#17 [c0000000297c7c40] process_one_work at c00000000020e8d0
#18 [c0000000297c7d10] worker_thread at c00000000020ef7c
#19 [c0000000297c7da0] kthread at c00000000021cb74
#20 [c0000000297c7e10] ret_from_kernel_thread at c00000000000cfd4

PID: 26082  TASK: c000000029740080  CPU: 3   COMMAND: "xfs_io"
 #0 [c000000031937610] __schedule at c00000000139ba50
 #1 [c0000000319376e0] schedule at c00000000139bcc4
 #2 [c000000031937710] schedule_timeout at c0000000013a42bc
 #3 [c0000000319377e0] io_schedule_timeout at c00000000139abe8
 #4 [c000000031937810] balance_dirty_pages_ratelimited at c0000000004b01c0
 #5 [c0000000319379a0] iomap_file_buffered_write at c0000000006e3018
 #6 [c000000031937b50] xfs_file_buffered_write at c0000000009d9d20
 #7 [c000000031937c00] new_sync_write at c0000000005ef194
 #8 [c000000031937ca0] vfs_write at c0000000005f3bb4
 #9 [c000000031937cf0] sys_pwrite64 at c0000000005f3f3c
#10 [c000000031937d50] system_call_exception at c000000000035ba0
#11 [c000000031937e10] system_call_common at c00000000000c764


-ritesh



* Re: [PATCH 2/1] common/rc: use directio mode for the loop device when possible
  2021-09-17 20:14   ` riteshh
@ 2021-09-18  0:49     ` Darrick J. Wong
  0 siblings, 0 replies; 6+ messages in thread
From: Darrick J. Wong @ 2021-09-18  0:49 UTC (permalink / raw)
  To: riteshh; +Cc: guaneryu, linux-xfs, fstests, guan, osandov

On Sat, Sep 18, 2021 at 01:44:51AM +0530, riteshh wrote:
> On 21/09/16 05:48PM, Darrick J. Wong wrote:
> > [... full patch description and stack traces snipped ...]
> 
> These observations looked interesting to me, so I tried this on my
> setup and observed the same behavior.  However, I don't really see the
> timing improvement: both before and after the patch, g/590 took
> 90-95 seconds.  I did verify that with this patch applied, writes to
> the ext4 FS were mostly taking the DIO path.

Hm.  I would have thought that we'd see /some/ benefit to ext4 as well,
but I guess it would depend on how much writeback ext4 will push at
once?  i.e. if it issues smallish bios (unlike xfs, which will do
megabytes at once) then that might influence what you see.

--D
