From: Yarden Maymon <yarden.maymon@volumez.com>
To: dm-devel@redhat.com
Cc: thornber@redhat.com, Yarden Maymon <yarden.maymon@volumez.com>,
	snitzer@kernel.org, agk@redhat.com
Subject: [PATCH] dm-thin: Improve performance of O_SYNC IOs to mapped data
Date: Sun, 29 Oct 2023 18:17:56 +0200
Message-ID: <20231029161756.27025-1-yarden.maymon@volumez.com>

Running random-write fio benchmarks on dm-thin with mapped data shows a
50% degradation when using O_SYNC:
* dm-thin without O_SYNC - 438k IOPS
* dm-thin with O_SYNC on mapped data - 204k IOPS
* directly on the underlying disk with O_SYNC - 451k IOPS, showing the
  problem is not the disk.

Since the data is already mapped, the same results are expected with O_SYNC.

Currently, all O_SYNC IOs are routed to the slower, deferred path.
This happens early in thin_bio_map(), before checking for conflicting
in-flight IOs or whether the block is already mapped.
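
O_SYNC direct writes are caught by this early test because the block
layer typically submits them with REQ_FUA set (and/or preceded by a
preflush), and op_is_flush() matches exactly those flags. For
reference, a paraphrase of op_is_flush() from include/linux/blk_types.h:

    /* Paraphrased from include/linux/blk_types.h: true for any bio that
     * carries flush semantics, i.e. a preflush or a forced-unit-access
     * (FUA) write - which is how O_SYNC direct writes usually arrive. */
    static inline bool op_is_flush(blk_opf_t op)
    {
        return op & (REQ_FUA | REQ_PREFLUSH);
    }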

Remove the early flush test and route O_SYNC IOs through the regular
data path. An O_SYNC IO to mapped space that does not conflict with
other in-flight IO will be remapped and sent down the faster path.
The behavior of all other O_SYNC IOs is unchanged (they are still
deferred); a condensed sketch of the resulting flow follows the list
below.

An O_SYNC IO will still be deferred if:

* It is not mapped - dm_thin_find_block() returns -ENODATA and the cell
  is deferred.

* There is an in-flight IO to the same virtual key - bio_detain() adds
  the IO to a cell and defers it.

    build_virtual_key(tc->td, block, &key);
    if (bio_detain(tc->pool, &key, bio, &virt_cell))
        return DM_MAPIO_SUBMITTED;

* There is an in-flight IO to the same physical key - bio_detain() adds
  the IO to a cell and defers it.

    build_data_key(tc->td, result.block, &key);
    if (bio_detain(tc->pool, &key, bio, &data_cell)) {
        cell_defer_no_holder(tc, virt_cell);
        return DM_MAPIO_SUBMITTED;
    }
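
For context, here is a condensed, commented sketch of the relevant
thin_bio_map() flow after the change. It is paraphrased from
drivers/md/dm-thin.c; the thin_defer_cell(), inc_all_io_entry() and
remap() calls are reproduced from memory and may differ slightly from
the exact source:

    /* Condensed sketch of thin_bio_map() after the change; pool-mode and
     * requeue handling omitted. Only discards are still deferred up front. */
    if (bio_op(bio) == REQ_OP_DISCARD) {
        thin_defer_bio_with_throttle(tc, bio);
        return DM_MAPIO_SUBMITTED;
    }

    /* Detain against other in-flight IO to the same virtual block. */
    build_virtual_key(tc->td, block, &key);
    if (bio_detain(tc->pool, &key, bio, &virt_cell))
        return DM_MAPIO_SUBMITTED;              /* conflict: deferred */

    r = dm_thin_find_block(td, block, 0, &result);
    switch (r) {
    case 0:
        if (unlikely(result.shared)) {          /* shared block: slow path */
            thin_defer_cell(tc, virt_cell);
            return DM_MAPIO_SUBMITTED;
        }

        /* Detain against other in-flight IO to the same physical block. */
        build_data_key(tc->td, result.block, &key);
        if (bio_detain(tc->pool, &key, bio, &data_cell)) {
            cell_defer_no_holder(tc, virt_cell);
            return DM_MAPIO_SUBMITTED;          /* conflict: deferred */
        }

        inc_all_io_entry(tc->pool, bio);
        cell_defer_no_holder(tc, data_cell);
        cell_defer_no_holder(tc, virt_cell);

        remap(tc, bio, result.block);           /* fast path, now O_SYNC too */
        return DM_MAPIO_REMAPPED;

    case -ENODATA:
    case -EWOULDBLOCK:
        thin_defer_cell(tc, virt_cell);         /* unmapped: slow path */
        return DM_MAPIO_SUBMITTED;

    default:
        /* error handling omitted */
        return DM_MAPIO_SUBMITTED;
    }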

-----------------------------------------------------

Benchmark results:

The benchmarks were run on Ubuntu's 6.2.0-1008 kernel with commit
450e8dee51aa ("dm bufio: improve concurrent IO performance") backported.

fio params: --bs=4k --direct=1 --iodepth=32 --numjobs=8 --time_based
--runtime=5m.
The dm-thin chunk size is 128k and allocation/thin_pool_zero=0 is set.
The results are in IOPs and represented as: avg_iops (max_iops).
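
For reference, one plausible full invocation for the random write sync
case (the device path, job name and --sync=1 flag are assumptions; only
the parameters above are taken from the actual runs):

    fio --name=randwrite-sync --filename=/dev/mapper/pool-thin \
        --rw=randwrite --sync=1 --bs=4k --direct=1 --iodepth=32 \
        --numjobs=8 --time_based --runtime=5m --group_reporting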

Performance test on the underlying nvme device for baseline:
+-------------------+-----------------------+
| randwrite         | 446k (455k)           |
| randwrite sync    | 451k (455k)           |
| randrw 50/50      | 227k/227k (300k/300k) |
| randrw sync 50/50 | 227k/227k (300k/300k) |
| randread          | 773k (866k)           |
| randread sync     | 773k (861k)           |
+-------------------+-----------------------+

dm-thin blkdev with all data allocated (16GiB):
+-------------------+-----------------------+-----------------------+
|                   | Pre Patch             | Post Patch            |
+-------------------+-----------------------+-----------------------+
| randwrite         | 438k (442k)           | 450k (453k)           |
| randwrite sync    | 204k (228k)           | 450k (454k)           |
| randrw 50/50      | 224k/224k (236k/235k) | 225k/225k (234k/234k) |
| randrw sync 50/50 | 191k/191k (199k/197k) | 225k/225k (235k/235k) |
| randread          | 650k (703k)           | 661k (705k)           |
| randread sync     | 659k (705k)           | 661k (707k)           |
+-------------------+-----------------------+-----------------------+
There is a notable improvement in random write performance with sync
compared to the pre-patch results. In the 50/50 sync test there is also
a boost in random read, since resources are freed up for reading. No
other workloads appear to be affected.

dm-thin blkdev without allocated data, with a capacity of 1.6TB (to
increase the chance of randomly hitting a non-allocated block):
+-------------------+-------------------------+------------------------+
|                   | Pre Patch               | Post Patch             |
+-------------------+-------------------------+------------------------+
| randwrite         | 116k (253k)             | 112k (240k)            |
| randwrite sync    | 100k (121k)             | 182k (266k)            |
| randrw 50/50      | 66.7k/66.7k (109k/109k) | 67k/67k (109k/109k)    |
| randrw sync 50/50 | 76.9k/76.8k (101k/101k) | 77.6k/77.6k (122k/122k)|
| randread          | 336k (349k)             | 335k (352k)            |
| randread sync     | 334k (351k)             | 336k (348k)            |
+-------------------+-------------------------+------------------------+
In this case there is no marked difference, with the exception of
random write sync, since the unmapped data path is unchanged. The
improvement in random write sync can be explained by random IOs hitting
the same blocks twice within the test run (the second time they are
already mapped).

-----------------------------------------------------

Tests:
I ran the thin tests from https://github.com/jthornber/dmtest-python.
I ran xfstests (https://github.com/kdave/xfstests) on top of a thin LVM
volume.

I also conducted a manual data integrity test:
* Constructed a layout with nvme target -> dm-thin -> nvme device.
* Used vdbench from an initiator host to write to this remote nvme
  device, journaling to a local drive.
* Initiated a reboot on the media host.
* Verified the data using vdbench once the reboot process finished.

Signed-off-by: Yarden Maymon <yarden.maymon@volumez.com>
---
 drivers/md/dm-thin.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 07c7f9795b10..ecd429260bee 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -2743,7 +2743,7 @@ static int thin_bio_map(struct dm_target *ti, struct bio *bio)
 		return DM_MAPIO_SUBMITTED;
 	}
 
-	if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD) {
+	if (bio_op(bio) == REQ_OP_DISCARD) {
 		thin_defer_bio_with_throttle(tc, bio);
 		return DM_MAPIO_SUBMITTED;
 	}
-- 
2.25.1

