All of lore.kernel.org
 help / color / mirror / Atom feed
* [dm-devel] [PATCH] dm: dm-mpath: Provide high-resolution timer to HST with bio-mpath
@ 2022-04-27 16:57 Gabriel Krisman Bertazi
  2022-05-09 19:47 ` [dm-devel] " Mike Snitzer
  0 siblings, 1 reply; 4+ messages in thread
From: Gabriel Krisman Bertazi @ 2022-04-27 16:57 UTC (permalink / raw)
  To: snitzer; +Cc: dm-devel, kernel, Gabriel Krisman Bertazi, khazhy

The precision loss of reading IO start_time with jiffies_to_nsecs
instead of using a high resolution timer degrades HST path prediction
for BIO-based mpath on high load workloads.

Below, I show the utilization percentage of a 10 disk multipath with
asymmetrical disk access cost, while being exercised by a randwrite FIO
benchmark with high submission queue depth (depth=64).  It is possible
to see that the HST path selection degrades heavily for high-iops in
BIO-mpath, underutilizing the slower paths way beyond expected.  This
seems to be caused by the start_time truncation, which makes some IO to
seem much slower than they actually is.  In this scenario ST outperforms
HST for bio-mpath, but not for mq-mpath, which already uses ktime_get_ns().

The third column shows utilization with this patch applied.  It is easy
to see that now HST prediction is much closer to the ideal distribution
(calculated considering the real cost of each path).

|     |   ST | HST (orig) | HST(ktime) | Best |
| sdd | 0.17 |       0.20 |       0.17 | 0.18 |
| sde | 0.17 |       0.20 |       0.17 | 0.18 |
| sdf | 0.17 |       0.20 |       0.17 | 0.18 |
| sdg | 0.06 |       0.00 |       0.06 | 0.04 |
| sdh | 0.03 |       0.00 |       0.03 | 0.02 |
| sdi | 0.03 |       0.00 |       0.03 | 0.02 |
| sdj | 0.02 |       0.00 |       0.01 | 0.01 |
| sdk | 0.02 |       0.00 |       0.01 | 0.01 |
| sdl | 0.17 |       0.20 |       0.17 | 0.18 |
| sdm | 0.17 |       0.20 |       0.17 | 0.18 |

This issue was originally discussed [1] when we first merged HST, and
this patch was left as a low hanging fruit to be solved later.  I don't
think anyone is using HST with BIO mpath, but it'd be neat to get it
sorted out.

Regarding the implementation, as suggested by Mike in that mail thread,
in order to avoid the overhead of ktime_get_ns for other selectors, this
patch adds a flag for the selector code to request the high-resolution
timer.

I tested this using the same benchmark used in the original HST submission.

Full test and benchmark scripts are available here:

  https://people.collabora.com/~krisman/HST-BIO-MPATH/

[1] https://lore.kernel.org/lkml/85tv0am9de.fsf@collabora.com/T/

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Acked-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 drivers/md/dm-mpath.c                      | 16 ++++++++++++++--
 drivers/md/dm-path-selector.h              | 13 +++++++++++++
 drivers/md/dm-ps-historical-service-time.c |  1 +
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index f4719b65e5e3..c58cfcd87d04 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -105,6 +105,7 @@ struct multipath {
 struct dm_mpath_io {
 	struct pgpath *pgpath;
 	size_t nr_bytes;
+	u64 start_time_ns;
 };
 
 typedef int (*action_fn) (struct pgpath *pgpath);
@@ -647,6 +648,11 @@ static int __multipath_map_bio(struct multipath *m, struct bio *bio,
 
 	mpio->pgpath = pgpath;
 
+	if (pgpath->pg->ps.type->flags & PS_NEED_HR_TIMER_FL)
+		mpio->start_time_ns = ktime_get_ns();
+	else
+		mpio->start_time_ns = 0;
+
 	bio->bi_status = 0;
 	bio_set_dev(bio, pgpath->path.dev->bdev);
 	bio->bi_opf |= REQ_FAILFAST_TRANSPORT;
@@ -1715,9 +1721,15 @@ static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone,
 	if (pgpath) {
 		struct path_selector *ps = &pgpath->pg->ps;
 
-		if (ps->type->end_io)
+		if (ps->type->end_io) {
+			u64 st_time = mpio->start_time_ns;
+
+			if (!st_time)
+				st_time = dm_start_time_ns_from_clone(clone);
+
 			ps->type->end_io(ps, &pgpath->path, mpio->nr_bytes,
-					 dm_start_time_ns_from_clone(clone));
+					 st_time);
+		}
 	}
 
 	return r;
diff --git a/drivers/md/dm-path-selector.h b/drivers/md/dm-path-selector.h
index c47bc0e20275..46710f364286 100644
--- a/drivers/md/dm-path-selector.h
+++ b/drivers/md/dm-path-selector.h
@@ -26,6 +26,18 @@ struct path_selector {
 	void *context;
 };
 
+/* If a path selector uses this flag, a more precise timer
+ * (ktime_get_ns) is used to account for IO start time in BIO-based
+ * mpath.  This improves performance of some path selectors (i.e. HST),
+ * in exchange of a slightly higher overhead when submiting the
+ * BIO. The extra cost is usually offset by improved path selection for
+ * some benchmarks.
+ *
+ * This has no effect for request-based mpath, since it already uses a
+ * higher precision timer by default.
+ */
+#define PS_NEED_HR_TIMER_FL 0x1
+
 /* Information about a path selector type */
 struct path_selector_type {
 	char *name;
@@ -33,6 +45,7 @@ struct path_selector_type {
 
 	unsigned int table_args;
 	unsigned int info_args;
+	unsigned int flags;
 
 	/*
 	 * Constructs a path selector object, takes custom arguments
diff --git a/drivers/md/dm-ps-historical-service-time.c b/drivers/md/dm-ps-historical-service-time.c
index 82f2a06153dc..d2233cc4ea97 100644
--- a/drivers/md/dm-ps-historical-service-time.c
+++ b/drivers/md/dm-ps-historical-service-time.c
@@ -523,6 +523,7 @@ static int hst_end_io(struct path_selector *ps, struct dm_path *path,
 static struct path_selector_type hst_ps = {
 	.name		= "historical-service-time",
 	.module		= THIS_MODULE,
+	.flags		= PS_NEED_HR_TIMER_FL,
 	.table_args	= 1,
 	.info_args	= 3,
 	.create		= hst_create,
-- 
2.35.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [dm-devel] dm: dm-mpath: Provide high-resolution timer to HST with bio-mpath
  2022-04-27 16:57 [dm-devel] [PATCH] dm: dm-mpath: Provide high-resolution timer to HST with bio-mpath Gabriel Krisman Bertazi
@ 2022-05-09 19:47 ` Mike Snitzer
  2022-05-09 19:48   ` Mike Snitzer
  2022-05-09 20:33   ` Gabriel Krisman Bertazi
  0 siblings, 2 replies; 4+ messages in thread
From: Mike Snitzer @ 2022-05-09 19:47 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: dm-devel, kernel, khazhy

On Wed, Apr 27 2022 at 12:57P -0400,
Gabriel Krisman Bertazi <krisman@collabora.com> wrote:

> The precision loss of reading IO start_time with jiffies_to_nsecs
> instead of using a high resolution timer degrades HST path prediction
> for BIO-based mpath on high load workloads.
> 
> Below, I show the utilization percentage of a 10 disk multipath with
> asymmetrical disk access cost, while being exercised by a randwrite FIO
> benchmark with high submission queue depth (depth=64).  It is possible
> to see that the HST path selection degrades heavily for high-iops in
> BIO-mpath, underutilizing the slower paths way beyond expected.  This
> seems to be caused by the start_time truncation, which makes some IO to
> seem much slower than they actually is.  In this scenario ST outperforms
> HST for bio-mpath, but not for mq-mpath, which already uses ktime_get_ns().
> 
> The third column shows utilization with this patch applied.  It is easy
> to see that now HST prediction is much closer to the ideal distribution
> (calculated considering the real cost of each path).
> 
> |     |   ST | HST (orig) | HST(ktime) | Best |
> | sdd | 0.17 |       0.20 |       0.17 | 0.18 |
> | sde | 0.17 |       0.20 |       0.17 | 0.18 |
> | sdf | 0.17 |       0.20 |       0.17 | 0.18 |
> | sdg | 0.06 |       0.00 |       0.06 | 0.04 |
> | sdh | 0.03 |       0.00 |       0.03 | 0.02 |
> | sdi | 0.03 |       0.00 |       0.03 | 0.02 |
> | sdj | 0.02 |       0.00 |       0.01 | 0.01 |
> | sdk | 0.02 |       0.00 |       0.01 | 0.01 |
> | sdl | 0.17 |       0.20 |       0.17 | 0.18 |
> | sdm | 0.17 |       0.20 |       0.17 | 0.18 |
> 
> This issue was originally discussed [1] when we first merged HST, and
> this patch was left as a low hanging fruit to be solved later.  I don't
> think anyone is using HST with BIO mpath, but it'd be neat to get it
> sorted out.
> 
> Regarding the implementation, as suggested by Mike in that mail thread,
> in order to avoid the overhead of ktime_get_ns for other selectors, this
> patch adds a flag for the selector code to request the high-resolution
> timer.
> 
> I tested this using the same benchmark used in the original HST submission.
> 
> Full test and benchmark scripts are available here:
> 
>   https://people.collabora.com/~krisman/HST-BIO-MPATH/
> 
> [1] https://lore.kernel.org/lkml/85tv0am9de.fsf@collabora.com/T/
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> Acked-by: Gabriel Krisman Bertazi <krisman@collabora.com>

Overall your code was OK, but I nudged it a bit further to be
inkeeping with how 'features' flags have been implemented elsewhere
(e.g. dm_target_type's features) -- by using a healer to test the
flag, etc.

I also tweaked some other small implementation details.  Please see:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.19&id=c06dfd124d46df9c482fbd1319b5fe19bcb1a110

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [dm-devel] dm: dm-mpath: Provide high-resolution timer to HST with bio-mpath
  2022-05-09 19:47 ` [dm-devel] " Mike Snitzer
@ 2022-05-09 19:48   ` Mike Snitzer
  2022-05-09 20:33   ` Gabriel Krisman Bertazi
  1 sibling, 0 replies; 4+ messages in thread
From: Mike Snitzer @ 2022-05-09 19:48 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: dm-devel, kernel, khazhy

On Mon, May 09 2022 at  3:47P -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Wed, Apr 27 2022 at 12:57P -0400,
> Gabriel Krisman Bertazi <krisman@collabora.com> wrote:
> 
> > The precision loss of reading IO start_time with jiffies_to_nsecs
> > instead of using a high resolution timer degrades HST path prediction
> > for BIO-based mpath on high load workloads.
> > 
> > Below, I show the utilization percentage of a 10 disk multipath with
> > asymmetrical disk access cost, while being exercised by a randwrite FIO
> > benchmark with high submission queue depth (depth=64).  It is possible
> > to see that the HST path selection degrades heavily for high-iops in
> > BIO-mpath, underutilizing the slower paths way beyond expected.  This
> > seems to be caused by the start_time truncation, which makes some IO to
> > seem much slower than they actually is.  In this scenario ST outperforms
> > HST for bio-mpath, but not for mq-mpath, which already uses ktime_get_ns().
> > 
> > The third column shows utilization with this patch applied.  It is easy
> > to see that now HST prediction is much closer to the ideal distribution
> > (calculated considering the real cost of each path).
> > 
> > |     |   ST | HST (orig) | HST(ktime) | Best |
> > | sdd | 0.17 |       0.20 |       0.17 | 0.18 |
> > | sde | 0.17 |       0.20 |       0.17 | 0.18 |
> > | sdf | 0.17 |       0.20 |       0.17 | 0.18 |
> > | sdg | 0.06 |       0.00 |       0.06 | 0.04 |
> > | sdh | 0.03 |       0.00 |       0.03 | 0.02 |
> > | sdi | 0.03 |       0.00 |       0.03 | 0.02 |
> > | sdj | 0.02 |       0.00 |       0.01 | 0.01 |
> > | sdk | 0.02 |       0.00 |       0.01 | 0.01 |
> > | sdl | 0.17 |       0.20 |       0.17 | 0.18 |
> > | sdm | 0.17 |       0.20 |       0.17 | 0.18 |
> > 
> > This issue was originally discussed [1] when we first merged HST, and
> > this patch was left as a low hanging fruit to be solved later.  I don't
> > think anyone is using HST with BIO mpath, but it'd be neat to get it
> > sorted out.
> > 
> > Regarding the implementation, as suggested by Mike in that mail thread,
> > in order to avoid the overhead of ktime_get_ns for other selectors, this
> > patch adds a flag for the selector code to request the high-resolution
> > timer.
> > 
> > I tested this using the same benchmark used in the original HST submission.
> > 
> > Full test and benchmark scripts are available here:
> > 
> >   https://people.collabora.com/~krisman/HST-BIO-MPATH/
> > 
> > [1] https://lore.kernel.org/lkml/85tv0am9de.fsf@collabora.com/T/
> > 
> > Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> > Acked-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> 
> Overall your code was OK, but I nudged it a bit further to be
> inkeeping with how 'features' flags have been implemented elsewhere
> (e.g. dm_target_type's features) -- by using a healer to test the
> flag, etc.

s/healer/helper/ ;)

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [dm-devel] dm: dm-mpath: Provide high-resolution timer to HST with bio-mpath
  2022-05-09 19:47 ` [dm-devel] " Mike Snitzer
  2022-05-09 19:48   ` Mike Snitzer
@ 2022-05-09 20:33   ` Gabriel Krisman Bertazi
  1 sibling, 0 replies; 4+ messages in thread
From: Gabriel Krisman Bertazi @ 2022-05-09 20:33 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, kernel, khazhy

Mike Snitzer <snitzer@redhat.com> writes:

> Overall your code was OK, but I nudged it a bit further to be
> inkeeping with how 'features' flags have been implemented elsewhere
> (e.g. dm_target_type's features) -- by using a healer to test the
> flag, etc.
>
> I also tweaked some other small implementation details.  Please see:
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.19&id=c06dfd124d46df9c482fbd1319b5fe19bcb1a110

looks good to me.  thanks for the fixups.

-- 
Gabriel Krisman Bertazi

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-05-09 20:34 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-27 16:57 [dm-devel] [PATCH] dm: dm-mpath: Provide high-resolution timer to HST with bio-mpath Gabriel Krisman Bertazi
2022-05-09 19:47 ` [dm-devel] " Mike Snitzer
2022-05-09 19:48   ` Mike Snitzer
2022-05-09 20:33   ` Gabriel Krisman Bertazi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.