* [PATCH v3 0/3] ntb_perf: add new 'latency' test set
@ 2022-05-13 19:37 Alexander Fomichev
  2022-05-13 19:37 ` [PATCH v3 1/3] ntb_perf: extend with burst latency measurement Alexander Fomichev
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Alexander Fomichev @ 2022-05-13 19:37 UTC (permalink / raw)
  To: ntb
  Cc: linux, Jon Mason, Dave Jiang, Allen Hubbe, Guo Zhengkui,
	fancer.lancer, Alexander Fomichev

From: Alexander Fomichev <a.fomichev@yadro.com>

The ntb_perf test provides a tool for evaluating NTB hardware
performance. To eliminate software overhead, the test uses a simple
method (let's call it 'burst' mode): the local system sends a data set
to the remote system and measures the time until the hardware reports
completion, with no confirmation from the remote side and no data
integrity check. The measured metric is the 'burst' throughput
bandwidth of the NTB connection.

The patches extend ntb_perf with 3 new metrics:
1) Burst latency
2) Poll latency
3) Doorbell latency

The resulting test set is fully backward compatible.
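
For reference, a sketch of the debugfs usage once all three patches are
applied ($DBG_DIR is the per-device ntb_perf debugfs directory and '0'
is the peer index, mirroring the examples carried in the driver
comments; exact paths may vary on a given setup):

  Burst latency (reported along with the bandwidth results):
  root@self# echo 0 > $DBG_DIR/run
  root@self# cat $DBG_DIR/run

  Poll latency:
  Server side: root@self# echo 0 > $DBG_DIR/poll_latency/run_server
  Client side: root@self# echo 0 > $DBG_DIR/poll_latency/run_client
               root@self# cat $DBG_DIR/poll_latency/run_client

  Doorbell latency follows the same server/client pattern under
  $DBG_DIR/db_latency/.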

Alexander Fomichev (3):
  ntb_perf: extend with burst latency measurement
  ntb_perf: extend with poll latency measurement
  ntb_perf: extend with doorbell latency measurement

 drivers/ntb/test/ntb_perf.c | 826 +++++++++++++++++++++++++++++++++++-
 1 file changed, 810 insertions(+), 16 deletions(-)

-- 
2.36.1



* [PATCH v3 1/3] ntb_perf: extend with burst latency measurement
  2022-05-13 19:37 [PATCH v3 0/3] ntb_perf: add new 'latency' test set Alexander Fomichev
@ 2022-05-13 19:37 ` Alexander Fomichev
  2022-05-13 19:37 ` [PATCH v3 2/3] ntb_perf: extend with poll " Alexander Fomichev
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Alexander Fomichev @ 2022-05-13 19:37 UTC (permalink / raw)
  To: ntb
  Cc: linux, Jon Mason, Dave Jiang, Allen Hubbe, Guo Zhengkui,
	fancer.lancer, Alexander Fomichev

From: Alexander Fomichev <a.fomichev@yadro.com>

Burst latency is the delay between starting to send 1 byte to the
remote system and the hardware being ready to send the next byte. The
measurement is performed within the bandwidth test procedure. The DMA
engine is not used and data integrity is not checked. This mode can be
disabled with the 'perf_latency=N' module parameter.
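
For illustration only (hypothetical numbers, not from a real run): if
the measurement loop runs for the default lat_time_ms = 1000 ms and
completes 2,000,000 one-byte writes, the reported burst latency is
1,000,000,000 ns / 2,000,000 = 500 ns. The result is only reported if
at least LAT_MIN_TRIES (20) writes were completed.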

Signed-off-by: Alexander Fomichev <a.fomichev@yadro.com>
---
 drivers/ntb/test/ntb_perf.c | 140 ++++++++++++++++++++++++++++++++++--
 1 file changed, 136 insertions(+), 4 deletions(-)

diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
index 65e1e5cf1b29..23bde12eaf3d 100644
--- a/drivers/ntb/test/ntb_perf.c
+++ b/drivers/ntb/test/ntb_perf.c
@@ -6,6 +6,7 @@
  *
  *   Copyright(c) 2015 Intel Corporation. All rights reserved.
  *   Copyright(c) 2017 T-Platforms. All Rights Reserved.
+ *   Copyright(c) 2022 YADRO. All Rights Reserved.
  *
  *   This program is free software; you can redistribute it and/or modify
  *   it under the terms of version 2 of the GNU General Public License as
@@ -15,6 +16,7 @@
  *
  *   Copyright(c) 2015 Intel Corporation. All rights reserved.
  *   Copyright(c) 2017 T-Platforms. All Rights Reserved.
+ *   Copyright(c) 2022 YADRO. All Rights Reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -85,7 +87,7 @@
 #include <linux/ntb.h>
 
 #define DRIVER_NAME		"ntb_perf"
-#define DRIVER_VERSION		"2.0"
+#define DRIVER_VERSION		"2.1"
 
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_VERSION(DRIVER_VERSION);
@@ -106,6 +108,9 @@ MODULE_DESCRIPTION("PCIe NTB Performance Measurement Tool");
 
 #define PERF_BUF_LEN 1024
 
+#define LAT_MIN_TRIES	20
+#define RESCHEDULE_RATIO	10000
+
 static unsigned long max_mw_size;
 module_param(max_mw_size, ulong, 0644);
 MODULE_PARM_DESC(max_mw_size, "Upper limit of memory window size");
@@ -122,6 +127,14 @@ static bool use_dma; /* default to 0 */
 module_param(use_dma, bool, 0644);
 MODULE_PARM_DESC(use_dma, "Use DMA engine to measure performance");
 
+static bool perf_latency = true;
+module_param(perf_latency, bool, 0644);
+MODULE_PARM_DESC(perf_latency, "Measure burst latency");
+
+static unsigned long lat_time_ms = 1000; /* default 1s */
+module_param(lat_time_ms, ulong, 0644);
+MODULE_PARM_DESC(lat_time_ms, "Time (in ms) to test latency");
+
 /*==============================================================================
  *                         Perf driver data definition
  *==============================================================================
@@ -178,6 +191,8 @@ struct perf_thread {
 	void *src;
 	u64 copied;
 	ktime_t duration;
+	ktime_t latency;
+	u64 tries;
 	int status;
 	struct work_struct work;
 };
@@ -783,7 +798,7 @@ static void perf_dma_copy_callback(void *data)
 }
 
 static int perf_copy_chunk(struct perf_thread *pthr,
-			   void __iomem *dst, void *src, size_t len)
+			   void __iomem *dst, void *src, size_t len, bool _use_dma)
 {
 	struct dma_async_tx_descriptor *tx;
 	struct dmaengine_unmap_data *unmap;
@@ -794,7 +809,7 @@ static int perf_copy_chunk(struct perf_thread *pthr,
 	void __iomem *dst_vaddr;
 	dma_addr_t dst_dma_addr;
 
-	if (!use_dma) {
+	if (!_use_dma) {
 		memcpy_toio(dst, src, len);
 		goto ret_check_tsync;
 	}
@@ -940,7 +955,7 @@ static int perf_run_test(struct perf_thread *pthr)
 
 	/* Copied field is cleared on test launch stage */
 	while (pthr->copied < total_size) {
-		ret = perf_copy_chunk(pthr, flt_dst, flt_src, chunk_size);
+		ret = perf_copy_chunk(pthr, flt_dst, flt_src, chunk_size, use_dma);
 		if (ret) {
 			dev_err(&perf->ntb->dev, "%d: Got error %d on test\n",
 				pthr->tidx, ret);
@@ -1018,6 +1033,67 @@ static void perf_clear_test(struct perf_thread *pthr)
 	kfree(pthr->src);
 }
 
+static int perf_run_latency(struct perf_thread *pthr)
+{
+	struct perf_peer *peer = pthr->perf->test_peer;
+	struct ntb_dev *ntb = pthr->perf->ntb;
+	void __iomem *flt_dst, *bnd_dst;
+	void *flt_src;
+	u64 stop_at;
+	int ret;
+
+	pthr->tries = 0;
+	pthr->latency = ktime_get();
+	flt_src = pthr->src;
+	flt_dst = peer->outbuf;
+	bnd_dst = peer->outbuf + peer->outbuf_size;
+
+	stop_at = ktime_get_real_fast_ns() + lat_time_ms * NSEC_PER_MSEC;
+	while (ktime_get_real_fast_ns() < stop_at) {
+		ret = perf_copy_chunk(pthr, flt_dst, flt_src, 1, false);
+		if (ret) {
+			dev_err(&ntb->dev, "%d: Latency testing error %d\n",
+				pthr->tidx, ret);
+			pthr->latency = ktime_set(0, 0);
+			return ret;
+		}
+
+		pthr->tries++;
+		flt_dst++;
+		flt_src++;
+
+		if (flt_dst >= bnd_dst || flt_dst < peer->outbuf) {
+			flt_dst = peer->outbuf;
+			flt_src = pthr->src;
+		}
+
+		/* Avoid processor soft lock-ups */
+		if (!(pthr->tries % RESCHEDULE_RATIO))
+			schedule();
+	}
+
+	/* Stop timer */
+	pthr->latency = ktime_sub(ktime_get(), pthr->latency);
+
+	if (pthr->tries < LAT_MIN_TRIES) {
+		dev_err(&ntb->dev,
+			"%d: Too few steps (%llu) to measure Latency, recommended > %d. Increase value of 'lat_time_ms' parameter\n",
+			pthr->tidx, pthr->tries, LAT_MIN_TRIES);
+		pthr->latency = ktime_set(0, 0);
+		return -EINVAL;
+	}
+
+	dev_dbg(&ntb->dev, "%d: made %llu tries, lasted %llu usecs\n",
+		pthr->tidx, pthr->tries, ktime_to_us(pthr->latency));
+
+	pthr->latency = ns_to_ktime(ktime_divns(pthr->latency, pthr->tries));
+
+	dev_dbg(&ntb->dev, "%d: latency %llu us (%llu ns)\n", pthr->tidx,
+		ktime_to_us(pthr->latency), ktime_to_ns(pthr->latency));
+
+	return 0;
+}
+
 static void perf_thread_work(struct work_struct *work)
 {
 	struct perf_thread *pthr = to_thread_work(work);
@@ -1043,6 +1119,11 @@ static void perf_thread_work(struct work_struct *work)
 	}
 
 	pthr->status = perf_sync_test(pthr);
+	if (pthr->status)
+		goto err_clear_test;
+
+	if (perf_latency)
+		pthr->status = perf_run_latency(pthr);
 
 err_clear_test:
 	perf_clear_test(pthr);
@@ -1142,6 +1223,18 @@ static int perf_read_stats(struct perf_ctx *perf, char *buf,
 			"%d: copied %llu bytes in %llu usecs, %llu MBytes/s\n",
 			tidx, pthr->copied, ktime_to_us(pthr->duration),
 			div64_u64(pthr->copied, ktime_to_us(pthr->duration)));
+
+		if (perf_latency && ktime_compare(pthr->latency, ktime_set(0, 0))) {
+			if (ktime_to_us(pthr->latency) < 10) {
+				(*pos) += scnprintf(buf + *pos, size - *pos,
+						"%d: latency %llu ns\n",
+						tidx, ktime_to_ns(pthr->latency));
+			} else {
+				(*pos) += scnprintf(buf + *pos, size - *pos,
+						"%d: latency %llu us\n",
+						tidx, ktime_to_us(pthr->latency));
+			}
+		}
 	}
 
 	clear_bit_unlock(0, &perf->busy_flag);
@@ -1344,12 +1437,48 @@ static ssize_t perf_dbgfs_write_tcnt(struct file *filep,
 	return size;
 }
 
+static ssize_t perf_dbgfs_read_lattrs(struct file *filep, char __user *ubuf,
+				    size_t size, loff_t *offp)
+{
+	size_t buf_size = min_t(size_t, size, PERF_BUF_LEN);
+	struct perf_ctx *perf = filep->private_data;
+	ssize_t pos, ret;
+	char *buf;
+	int tidx;
+
+	buf = kmalloc(buf_size, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	pos = scnprintf(buf, buf_size, "    Peer %d latency try count:\n",
+			perf->test_peer->pidx);
+
+	for (tidx = 0; tidx < perf->tcnt; tidx++) {
+		struct perf_thread *pthr = &perf->threads[tidx];
+
+		pos += scnprintf(buf + pos, buf_size - pos,
+			"%d: made %llu tries\n",
+			tidx, pthr->tries);
+	}
+
+	ret = simple_read_from_buffer(ubuf, size, offp, buf, pos);
+
+	kfree(buf);
+
+	return ret;
+}
+
 static const struct file_operations perf_dbgfs_tcnt = {
 	.open = simple_open,
 	.read = perf_dbgfs_read_tcnt,
 	.write = perf_dbgfs_write_tcnt
 };
 
+static const struct file_operations perf_dbgfs_lattrs = {
+	.open = simple_open,
+	.read = perf_dbgfs_read_lattrs
+};
+
 static void perf_setup_dbgfs(struct perf_ctx *perf)
 {
 	struct pci_dev *pdev = perf->ntb->pdev;
@@ -1375,6 +1504,9 @@ static void perf_setup_dbgfs(struct perf_ctx *perf)
 	debugfs_create_u8("total_order", 0500, perf->dbgfs_dir, &total_order);
 
 	debugfs_create_bool("use_dma", 0500, perf->dbgfs_dir, &use_dma);
+
+	debugfs_create_file("latency_tries", 0400, perf->dbgfs_dir, perf,
+			    &perf_dbgfs_lattrs);
 }
 
 static void perf_clear_dbgfs(struct perf_ctx *perf)
-- 
2.36.1



* [PATCH v3 2/3] ntb_perf: extend with poll latency measurement
  2022-05-13 19:37 [PATCH v3 0/3] ntb_perf: add new 'latency' test set Alexander Fomichev
  2022-05-13 19:37 ` [PATCH v3 1/3] ntb_perf: extend with burst latency measurement Alexander Fomichev
@ 2022-05-13 19:37 ` Alexander Fomichev
  2022-05-13 19:37 ` [PATCH v3 3/3] ntb_perf: extend with doorbell " Alexander Fomichev
  2022-05-23 18:38 ` [PATCH v3 0/3] ntb_perf: add new 'latency' test set Dave Jiang
  3 siblings, 0 replies; 12+ messages in thread
From: Alexander Fomichev @ 2022-05-13 19:37 UTC (permalink / raw)
  To: ntb
  Cc: linux, Jon Mason, Dave Jiang, Allen Hubbe, Guo Zhengkui,
	fancer.lancer, Alexander Fomichev

From: Alexander Fomichev <a.fomichev@yadro.com>

Poll latency is the delay between starting to send 1 byte to the
remote system and receiving the confirmation. The remote system needs
to be put in server mode beforehand. The server polls its input buffer
and, on receiving data, immediately sends the confirmation back.
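
A rough sketch of one measured round trip, as implemented by
perf_run_pl_client() and perf_run_pl_server() below:

  client: memcpy_toio(peer->outbuf, src, 1)   /* 1-byte ping */
  server: perf_poll_peer_reply() detects the byte
  server: memcpy_toio(peer->outbuf, src, 1)   /* 1-byte pong */
  client: perf_poll_peer_reply() detects the byte, one try is counted

The reported poll latency is the total test duration divided by the
number of such round trips.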

Signed-off-by: Alexander Fomichev <a.fomichev@yadro.com>
---
 drivers/ntb/test/ntb_perf.c | 374 +++++++++++++++++++++++++++++++++++-
 1 file changed, 373 insertions(+), 1 deletion(-)

diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
index 23bde12eaf3d..f0f3beba70a5 100644
--- a/drivers/ntb/test/ntb_perf.c
+++ b/drivers/ntb/test/ntb_perf.c
@@ -67,6 +67,14 @@
  *
  * root@self# echo 0 > $DBG_DIR/run
  * root@self# cat $DBG_DIR/run
+ *-----------------------------------------------------------------------------
+ * Eg: start latency test with peer (index 0) poll-waiting and get the metrics
+ *
+ * Server side:
+ * root@self# echo 0 > $DBG_DIR/poll_latency/run_server
+ * Client side:
+ * root@self# echo 0 > $DBG_DIR/poll_latency/run_client
+ * root@self# cat $DBG_DIR/poll_latency/run_client
  */
 
 #include <linux/init.h>
@@ -87,7 +95,7 @@
 #include <linux/ntb.h>
 
 #define DRIVER_NAME		"ntb_perf"
-#define DRIVER_VERSION		"2.1"
+#define DRIVER_VERSION		"2.2"
 
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_VERSION(DRIVER_VERSION);
@@ -135,6 +143,10 @@ static unsigned long lat_time_ms = 1000; /* default 1s */
 module_param(lat_time_ms, ulong, 0644);
 MODULE_PARM_DESC(lat_time_ms, "Time (in ms) to test latency");
 
+static unsigned long lat_timeout_us = 500;
+module_param(lat_timeout_us, ulong, 0644);
+MODULE_PARM_DESC(lat_timeout_us, "Timeout (in us) to wait for server reply");
+
 /*==============================================================================
  *                         Perf driver data definition
  *==============================================================================
@@ -151,6 +163,11 @@ enum perf_cmd {
 	PERF_STS_LNKUP = 6, /* link up state flag */
 };
 
+enum run_mode {
+	RUN_PL_CLIENT,
+	RUN_PL_SERVER,
+};
+
 struct perf_ctx;
 
 struct perf_peer {
@@ -199,6 +216,21 @@ struct perf_thread {
 #define to_thread_work(__work) \
 	container_of(__work, struct perf_thread, work)
 
+struct perf_poll_lat_data {
+	struct perf_ctx *perf;
+	void *src;
+	ktime_t latency;
+	u64 tries;
+	int status;
+	atomic_t running;
+	struct work_struct clt_work;
+	struct work_struct srv_work;
+};
+#define to_pldata_clt_work(__work) \
+	container_of(__work, struct perf_poll_lat_data, clt_work)
+#define to_pldata_srv_work(__work) \
+	container_of(__work, struct perf_poll_lat_data, srv_work)
+
 struct perf_ctx {
 	struct ntb_dev *ntb;
 
@@ -206,6 +238,7 @@ struct perf_ctx {
 	int gidx;
 	int pcnt;
 	struct perf_peer *peers;
+	struct perf_poll_lat_data pldata;
 
 	/* Performance measuring work-threads interface */
 	unsigned long busy_flag;
@@ -254,6 +287,8 @@ static struct dentry *perf_dbgfs_topdir;
 
 static struct workqueue_struct *perf_wq __read_mostly;
 
+static const u8 stop_word = 0xFF;
+
 /*==============================================================================
  *                  NTB cross-link commands execution service
  *==============================================================================
@@ -1129,6 +1164,185 @@ static void perf_thread_work(struct work_struct *work)
 	perf_clear_test(pthr);
 }
 
+static int perf_init_pl(struct perf_poll_lat_data *pldata)
+{
+	struct perf_ctx *perf = pldata->perf;
+	struct perf_peer *peer = perf->test_peer;
+	u8 *bp;
+
+	pldata->src = kmalloc_node(peer->outbuf_size, GFP_KERNEL,
+				 dev_to_node(&perf->ntb->dev));
+	if (!pldata->src)
+		return -ENOMEM;
+
+	/*
+	 * Prepare random data to send, guaranteed exclusion of 0x00 (unreceived)
+	 * and 0xFF (stop_word)
+	 */
+	get_random_bytes(pldata->src, peer->outbuf_size);
+	for (bp = pldata->src; bp < (u8 *) pldata->src + peer->outbuf_size; bp++)
+		while (*bp == 0 || *bp == stop_word)
+			*bp = (u8)get_random_int();
+
+	memset(peer->inbuf, 0, peer->inbuf_size);
+
+	return 0;
+}
+
+static int perf_poll_peer_reply(volatile u8 *cur)
+{
+	u64 wait_till = ktime_get_real_fast_ns() + lat_timeout_us * NSEC_PER_USEC;
+
+	while (ktime_get_real_fast_ns() < wait_till) {
+		if (*cur == stop_word) {
+			*cur = 0;
+			return 1;
+		}
+		if (*cur != 0) {
+			*cur = 0;
+			return 0;
+		}
+	}
+	return -EINTR;
+}
+
+static int perf_run_pl_client(struct perf_poll_lat_data *pldata)
+{
+	struct perf_peer *peer = pldata->perf->test_peer;
+	struct ntb_dev *ntb = pldata->perf->ntb;
+	void *src = pldata->src;
+	u64 stop_at;
+	int ret;
+
+	dev_dbg(&ntb->dev, "poll_lat: client started.\n");
+
+	pldata->tries = 0;
+	pldata->latency = ktime_get();
+
+	stop_at = ktime_get_real_fast_ns() + lat_time_ms * NSEC_PER_MSEC;
+	while (ktime_get_real_fast_ns() < stop_at) {
+		memcpy_toio(peer->outbuf, src, 1);
+
+		/* Avoid processor soft lock-ups */
+		schedule();
+
+		ret = perf_poll_peer_reply(peer->inbuf);
+		if (ret < 0) {
+			dev_err(&ntb->dev, "Timeout waiting for peer reply on poll latency\n");
+			pldata->latency = ktime_set(0, 0);
+			return -EINTR;
+		} else if (ret == 1) {
+			dev_warn(&ntb->dev, "Server terminated on poll latency, stopping\n");
+			break;
+		} else if (!atomic_read(&pldata->running)) {
+			dev_err(&ntb->dev, "Poll latency client terminated\n");
+			return -EINTR;
+		}
+
+		pldata->tries++;
+		src++;
+
+		if (src >= pldata->src + peer->outbuf_size)
+			src = pldata->src;
+	}
+
+	/* Stop timer */
+	pldata->latency = ktime_sub(ktime_get(), pldata->latency);
+	/* Send stop to peer */
+	memcpy_toio(peer->outbuf, &stop_word, 1);
+
+	if (pldata->tries < LAT_MIN_TRIES) {
+		dev_err(&ntb->dev,
+			"Too few steps (%llu) to measure Latency, recommended > %d. Increase value of 'lat_time_ms' parameter\n",
+			pldata->tries, LAT_MIN_TRIES);
+		pldata->latency = ktime_set(0, 0);
+		return -EINVAL;
+	}
+
+	dev_dbg(&ntb->dev, "poll_lat: made %llu tries, lasted %llu usecs\n",
+		pldata->tries, ktime_to_us(pldata->latency));
+
+	pldata->latency = ns_to_ktime(ktime_divns(pldata->latency, pldata->tries));
+
+	dev_dbg(&ntb->dev, "poll_lat: latency %llu us (%llu ns)\n",
+		ktime_to_us(pldata->latency), ktime_to_ns(pldata->latency));
+
+	return 0;
+}
+
+static int perf_run_pl_server(struct perf_poll_lat_data *pldata)
+{
+	struct perf_peer *peer = pldata->perf->test_peer;
+	struct ntb_dev *ntb = pldata->perf->ntb;
+	void *src = pldata->src;
+	int ret = 0;
+
+	dev_dbg(&ntb->dev, "poll_lat: server started.\n");
+
+	pldata->tries = 0;
+
+	while (ret != 1 && atomic_read(&pldata->running)) {
+		ret = perf_poll_peer_reply(peer->inbuf);
+		if (!ret) {
+			/* Pong to client */
+			memcpy_toio(peer->outbuf, src++, 1);
+			if (src >= pldata->src + peer->outbuf_size)
+				src = pldata->src;
+
+			pldata->tries++;
+		}
+
+		/* Avoid processor soft lock-ups */
+		schedule();
+	}
+
+	if (pldata->tries < LAT_MIN_TRIES)
+		dev_warn(&ntb->dev,
+			"Poll latency test terminated too early. Increase client's test time\n");
+
+	dev_dbg(&ntb->dev, "poll_lat: server stopped, had responded %llu times\n",
+		pldata->tries);
+
+	return atomic_read(&pldata->running) ? -ENODATA : -EINTR;
+}
+
+static void perf_clear_pl(struct perf_poll_lat_data *pldata)
+{
+	struct perf_ctx *perf = pldata->perf;
+	struct perf_peer *peer = perf->test_peer;
+
+	memset(peer->inbuf, stop_word, 1);
+	atomic_set(&pldata->running, 0);
+	wake_up(&perf->twait);
+	kfree(pldata->src);
+}
+
+static void perf_poll_lat_client_work(struct work_struct *work)
+{
+	struct perf_poll_lat_data *pldata = to_pldata_clt_work(work);
+
+	pldata->status = perf_init_pl(pldata);
+	if (pldata->status)
+		return;
+
+	pldata->status = perf_run_pl_client(pldata);
+
+	perf_clear_pl(pldata);
+}
+
+static void perf_poll_lat_server_work(struct work_struct *work)
+{
+	struct perf_poll_lat_data *pldata = to_pldata_srv_work(work);
+
+	pldata->status = perf_init_pl(pldata);
+	if (pldata->status)
+		return;
+
+	pldata->status = perf_run_pl_server(pldata);
+
+	perf_clear_pl(pldata);
+}
+
 static int perf_set_tcnt(struct perf_ctx *perf, u8 tcnt)
 {
 	if (tcnt == 0 || tcnt > MAX_THREADS_CNT)
@@ -1149,7 +1363,10 @@ static void perf_terminate_test(struct perf_ctx *perf)
 	int tidx;
 
 	atomic_set(&perf->tsync, -1);
+	atomic_set(&perf->pldata.running, 0);
 	wake_up(&perf->twait);
+	cancel_work_sync(&perf->pldata.srv_work);
+	cancel_work_sync(&perf->pldata.clt_work);
 
 	for (tidx = 0; tidx < MAX_THREADS_CNT; tidx++) {
 		wake_up(&perf->threads[tidx].dma_wait);
@@ -1195,6 +1412,46 @@ static int perf_submit_test(struct perf_peer *peer)
 	return ret;
 }
 
+static int perf_submit_poll_lat(struct perf_peer *peer, enum run_mode mode)
+{
+	struct perf_ctx *perf = peer->perf;
+	int ret;
+
+	ret = wait_for_completion_interruptible(&peer->init_comp);
+	if (ret < 0)
+		return ret;
+
+	if (test_and_set_bit_lock(0, &perf->busy_flag))
+		return -EBUSY;
+
+	perf->test_peer = peer;
+	atomic_set(&perf->pldata.running, 1);
+	perf->pldata.status = -ENODATA;
+	perf->pldata.tries = 0;
+	perf->pldata.latency = ktime_set(0, 0);
+
+	switch (mode) {
+	case RUN_PL_SERVER:
+		(void)queue_work(perf_wq, &perf->pldata.srv_work);
+		break;
+	case RUN_PL_CLIENT:
+	default:
+		(void)queue_work(perf_wq, &perf->pldata.clt_work);
+		break;
+	}
+
+	ret = wait_event_interruptible(perf->twait,
+				       !atomic_read(&perf->pldata.running));
+	if (ret == -ERESTARTSYS) {
+		perf_terminate_test(perf);
+		ret = -EINTR;
+	}
+
+	clear_bit_unlock(0, &perf->busy_flag);
+
+	return ret;
+}
+
 static int perf_read_stats(struct perf_ctx *perf, char *buf,
 			   size_t size, ssize_t *pos)
 {
@@ -1237,6 +1494,24 @@ static int perf_read_stats(struct perf_ctx *perf, char *buf,
 		}
 	}
 
+	if (perf->pldata.status != -ENODATA) {
+		(*pos) += scnprintf(buf + *pos, size - *pos, "\n");
+		if (perf->pldata.status) {
+			(*pos) += scnprintf(buf + *pos, size - *pos,
+				"poll latency: error status %d\n", perf->pldata.status);
+		} else {
+			if (ktime_to_us(perf->pldata.latency) < 10) {
+				(*pos) += scnprintf(buf + *pos, size - *pos,
+						"poll latency %llu ns\n",
+						ktime_to_ns(perf->pldata.latency));
+			} else {
+				(*pos) += scnprintf(buf + *pos, size - *pos,
+						"poll latency %llu us\n",
+						ktime_to_us(perf->pldata.latency));
+			}
+		}
+	}
+
 	clear_bit_unlock(0, &perf->busy_flag);
 
 	return 0;
@@ -1250,6 +1525,10 @@ static void perf_init_threads(struct perf_ctx *perf)
 	perf->tcnt = DEF_THREADS_CNT;
 	perf->test_peer = &perf->peers[0];
 	init_waitqueue_head(&perf->twait);
+	perf->pldata.perf = perf;
+	INIT_WORK(&perf->pldata.srv_work, perf_poll_lat_server_work);
+	INIT_WORK(&perf->pldata.clt_work, perf_poll_lat_client_work);
+	perf->pldata.status = -ENODATA;
 
 	for (tidx = 0; tidx < MAX_THREADS_CNT; tidx++) {
 		pthr = &perf->threads[tidx];
@@ -1406,6 +1685,64 @@ static const struct file_operations perf_dbgfs_run = {
 	.write = perf_dbgfs_write_run
 };
 
+static ssize_t perf_dbgfs_write_run_pl(struct file *filep, const char __user *ubuf,
+				    size_t size, loff_t *offp, enum run_mode mode)
+{
+	struct perf_ctx *perf = filep->private_data;
+	struct ntb_dev *ntb = perf->ntb;
+	struct perf_peer *peer;
+	int pidx, ret;
+
+	ret = kstrtoint_from_user(ubuf, size, 0, &pidx);
+	if (ret)
+		return ret;
+
+	if (pidx < 0 && mode == RUN_PL_SERVER) {
+		dev_dbg(&ntb->dev, "poll_lat: kill server\n");
+		if (test_bit(0, &perf->busy_flag)) {
+			peer = perf->test_peer;
+			/* Send stop to client */
+			memcpy_toio(peer->outbuf, &stop_word, 1);
+		}
+		perf_terminate_test(perf);
+		clear_bit_unlock(0, &perf->busy_flag);
+		return size;
+	}
+
+	if (pidx < 0 || pidx >= perf->pcnt)
+		return -EINVAL;
+
+	peer = &perf->peers[pidx];
+
+	ret = perf_submit_poll_lat(peer, mode);
+
+	return ret ? ret : size;
+}
+
+static ssize_t perf_dbgfs_write_run_client(struct file *filep, const char __user *ubuf,
+				    size_t size, loff_t *offp)
+{
+	return perf_dbgfs_write_run_pl(filep, ubuf, size, offp, RUN_PL_CLIENT);
+}
+
+static const struct file_operations perf_dbgfs_run_client = {
+	.open = simple_open,
+	.read = perf_dbgfs_read_run,
+	.write = perf_dbgfs_write_run_client
+};
+
+static ssize_t perf_dbgfs_write_run_server(struct file *filep, const char __user *ubuf,
+				    size_t size, loff_t *offp)
+{
+	return perf_dbgfs_write_run_pl(filep, ubuf, size, offp, RUN_PL_SERVER);
+}
+
+static const struct file_operations perf_dbgfs_run_server = {
+	.open = simple_open,
+	.read = perf_dbgfs_read_run,
+	.write = perf_dbgfs_write_run_server
+};
+
 static ssize_t perf_dbgfs_read_tcnt(struct file *filep, char __user *ubuf,
 				    size_t size, loff_t *offp)
 {
@@ -1468,6 +1805,24 @@ static ssize_t perf_dbgfs_read_lattrs(struct file *filep, char __user *ubuf,
 	return ret;
 }
 
+static ssize_t perf_dbgfs_read_inbuf(struct file *filep, char __user *ubuf,
+				    size_t size, loff_t *offp)
+{
+	struct perf_ctx *perf = filep->private_data;
+	char buf[32];
+	ssize_t pos;
+	u64 *value;
+
+	if (!perf->test_peer || !perf->test_peer->inbuf) {
+		pos = scnprintf(buf, sizeof(buf), "NULL\n");
+	} else {
+		value = perf->test_peer->inbuf;
+		pos = scnprintf(buf, sizeof(buf), "0x%llx\n", *value);
+	}
+
+	return simple_read_from_buffer(ubuf, size, offp, buf, pos);
+}
+
 static const struct file_operations perf_dbgfs_tcnt = {
 	.open = simple_open,
 	.read = perf_dbgfs_read_tcnt,
@@ -1479,6 +1834,11 @@ static const struct file_operations perf_dbgfs_lattrs = {
 	.read = perf_dbgfs_read_lattrs
 };
 
+static const struct file_operations perf_dbgfs_inbuf = {
+	.open = simple_open,
+	.read = perf_dbgfs_read_inbuf,
+};
+
 static void perf_setup_dbgfs(struct perf_ctx *perf)
 {
 	struct pci_dev *pdev = perf->ntb->pdev;
@@ -1495,6 +1855,12 @@ static void perf_setup_dbgfs(struct perf_ctx *perf)
 	debugfs_create_file("run", 0600, perf->dbgfs_dir, perf,
 			    &perf_dbgfs_run);
 
+	debugfs_create_file("run_client", 0600, perf->dbgfs_dir, perf,
+			    &perf_dbgfs_run_client);
+
+	debugfs_create_file("run_server", 0600, perf->dbgfs_dir, perf,
+			    &perf_dbgfs_run_server);
+
 	debugfs_create_file("threads_count", 0600, perf->dbgfs_dir, perf,
 			    &perf_dbgfs_tcnt);
 
@@ -1507,6 +1873,12 @@ static void perf_setup_dbgfs(struct perf_ctx *perf)
 
 	debugfs_create_file("latency_tries", 0400, perf->dbgfs_dir, perf,
 			    &perf_dbgfs_lattrs);
+
+	debugfs_create_u64("poll_latency_tries", 0400, perf->dbgfs_dir,
+				&perf->pldata.tries);
+
+	debugfs_create_file("inbuf", 0400, perf->dbgfs_dir, perf,
+			    &perf_dbgfs_inbuf);
 }
 
 static void perf_clear_dbgfs(struct perf_ctx *perf)
-- 
2.36.1



* [PATCH v3 3/3] ntb_perf: extend with doorbell latency measurement
  2022-05-13 19:37 [PATCH v3 0/3] ntb_perf: add new 'latency' test set Alexander Fomichev
  2022-05-13 19:37 ` [PATCH v3 1/3] ntb_perf: extend with burst latency measurement Alexander Fomichev
  2022-05-13 19:37 ` [PATCH v3 2/3] ntb_perf: extend with poll " Alexander Fomichev
@ 2022-05-13 19:37 ` Alexander Fomichev
  2022-05-23 18:38 ` [PATCH v3 0/3] ntb_perf: add new 'latency' test set Dave Jiang
  3 siblings, 0 replies; 12+ messages in thread
From: Alexander Fomichev @ 2022-05-13 19:37 UTC (permalink / raw)
  To: ntb
  Cc: linux, Jon Mason, Dave Jiang, Allen Hubbe, Guo Zhengkui,
	fancer.lancer, Alexander Fomichev

From: Alexander Fomichev <a.fomichev@yadro.com>

Doorbell latency is the delay between starting to ring an NTB doorbell
and receiving the confirmation. The remote system needs to be put in
server mode beforehand. The server waits for a doorbell event and
immediately rings its own doorbell to confirm.
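
A rough sketch of the exchange, as I read perf_run_dbl_client() and
perf_dbl_pong() below:

  client: ntb_peer_db_set() rings the spare doorbell bit once to start
  server: perf_db_event() sees the bit, perf_dbl_pong() clears it and
          rings the client's doorbell back
  client: perf_db_event()/perf_dbl_pong() counts a try and rings again

The ping-pong continues for lat_time_ms, and the reported doorbell
latency is the elapsed time divided by the client-side try count.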

Thanks-to: Guo Zhengkui <guozhengkui@vivo.com>
Signed-off-by: Alexander Fomichev <a.fomichev@yadro.com>
---
 drivers/ntb/test/ntb_perf.c | 532 ++++++++++++++++++++++++++++--------
 1 file changed, 411 insertions(+), 121 deletions(-)

diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
index f0f3beba70a5..23e154bd41b9 100644
--- a/drivers/ntb/test/ntb_perf.c
+++ b/drivers/ntb/test/ntb_perf.c
@@ -75,6 +75,14 @@
  * Client side:
  * root@self# echo 0 > $DBG_DIR/poll_latency/run_client
  * root@self# cat $DBG_DIR/poll_latency/run_client
+ *-----------------------------------------------------------------------------
+ * Eg: start doorbell latency test with peer (index 0) and get the metrics
+ *
+ * Server side:
+ * root@self# echo 0 > $DBG_DIR/db_latency/run_server
+ * Client side:
+ * root@self# echo 0 > $DBG_DIR/db_latency/run_client
+ * root@self# cat $DBG_DIR/db_latency/run_client
  */
 
 #include <linux/init.h>
@@ -86,6 +94,7 @@
 #include <linux/dmaengine.h>
 #include <linux/pci.h>
 #include <linux/ktime.h>
+#include <linux/jiffies.h>
 #include <linux/slab.h>
 #include <linux/delay.h>
 #include <linux/sizes.h>
@@ -95,7 +104,7 @@
 #include <linux/ntb.h>
 
 #define DRIVER_NAME		"ntb_perf"
-#define DRIVER_VERSION		"2.2"
+#define DRIVER_VERSION		"2.3"
 
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_VERSION(DRIVER_VERSION);
@@ -147,6 +156,10 @@ static unsigned long lat_timeout_us = 500;
 module_param(lat_timeout_us, ulong, 0644);
 MODULE_PARM_DESC(lat_timeout_us, "Timeout (in us) to wait for server reply");
 
+static unsigned long peer_timeout_s = 60;
+module_param(peer_timeout_s, ulong, 0644);
+MODULE_PARM_DESC(peer_timeout_s, "Timeout (in s) to wait for peer link");
+
 /*==============================================================================
  *                         Perf driver data definition
  *==============================================================================
@@ -166,9 +179,18 @@ enum perf_cmd {
 enum run_mode {
 	RUN_PL_CLIENT,
 	RUN_PL_SERVER,
+	RUN_DBL_CLIENT,
+	RUN_DBL_SERVER,
 };
 
 struct perf_ctx;
+struct perf_ext_lat_data;
+
+struct perf_ext_lat_ops {
+	int (*init)(struct perf_ext_lat_data *data);
+	int (*run)(struct perf_ext_lat_data *data);
+	void (*clear)(struct perf_ext_lat_data *data);
+};
 
 struct perf_peer {
 	struct perf_ctx	*perf;
@@ -216,20 +238,21 @@ struct perf_thread {
 #define to_thread_work(__work) \
 	container_of(__work, struct perf_thread, work)
 
-struct perf_poll_lat_data {
+struct perf_ext_lat_data {
 	struct perf_ctx *perf;
-	void *src;
 	ktime_t latency;
 	u64 tries;
 	int status;
-	atomic_t running;
-	struct work_struct clt_work;
-	struct work_struct srv_work;
+	struct perf_ext_lat_ops ops;
+	struct work_struct work;
+
+	union {
+		void *src;
+		int db;
+	};
 };
-#define to_pldata_clt_work(__work) \
-	container_of(__work, struct perf_poll_lat_data, clt_work)
-#define to_pldata_srv_work(__work) \
-	container_of(__work, struct perf_poll_lat_data, srv_work)
+#define to_ext_lat_data(__work) \
+	container_of(__work, struct perf_ext_lat_data, work)
 
 struct perf_ctx {
 	struct ntb_dev *ntb;
@@ -238,7 +261,12 @@ struct perf_ctx {
 	int gidx;
 	int pcnt;
 	struct perf_peer *peers;
-	struct perf_poll_lat_data pldata;
+
+	/* Ext latency tests interface */
+	enum run_mode mode;
+	struct perf_ext_lat_data pldata;
+	struct perf_ext_lat_data dbldata;
+	atomic_t running;
 
 	/* Performance measuring work-threads interface */
 	unsigned long busy_flag;
@@ -551,6 +579,15 @@ static void perf_link_event(void *ctx)
 	}
 }
 
+static inline void perf_dbl_pong(struct perf_ctx *perf)
+{
+	struct perf_ext_lat_data *data = &perf->dbldata;
+
+	ntb_db_clear(perf->ntb, BIT_ULL(data->db));
+	data->tries++;
+	ntb_peer_db_set(perf->ntb, BIT_ULL(data->db));
+}
+
 static void perf_db_event(void *ctx, int vec)
 {
 	struct perf_ctx *perf = ctx;
@@ -559,7 +596,11 @@ static void perf_db_event(void *ctx, int vec)
 		ntb_db_vector_mask(perf->ntb, vec), ntb_db_read(perf->ntb));
 
 	/* Just receive all available commands */
-	(void)perf_cmd_recv(perf);
+	if (perf->dbldata.db >= 0 &&
+				BIT_ULL(perf->dbldata.db) & ntb_db_read(perf->ntb))
+		perf_dbl_pong(perf);
+	else
+		(void)perf_cmd_recv(perf);
 }
 
 static void perf_msg_event(void *ctx)
@@ -714,6 +755,8 @@ static int perf_init_service(struct perf_ctx *perf)
 		return -EINVAL;
 	}
 
+	perf->dbldata.db = -1;
+
 	if (ntb_msg_count(perf->ntb) >= PERF_MSG_CNT) {
 		perf->cmd_send = perf_msg_cmd_send;
 		perf->cmd_recv = perf_msg_cmd_recv;
@@ -1164,14 +1207,14 @@ static void perf_thread_work(struct work_struct *work)
 	perf_clear_test(pthr);
 }
 
-static int perf_init_pl(struct perf_poll_lat_data *pldata)
+static int perf_init_pl(struct perf_ext_lat_data *pldata)
 {
 	struct perf_ctx *perf = pldata->perf;
 	struct perf_peer *peer = perf->test_peer;
 	u8 *bp;
 
 	pldata->src = kmalloc_node(peer->outbuf_size, GFP_KERNEL,
-				 dev_to_node(&perf->ntb->dev));
+				dev_to_node(&perf->ntb->dev));
 	if (!pldata->src)
 		return -ENOMEM;
 
@@ -1206,10 +1249,11 @@ static int perf_poll_peer_reply(volatile u8 *cur)
 	return -EINTR;
 }
 
-static int perf_run_pl_client(struct perf_poll_lat_data *pldata)
+static int perf_run_pl_client(struct perf_ext_lat_data *pldata)
 {
-	struct perf_peer *peer = pldata->perf->test_peer;
-	struct ntb_dev *ntb = pldata->perf->ntb;
+	struct perf_ctx *perf = pldata->perf;
+	struct perf_peer *peer = perf->test_peer;
+	struct ntb_dev *ntb = perf->ntb;
 	void *src = pldata->src;
 	u64 stop_at;
 	int ret;
@@ -1234,7 +1278,7 @@ static int perf_run_pl_client(struct perf_poll_lat_data *pldata)
 		} else if (ret == 1) {
 			dev_warn(&ntb->dev, "Server terminated on poll latency, stopping\n");
 			break;
-		} else if (!atomic_read(&pldata->running)) {
+		} else if (!atomic_read(&perf->running)) {
 			dev_err(&ntb->dev, "Poll latency client terminated\n");
 			return -EINTR;
 		}
@@ -1270,10 +1314,11 @@ static int perf_run_pl_client(struct perf_poll_lat_data *pldata)
 	return 0;
 }
 
-static int perf_run_pl_server(struct perf_poll_lat_data *pldata)
+static int perf_run_pl_server(struct perf_ext_lat_data *pldata)
 {
-	struct perf_peer *peer = pldata->perf->test_peer;
-	struct ntb_dev *ntb = pldata->perf->ntb;
+	struct perf_ctx *perf = pldata->perf;
+	struct perf_peer *peer = perf->test_peer;
+	struct ntb_dev *ntb = perf->ntb;
 	void *src = pldata->src;
 	int ret = 0;
 
@@ -1281,7 +1326,7 @@ static int perf_run_pl_server(struct perf_poll_lat_data *pldata)
 
 	pldata->tries = 0;
 
-	while (ret != 1 && atomic_read(&pldata->running)) {
+	while (ret != 1 && atomic_read(&perf->running)) {
 		ret = perf_poll_peer_reply(peer->inbuf);
 		if (!ret) {
 			/* Pong to client */
@@ -1303,44 +1348,131 @@ static int perf_run_pl_server(struct perf_poll_lat_data *pldata)
 	dev_dbg(&ntb->dev, "poll_lat: server stopped, had responded %llu times\n",
 		pldata->tries);
 
-	return atomic_read(&pldata->running) ? -ENODATA : -EINTR;
+	return atomic_read(&perf->running) ? -ENODATA : -EINTR;
 }
 
-static void perf_clear_pl(struct perf_poll_lat_data *pldata)
+static void perf_clear_pl(struct perf_ext_lat_data *pldata)
 {
 	struct perf_ctx *perf = pldata->perf;
 	struct perf_peer *peer = perf->test_peer;
 
 	memset(peer->inbuf, stop_word, 1);
-	atomic_set(&pldata->running, 0);
+	atomic_set(&perf->running, 0);
 	wake_up(&perf->twait);
 	kfree(pldata->src);
 }
 
-static void perf_poll_lat_client_work(struct work_struct *work)
+static struct perf_ext_lat_ops perf_pl_client_ops = {
+	.init = perf_init_pl,
+	.run = perf_run_pl_client,
+	.clear = perf_clear_pl
+};
+
+static struct perf_ext_lat_ops perf_pl_server_ops = {
+	.init = perf_init_pl,
+	.run = perf_run_pl_server,
+	.clear = perf_clear_pl
+};
+
+static int perf_init_dbl(struct perf_ext_lat_data *data)
 {
-	struct perf_poll_lat_data *pldata = to_pldata_clt_work(work);
+	struct perf_ctx *perf = data->perf;
 
-	pldata->status = perf_init_pl(pldata);
-	if (pldata->status)
-		return;
+	data->db = get_bitmask_order(ntb_db_valid_mask(perf->ntb)) - 1;
+	dev_dbg(&perf->ntb->dev, "DB bit for latency test: %d\n", data->db);
 
-	pldata->status = perf_run_pl_client(pldata);
+	if (data->db <= perf->gidx) {
+		dev_err(&perf->ntb->dev, "No spare DoorBell found\n");
+		data->db = -1;
+		return -ENOSPC;
+	}
 
-	perf_clear_pl(pldata);
+	return ntb_db_clear_mask(perf->ntb, BIT_ULL(data->db));
 }
 
-static void perf_poll_lat_server_work(struct work_struct *work)
+static int perf_run_dbl_client(struct perf_ext_lat_data *data)
 {
-	struct perf_poll_lat_data *pldata = to_pldata_srv_work(work);
+	struct perf_ctx *perf = data->perf;
+	struct ntb_dev *ntb = perf->ntb;
+	u64 stop_at;
+
+	dev_dbg(&ntb->dev, "db_lat: client started.\n");
+
+	data->tries = 0;
+	data->latency = ktime_get();
+
+	if (ntb_peer_db_set(perf->ntb, BIT_ULL(data->db)))
+		return -EIO;
+
+	stop_at = ktime_get_real_fast_ns() + lat_time_ms * NSEC_PER_MSEC;
+	while (ktime_get_real_fast_ns() < stop_at) {
+		/* Avoid processor soft lock-ups */
+		schedule();
 
-	pldata->status = perf_init_pl(pldata);
-	if (pldata->status)
+		if (!atomic_read(&perf->running)) {
+			dev_err(&ntb->dev, "DoorBell latency client terminated\n");
+			return -EINTR;
+		}
+	}
+
+	/* Stop timer */
+	data->latency = ktime_sub(ktime_get(), data->latency);
+
+	if (data->tries < LAT_MIN_TRIES) {
+		dev_err(&ntb->dev,
+			"Too few steps (%llu) to measure Latency, recommended > %d. Increase value of 'lat_time_ms' parameter\n",
+			data->tries, LAT_MIN_TRIES);
+		data->latency = ktime_set(0, 0);
+		return -EINVAL;
+	}
+
+	dev_dbg(&ntb->dev, "db_lat: made %llu tries, lasted %llu usecs\n",
+		data->tries, ktime_to_us(data->latency));
+
+	data->latency = ns_to_ktime(ktime_divns(data->latency, data->tries));
+
+	dev_dbg(&ntb->dev, "db_lat: latency %llu us (%llu ns)\n",
+		ktime_to_us(data->latency), ktime_to_ns(data->latency));
+
+	return 0;
+}
+
+static void perf_clear_dbl(struct perf_ext_lat_data *data)
+{
+	struct perf_ctx *perf = data->perf;
+
+	data->db = -1;
+	ntb_db_set_mask(perf->ntb, BIT_ULL(data->db));
+	atomic_set(&perf->running, 0);
+	wake_up(&perf->twait);
+}
+
+static struct perf_ext_lat_ops perf_dbl_client_ops = {
+	.init = perf_init_dbl,
+	.run = perf_run_dbl_client,
+	.clear = perf_clear_dbl
+};
+
+static void perf_ext_lat_work(struct work_struct *work)
+{
+	struct perf_ext_lat_data *data = to_ext_lat_data(work);
+
+	if (!data->ops.init || !data->ops.run || !data->ops.clear) {
+		struct perf_ctx *perf = data->perf;
+
+		data->status = -EFAULT;
+		atomic_set(&perf->running, 0);
+		wake_up(&perf->twait);
 		return;
+	}
 
-	pldata->status = perf_run_pl_server(pldata);
+	data->status = data->ops.init(data);
+	if (data->status)
+		return;
 
-	perf_clear_pl(pldata);
+	data->status = data->ops.run(data);
+
+	data->ops.clear(data);
 }
 
 static int perf_set_tcnt(struct perf_ctx *perf, u8 tcnt)
@@ -1363,10 +1495,10 @@ static void perf_terminate_test(struct perf_ctx *perf)
 	int tidx;
 
 	atomic_set(&perf->tsync, -1);
-	atomic_set(&perf->pldata.running, 0);
+	atomic_set(&perf->running, 0);
 	wake_up(&perf->twait);
-	cancel_work_sync(&perf->pldata.srv_work);
-	cancel_work_sync(&perf->pldata.clt_work);
+	cancel_work_sync(&perf->pldata.work);
+	cancel_work_sync(&perf->dbldata.work);
 
 	for (tidx = 0; tidx < MAX_THREADS_CNT; tidx++) {
 		wake_up(&perf->threads[tidx].dma_wait);
@@ -1380,9 +1512,10 @@ static int perf_submit_test(struct perf_peer *peer)
 	struct perf_thread *pthr;
 	int tidx, ret;
 
-	ret = wait_for_completion_interruptible(&peer->init_comp);
-	if (ret < 0)
-		return ret;
+	ret = wait_for_completion_interruptible_timeout(&peer->init_comp,
+			msecs_to_jiffies(peer_timeout_s * 1000));
+	if (ret <= 0)
+		return ret ? ret : -ETIMEDOUT;
 
 	if (test_and_set_bit_lock(0, &perf->busy_flag))
 		return -EBUSY;
@@ -1412,41 +1545,58 @@ static int perf_submit_test(struct perf_peer *peer)
 	return ret;
 }
 
-static int perf_submit_poll_lat(struct perf_peer *peer, enum run_mode mode)
+static int perf_submit_ext_lat(struct perf_peer *peer)
 {
 	struct perf_ctx *perf = peer->perf;
 	int ret;
 
-	ret = wait_for_completion_interruptible(&peer->init_comp);
-	if (ret < 0)
-		return ret;
+	ret = wait_for_completion_interruptible_timeout(&peer->init_comp,
+			msecs_to_jiffies(peer_timeout_s * 1000));
+	if (ret <= 0)
+		return ret ? ret : -ETIMEDOUT;
 
 	if (test_and_set_bit_lock(0, &perf->busy_flag))
 		return -EBUSY;
 
 	perf->test_peer = peer;
-	atomic_set(&perf->pldata.running, 1);
+	atomic_set(&perf->running, 1);
 	perf->pldata.status = -ENODATA;
 	perf->pldata.tries = 0;
 	perf->pldata.latency = ktime_set(0, 0);
+	perf->dbldata.status = -ENODATA;
+	perf->dbldata.tries = 0;
+	perf->dbldata.latency = ktime_set(0, 0);
 
-	switch (mode) {
+	switch (perf->mode) {
 	case RUN_PL_SERVER:
-		(void)queue_work(perf_wq, &perf->pldata.srv_work);
+		perf->pldata.ops = perf_pl_server_ops;
+		(void)queue_work(perf_wq, &perf->pldata.work);
 		break;
 	case RUN_PL_CLIENT:
-	default:
-		(void)queue_work(perf_wq, &perf->pldata.clt_work);
+		perf->pldata.ops = perf_pl_client_ops;
+		(void)queue_work(perf_wq, &perf->pldata.work);
+		break;
+	case RUN_DBL_SERVER:
+		ret = perf_init_dbl(&perf->dbldata);
+		dev_dbg(&perf->ntb->dev, "db_lat: server started.\n");
+		goto submit_exit;
+	case RUN_DBL_CLIENT:
+		perf->dbldata.ops = perf_dbl_client_ops;
+		(void)queue_work(perf_wq, &perf->dbldata.work);
 		break;
+	default:
+		ret = -EINVAL;
+		goto submit_exit;
 	}
 
 	ret = wait_event_interruptible(perf->twait,
-				       !atomic_read(&perf->pldata.running));
+				       !atomic_read(&perf->running));
 	if (ret == -ERESTARTSYS) {
 		perf_terminate_test(perf);
 		ret = -EINTR;
 	}
 
+submit_exit:
 	clear_bit_unlock(0, &perf->busy_flag);
 
 	return ret;
@@ -1494,30 +1644,12 @@ static int perf_read_stats(struct perf_ctx *perf, char *buf,
 		}
 	}
 
-	if (perf->pldata.status != -ENODATA) {
-		(*pos) += scnprintf(buf + *pos, size - *pos, "\n");
-		if (perf->pldata.status) {
-			(*pos) += scnprintf(buf + *pos, size - *pos,
-				"poll latency: error status %d\n", perf->pldata.status);
-		} else {
-			if (ktime_to_us(perf->pldata.latency) < 10) {
-				(*pos) += scnprintf(buf + *pos, size - *pos,
-						"poll latency %llu ns\n",
-						ktime_to_ns(perf->pldata.latency));
-			} else {
-				(*pos) += scnprintf(buf + *pos, size - *pos,
-						"poll latency %llu us\n",
-						ktime_to_us(perf->pldata.latency));
-			}
-		}
-	}
-
 	clear_bit_unlock(0, &perf->busy_flag);
 
 	return 0;
 }
 
-static void perf_init_threads(struct perf_ctx *perf)
+static void perf_init_workers(struct perf_ctx *perf)
 {
 	struct perf_thread *pthr;
 	int tidx;
@@ -1525,11 +1657,15 @@ static void perf_init_threads(struct perf_ctx *perf)
 	perf->tcnt = DEF_THREADS_CNT;
 	perf->test_peer = &perf->peers[0];
 	init_waitqueue_head(&perf->twait);
+
 	perf->pldata.perf = perf;
-	INIT_WORK(&perf->pldata.srv_work, perf_poll_lat_server_work);
-	INIT_WORK(&perf->pldata.clt_work, perf_poll_lat_client_work);
+	INIT_WORK(&perf->pldata.work, perf_ext_lat_work);
 	perf->pldata.status = -ENODATA;
 
+	perf->dbldata.perf = perf;
+	INIT_WORK(&perf->dbldata.work, perf_ext_lat_work);
+	perf->dbldata.status = -ENODATA;
+
 	for (tidx = 0; tidx < MAX_THREADS_CNT; tidx++) {
 		pthr = &perf->threads[tidx];
 
@@ -1541,7 +1677,7 @@ static void perf_init_threads(struct perf_ctx *perf)
 	}
 }
 
-static void perf_clear_threads(struct perf_ctx *perf)
+static void perf_clear_workers(struct perf_ctx *perf)
 {
 	perf_terminate_test(perf);
 }
@@ -1685,8 +1821,55 @@ static const struct file_operations perf_dbgfs_run = {
 	.write = perf_dbgfs_write_run
 };
 
-static ssize_t perf_dbgfs_write_run_pl(struct file *filep, const char __user *ubuf,
-				    size_t size, loff_t *offp, enum run_mode mode)
+static ssize_t perf_dbgfs_read_run_pl(struct file *filep, char __user *ubuf,
+				   size_t fsize, loff_t *offp)
+{
+	struct perf_ctx *perf = filep->private_data;
+	ssize_t size = PERF_BUF_LEN;
+	ssize_t pos = 0;
+	ssize_t ret;
+	char *buf;
+
+	if (test_and_set_bit_lock(0, &perf->busy_flag))
+		return -EBUSY;
+
+	buf = kmalloc(size, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	pos += scnprintf(buf + pos, size - pos,
+		"    Peer %d test statistics:\n", perf->test_peer->pidx);
+
+	if (perf->pldata.status != -ENODATA) {
+		if (perf->pldata.status) {
+			pos += scnprintf(buf + pos, size - pos,
+				"poll latency: error status %d\n", perf->pldata.status);
+		} else {
+			if (ktime_to_us(perf->pldata.latency) < 10) {
+				pos += scnprintf(buf + pos, size - pos,
+						"poll latency %llu ns\n",
+						ktime_to_ns(perf->pldata.latency));
+			} else {
+				pos += scnprintf(buf + pos, size - pos,
+						"poll latency %llu us\n",
+						ktime_to_us(perf->pldata.latency));
+			}
+		}
+	} else {
+		pos += scnprintf(buf + pos, size - pos, "Test did not run\n");
+	}
+
+	ret = simple_read_from_buffer(ubuf, fsize, offp, buf, pos);
+
+	kfree(buf);
+
+	clear_bit_unlock(0, &perf->busy_flag);
+
+	return ret;
+}
+
+static ssize_t perf_dbgfs_write_run_ext(struct file *filep, const char __user *ubuf,
+					size_t size, loff_t *offp, enum run_mode mode)
 {
 	struct perf_ctx *perf = filep->private_data;
 	struct ntb_dev *ntb = perf->ntb;
@@ -1697,50 +1880,132 @@ static ssize_t perf_dbgfs_write_run_pl(struct file *filep, const char __user *ub
 	if (ret)
 		return ret;
 
-	if (pidx < 0 && mode == RUN_PL_SERVER) {
-		dev_dbg(&ntb->dev, "poll_lat: kill server\n");
-		if (test_bit(0, &perf->busy_flag)) {
-			peer = perf->test_peer;
-			/* Send stop to client */
-			memcpy_toio(peer->outbuf, &stop_word, 1);
+	if (pidx < 0) {
+		switch (mode) {
+		case RUN_PL_SERVER:
+			dev_dbg(&ntb->dev, "poll_lat: kill server\n");
+			if (test_bit(0, &perf->busy_flag)) {
+				peer = perf->test_peer;
+				/* Send stop to client */
+				memcpy_toio(peer->outbuf, &stop_word, 1);
+			}
+			perf_terminate_test(perf);
+			clear_bit_unlock(0, &perf->busy_flag);
+			return size;
+		case RUN_DBL_SERVER:
+			dev_dbg(&ntb->dev, "db_lat: kill server\n");
+			perf_clear_dbl(&perf->dbldata);
+			clear_bit_unlock(0, &perf->busy_flag);
+			return size;
+		default:
+			return -EINVAL;
 		}
-		perf_terminate_test(perf);
-		clear_bit_unlock(0, &perf->busy_flag);
-		return size;
 	}
 
-	if (pidx < 0 || pidx >= perf->pcnt)
+	if (pidx >= perf->pcnt)
 		return -EINVAL;
 
 	peer = &perf->peers[pidx];
+	perf->mode = mode;
 
-	ret = perf_submit_poll_lat(peer, mode);
+	ret = perf_submit_ext_lat(peer);
 
 	return ret ? ret : size;
 }
 
-static ssize_t perf_dbgfs_write_run_client(struct file *filep, const char __user *ubuf,
-				    size_t size, loff_t *offp)
+static ssize_t perf_dbgfs_write_run_pl_client(struct file *filep,
+					const char __user *ubuf, size_t size, loff_t *offp)
 {
-	return perf_dbgfs_write_run_pl(filep, ubuf, size, offp, RUN_PL_CLIENT);
+	return perf_dbgfs_write_run_ext(filep, ubuf, size, offp, RUN_PL_CLIENT);
 }
 
-static const struct file_operations perf_dbgfs_run_client = {
+static const struct file_operations perf_dbgfs_run_pl_client = {
 	.open = simple_open,
-	.read = perf_dbgfs_read_run,
-	.write = perf_dbgfs_write_run_client
+	.read = perf_dbgfs_read_run_pl,
+	.write = perf_dbgfs_write_run_pl_client
 };
 
-static ssize_t perf_dbgfs_write_run_server(struct file *filep, const char __user *ubuf,
-				    size_t size, loff_t *offp)
+static ssize_t perf_dbgfs_write_run_pl_server(struct file *filep,
+					const char __user *ubuf, size_t size, loff_t *offp)
 {
-	return perf_dbgfs_write_run_pl(filep, ubuf, size, offp, RUN_PL_SERVER);
+	return perf_dbgfs_write_run_ext(filep, ubuf, size, offp, RUN_PL_SERVER);
 }
 
-static const struct file_operations perf_dbgfs_run_server = {
+static const struct file_operations perf_dbgfs_run_pl_server = {
 	.open = simple_open,
-	.read = perf_dbgfs_read_run,
-	.write = perf_dbgfs_write_run_server
+	.read = perf_dbgfs_read_run_pl,
+	.write = perf_dbgfs_write_run_pl_server
+};
+
+static ssize_t perf_dbgfs_read_run_dbl(struct file *filep, char __user *ubuf,
+				   size_t fsize, loff_t *offp)
+{
+	struct perf_ctx *perf = filep->private_data;
+	ssize_t size = PERF_BUF_LEN;
+	ssize_t pos = 0;
+	ssize_t ret;
+	char *buf;
+
+	if (test_and_set_bit_lock(0, &perf->busy_flag))
+		return -EBUSY;
+
+	buf = kmalloc(size, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	pos += scnprintf(buf + pos, size - pos,
+		"    Peer %d test statistics:\n", perf->test_peer->pidx);
+
+	if (perf->dbldata.status != -ENODATA) {
+		if (perf->dbldata.status) {
+			pos += scnprintf(buf + pos, size - pos,
+				"doorbell latency: error status %d\n", perf->dbldata.status);
+		} else {
+			if (ktime_to_us(perf->dbldata.latency) < 10) {
+				pos += scnprintf(buf + pos, size - pos,
+						"doorbell latency %llu ns\n",
+						ktime_to_ns(perf->dbldata.latency));
+			} else {
+				pos += scnprintf(buf + pos, size - pos,
+						"doorbell latency %llu us\n",
+						ktime_to_us(perf->dbldata.latency));
+			}
+		}
+	} else {
+		pos += scnprintf(buf + pos, size - pos, "Test did not run\n");
+	}
+
+	ret = simple_read_from_buffer(ubuf, fsize, offp, buf, pos);
+
+	kfree(buf);
+
+	clear_bit_unlock(0, &perf->busy_flag);
+
+	return ret;
+}
+
+static ssize_t perf_dbgfs_write_run_dbl_client(struct file *filep,
+					const char __user *ubuf, size_t size, loff_t *offp)
+{
+	return perf_dbgfs_write_run_ext(filep, ubuf, size, offp, RUN_DBL_CLIENT);
+}
+
+static const struct file_operations perf_dbgfs_run_dbl_client = {
+	.open = simple_open,
+	.read = perf_dbgfs_read_run_dbl,
+	.write = perf_dbgfs_write_run_dbl_client
+};
+
+static ssize_t perf_dbgfs_write_run_dbl_server(struct file *filep,
+					const char __user *ubuf, size_t size, loff_t *offp)
+{
+	return perf_dbgfs_write_run_ext(filep, ubuf, size, offp, RUN_DBL_SERVER);
+}
+
+static const struct file_operations perf_dbgfs_run_dbl_server = {
+	.open = simple_open,
+	.read = perf_dbgfs_read_run_dbl,
+	.write = perf_dbgfs_write_run_dbl_server
 };
 
 static ssize_t perf_dbgfs_read_tcnt(struct file *filep, char __user *ubuf,
@@ -1794,8 +2059,7 @@ static ssize_t perf_dbgfs_read_lattrs(struct file *filep, char __user *ubuf,
 		struct perf_thread *pthr = &perf->threads[tidx];
 
 		pos += scnprintf(buf + pos, buf_size - pos,
-			"%d: made %llu tries\n",
-			tidx, pthr->tries);
+				"%d: made %llu tries\n", tidx, pthr->tries);
 	}
 
 	ret = simple_read_from_buffer(ubuf, size, offp, buf, pos);
@@ -1806,7 +2070,7 @@ static ssize_t perf_dbgfs_read_lattrs(struct file *filep, char __user *ubuf,
 }
 
 static ssize_t perf_dbgfs_read_inbuf(struct file *filep, char __user *ubuf,
-				    size_t size, loff_t *offp)
+					size_t size, loff_t *offp)
 {
 	struct perf_ctx *perf = filep->private_data;
 	char buf[32];
@@ -1842,6 +2106,9 @@ static const struct file_operations perf_dbgfs_inbuf = {
 static void perf_setup_dbgfs(struct perf_ctx *perf)
 {
 	struct pci_dev *pdev = perf->ntb->pdev;
+	struct dentry *burst_lat_dir;
+	struct dentry *poll_lat_dir;
+	struct dentry *db_lat_dir;
 
 	perf->dbgfs_dir = debugfs_create_dir(pci_name(pdev), perf_dbgfs_topdir);
 	if (!perf->dbgfs_dir) {
@@ -1852,17 +2119,10 @@ static void perf_setup_dbgfs(struct perf_ctx *perf)
 	debugfs_create_file("info", 0600, perf->dbgfs_dir, perf,
 			    &perf_dbgfs_info);
 
-	debugfs_create_file("run", 0600, perf->dbgfs_dir, perf,
-			    &perf_dbgfs_run);
+	debugfs_create_symlink("run", perf->dbgfs_dir, "burst_latency/run");
 
-	debugfs_create_file("run_client", 0600, perf->dbgfs_dir, perf,
-			    &perf_dbgfs_run_client);
-
-	debugfs_create_file("run_server", 0600, perf->dbgfs_dir, perf,
-			    &perf_dbgfs_run_server);
-
-	debugfs_create_file("threads_count", 0600, perf->dbgfs_dir, perf,
-			    &perf_dbgfs_tcnt);
+	debugfs_create_symlink("threads_count", perf->dbgfs_dir,
+				"burst_latency/threads_count");
 
 	/* They are made read-only for test exec safety and integrity */
 	debugfs_create_u8("chunk_order", 0500, perf->dbgfs_dir, &chunk_order);
@@ -1871,14 +2131,44 @@ static void perf_setup_dbgfs(struct perf_ctx *perf)
 
 	debugfs_create_bool("use_dma", 0500, perf->dbgfs_dir, &use_dma);
 
-	debugfs_create_file("latency_tries", 0400, perf->dbgfs_dir, perf,
+	debugfs_create_file("inbuf", 0400, perf->dbgfs_dir, perf,
+			    &perf_dbgfs_inbuf);
+
+	/* burst_latency subdir */
+
+	burst_lat_dir = debugfs_create_dir("burst_latency", perf->dbgfs_dir);
+
+	debugfs_create_file("run", 0600, burst_lat_dir, perf, &perf_dbgfs_run);
+
+	debugfs_create_file("threads_count", 0600, burst_lat_dir, perf,
+			    &perf_dbgfs_tcnt);
+
+	debugfs_create_file("tries", 0400, burst_lat_dir, perf,
 			    &perf_dbgfs_lattrs);
 
-	debugfs_create_u64("poll_latency_tries", 0400, perf->dbgfs_dir,
-				&perf->pldata.tries);
+	/* poll_latency subdir */
 
-	debugfs_create_file("inbuf", 0400, perf->dbgfs_dir, perf,
-			    &perf_dbgfs_inbuf);
+	poll_lat_dir = debugfs_create_dir("poll_latency", perf->dbgfs_dir);
+
+	debugfs_create_file("run_client", 0600, poll_lat_dir, perf,
+			    &perf_dbgfs_run_pl_client);
+
+	debugfs_create_file("run_server", 0600, poll_lat_dir, perf,
+			    &perf_dbgfs_run_pl_server);
+
+	debugfs_create_u64("tries", 0400, poll_lat_dir, &perf->pldata.tries);
+
+	/* db_latency subdir */
+
+	db_lat_dir = debugfs_create_dir("db_latency", perf->dbgfs_dir);
+
+	debugfs_create_file("run_client", 0600, db_lat_dir, perf,
+			    &perf_dbgfs_run_dbl_client);
+
+	debugfs_create_file("run_server", 0600, db_lat_dir, perf,
+			    &perf_dbgfs_run_dbl_server);
+
+	debugfs_create_u64("tries", 0400, db_lat_dir, &perf->dbldata.tries);
 }
 
 static void perf_clear_dbgfs(struct perf_ctx *perf)
@@ -1998,7 +2288,7 @@ static int perf_probe(struct ntb_client *client, struct ntb_dev *ntb)
 	if (ret)
 		return ret;
 
-	perf_init_threads(perf);
+	perf_init_workers(perf);
 
 	ret = perf_init_service(perf);
 	if (ret)
@@ -2021,7 +2311,7 @@ static void perf_remove(struct ntb_client *client, struct ntb_dev *ntb)
 
 	perf_disable_service(perf);
 
-	perf_clear_threads(perf);
+	perf_clear_workers(perf);
 }
 
 static struct ntb_client perf_client = {
-- 
2.36.1



* Re: [PATCH v3 0/3] ntb_perf: add new 'latency' test set
  2022-05-13 19:37 [PATCH v3 0/3] ntb_perf: add new 'latency' test set Alexander Fomichev
                   ` (2 preceding siblings ...)
  2022-05-13 19:37 ` [PATCH v3 3/3] ntb_perf: extend with doorbell " Alexander Fomichev
@ 2022-05-23 18:38 ` Dave Jiang
  2022-05-25  8:58   ` Serge Semin
  3 siblings, 1 reply; 12+ messages in thread
From: Dave Jiang @ 2022-05-23 18:38 UTC (permalink / raw)
  To: Alexander Fomichev, ntb
  Cc: linux, Jon Mason, Allen Hubbe, Guo Zhengkui, fancer.lancer,
	Alexander Fomichev


On 5/13/2022 12:37 PM, Alexander Fomichev wrote:
> From: Alexander Fomichev <a.fomichev@yadro.com>
>
> The ntb_perf test provides a tool for evaluating NTB hardware
> performance. To eliminate software overhead, the test uses a simple
> method (let's call it 'burst' mode): the local system sends a data set
> to the remote system and measures the time until the hardware reports
> completion, with no confirmation from the remote side and no data
> integrity check. The measured metric is the 'burst' throughput
> bandwidth of the NTB connection.
>
> The patches extend ntb_perf with 3 new metrics:
> 1) Burst latency
> 2) Poll latency
> 3) Doorbell latency
>
> The resulting test set is fully backward compatible.
>
> Alexander Fomichev (3):
>    ntb_perf: extend with burst latency measurement
>    ntb_perf: extend with poll latency measurement
>    ntb_perf: extend with doorbell latency measurement
>
>   drivers/ntb/test/ntb_perf.c | 826 +++++++++++++++++++++++++++++++++++-
>   1 file changed, 810 insertions(+), 16 deletions(-)

Looks ok to me. Probably should get Serge's ack.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>



* Re: [PATCH v3 0/3] ntb_perf: add new 'latency' test set
  2022-05-23 18:38 ` [PATCH v3 0/3] ntb_perf: add new 'latency' test set Dave Jiang
@ 2022-05-25  8:58   ` Serge Semin
  2022-06-20 10:20     ` Alexander Fomichev
  0 siblings, 1 reply; 12+ messages in thread
From: Serge Semin @ 2022-05-25  8:58 UTC (permalink / raw)
  To: Dave Jiang
  Cc: Alexander Fomichev, ntb, linux, Jon Mason, Allen Hubbe,
	Guo Zhengkui, Alexander Fomichev

Hi Dave

On Mon, May 23, 2022 at 11:38:35AM -0700, Dave Jiang wrote:
> 
> On 5/13/2022 12:37 PM, Alexander Fomichev wrote:
> > From: Alexander Fomichev <a.fomichev@yadro.com>
> > 
> > The ntb_perf test provides a tool for evaluating NTB hardware
> > performance. To eliminate software overhead, the test uses a simple
> > method (let's call it 'burst' mode): the local system sends a data set
> > to the remote system and measures the time until the hardware reports
> > completion, with no confirmation from the remote side and no data
> > integrity check. The measured metric is the 'burst' throughput
> > bandwidth of the NTB connection.
> > 
> > The patches extend ntb_perf with 3 new metrics:
> > 1) Burst latency
> > 2) Poll latency
> > 3) Doorbell latency
> > 
> > The resulting test set is fully backward compatible.
> > 
> > Alexander Fomichev (3):
> >    ntb_perf: extend with burst latency measurement
> >    ntb_perf: extend with poll latency measurement
> >    ntb_perf: extend with doorbell latency measurement
> > 
> >   drivers/ntb/test/ntb_perf.c | 826 +++++++++++++++++++++++++++++++++++-
> >   1 file changed, 810 insertions(+), 16 deletions(-)
> 

> Looks ok to me. Probably should get Serge's ack.

Ok. I'll have a look at the series on this week.

-Sergey

> 
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> 


* Re: [PATCH v3 0/3] ntb_perf: add new 'latency' test set
  2022-05-25  8:58   ` Serge Semin
@ 2022-06-20 10:20     ` Alexander Fomichev
  2022-06-20 14:42       ` Jon Mason
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Fomichev @ 2022-06-20 10:20 UTC (permalink / raw)
  To: Serge Semin
  Cc: Dave Jiang, ntb, linux, Jon Mason, Allen Hubbe, Guo Zhengkui,
	Alexander Fomichev

Hi Serge,

On Wed, May 25, 2022 at 11:58:01AM +0300, Serge Semin wrote:
> 
> Ok. I'll have a look at the series on this week.
> 

I kindly remind you about the review.
Thanks.

-- 
Regards,
  Alexander


* Re: [PATCH v3 0/3] ntb_perf: add new 'latency' test set
  2022-06-20 10:20     ` Alexander Fomichev
@ 2022-06-20 14:42       ` Jon Mason
  2022-06-21 14:05         ` Serge Semin
  0 siblings, 1 reply; 12+ messages in thread
From: Jon Mason @ 2022-06-20 14:42 UTC (permalink / raw)
  To: Alexander Fomichev
  Cc: Serge Semin, Dave Jiang, ntb, linux, Allen Hubbe, Guo Zhengkui,
	Alexander Fomichev

On Mon, Jun 20, 2022 at 01:20:14PM +0300, Alexander Fomichev wrote:
> Hi Serge,
> 
> On Wed, May 25, 2022 at 11:58:01AM +0300, Serge Semin wrote:
> > 
> > Ok. I'll have a look at the series this week.
> > 
> 
> I kindly remind you about the review.

I've pulled this into my ntb branch.  So, barring a nack by Serge,
I'll push it for v5.20.  And if I see an ack by him, I'll try to add
that in as well :)

Thanks,
Jon

> Thanks.
> 
> -- 
> Regards,
>   Alexander

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 0/3] ntb_perf: add new 'latency' test set
  2022-06-20 14:42       ` Jon Mason
@ 2022-06-21 14:05         ` Serge Semin
  2022-06-22  9:44           ` Alexander Fomichev
  0 siblings, 1 reply; 12+ messages in thread
From: Serge Semin @ 2022-06-21 14:05 UTC (permalink / raw)
  To: Jon Mason
  Cc: Alexander Fomichev, Dave Jiang, ntb, linux, Allen Hubbe,
	Guo Zhengkui, Alexander Fomichev

Hi folks,

On Mon, Jun 20, 2022 at 10:42:54AM -0400, Jon Mason wrote:
> On Mon, Jun 20, 2022 at 01:20:14PM +0300, Alexander Fomichev wrote:
> > Hi Serge,
> > 
> > On Wed, May 25, 2022 at 11:58:01AM +0300, Serge Semin wrote:
> > > 
> > > Ok. I'll have a look at the series this week.
> > > 
> > 
> > I kindly remind you about the review.

I am very sorry for the huge delay with the review, especially after I
promised to have a look at the series one month ago.

> 

> I've pulled this into my ntb branch.  So, barring a nack by Serge,
> I'll push it for v5.20.  And if I see an ack by him, I'll try to add
> that in as well :)

I've thoroughly looked at the entire series and IMO in its current
state I wouldn't recommend merging it in.
The main issue is that after applying all the changes the ntb_perf
driver will get greatly extended with three additional sub-tests and
thus will lose its coherency. That becomes obvious once patches 2 and
3 are applied, which introduce additional client-server semantics and
imply allocating their own private data. All of that makes the code
much harder to read and breaks the driver's specialization.

The latency tests themselves are very useful, though. But it would be
much better to have them implemented in a separate driver,
"ntb_latency" or something.

Please also note, there is a special test script which relies on the
semantics of the particular test drivers:
tools/testing/selftests/ntb/ntb_test.sh
It looks like the suggested patches don't change the already defined
ntb_perf DebugFS interface, but they may still cause the script to fail
if the latency measured on a particular system turns out to be too high.
So should we have the latency part in a separate driver, the script
won't be affected by it. If required, the script could be updated
accordingly, taking the new driver's specifics into account.

Regarding the tests implementation: as I see it, failing the latency
measurements if they're performed with too few retries isn't a good
idea. Alexander, you said that normally performing 1000 retries is
enough to get the latency with good precision, but the test driver
returns an error if the number of retries is less than 20. So what
happens between 20 and 1000? The tests pass, but the results aren't
accurate, or what? If so, then why doesn't the test fail in the case
of 30 iterations too? IMO as long as you don't define a strict
accuracy criterion, the failure condition shouldn't be determined by
the number of iterations. So if I were you, I would execute the
latency tests for the specified "lat_time_ms" duration and print a
warning if the number of iterations turned out to be too low (100,
200?), most likely making the results inaccurate, but still calculate
the latency from the collected numbers (even if only a single
iteration was performed).
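
Roughly, I mean something along these lines (just a sketch of the idea;
lat_time_ms, LAT_MIN_TRIES, the context fields and the helper below are
made-up placeholders, not the actual ntb_perf code):

static int perf_run_lat(struct lat_ctx *lat)
{
	ktime_t deadline = ktime_add_ms(ktime_get(), lat_time_ms);
	u64 tries = 0, total_ns = 0;
	int ret;

	while (ktime_before(ktime_get(), deadline)) {
		ktime_t start = ktime_get();

		ret = perf_lat_one_try(lat);	/* hypothetical single probe */
		if (ret)
			return ret;

		total_ns += ktime_to_ns(ktime_sub(ktime_get(), start));
		tries++;
	}

	/* Warn about a possibly inaccurate result, but don't fail the test */
	if (tries < LAT_MIN_TRIES)
		dev_warn(&lat->ntb->dev,
			 "Too few samples (%llu), the result may be inaccurate\n",
			 tries);

	/* Report the latency even if only a single iteration was done */
	lat->avg_lat_ns = tries ? div64_u64(total_ns, tries) : 0;

	return 0;
}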

I am very sorry for spilling this out at this stage. I should have
done it on v1 or v2. Anyway, it's up to the driver/subsystem
maintainers (Dave, Jon) to decide whether the suggested update is
suitable despite all my thoughts.

-Sergey

> 
> Thanks,
> Jon
> 
> > Thanks.
> > 
> > -- 
> > Regards,
> >   Alexander

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 0/3] ntb_perf: add new 'latency' test set
  2022-06-21 14:05         ` Serge Semin
@ 2022-06-22  9:44           ` Alexander Fomichev
  2022-06-22 20:36             ` Serge Semin
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Fomichev @ 2022-06-22  9:44 UTC (permalink / raw)
  To: Serge Semin
  Cc: Jon Mason, Dave Jiang, ntb, linux, Allen Hubbe, Guo Zhengkui,
	Alexander Fomichev

Hi Serge,

Thank you for the very detailed comments.

On Tue, Jun 21, 2022 at 05:05:37PM +0300, Serge Semin wrote:
> 
> Please also note, there is a special test script which relies on the
> semantics of the particular test drivers:
> tools/testing/selftests/ntb/ntb_test.sh
> It looks like the suggested patches don't change the already defined
> ntb_perf DebugFS interface, but they may still cause the script to fail
> if the latency measured on a particular system turns out to be too high.
> So should we have the latency part in a separate driver, the script
> won't be affected by it. If required, the script could be updated
> accordingly, taking the new driver's specifics into account.
>
As described in the cover letter, the resulting test is fully
backward compatible. I mean that if we don't fiddle with the new sysfs
entries, then no latency measurements are performed, and the test works
as it did before the patch set.
Also, I plan to make changes to "ntb_test.sh" script accordingly, after
this patch set is merged.

> Regarding the tests implementation: as I see it, failing the latency
> measurements if they're performed with too few retries isn't a good
> idea. Alexander, you said that normally performing 1000 retries is
> enough to get the latency with good precision, but the test driver
> returns an error if the number of retries is less than 20. So what
> happens between 20 and 1000? The tests pass, but the results aren't
> accurate, or what? If so, then why doesn't the test fail in the case
> of 30 iterations too? IMO as long as you don't define a strict
> accuracy criterion, the failure condition shouldn't be determined by
> the number of iterations. So if I were you, I would execute the
> latency tests for the specified "lat_time_ms" duration and print a
> warning if the number of iterations turned out to be too low (100,
> 200?), most likely making the results inaccurate, but still calculate
> the latency from the collected numbers (even if only a single
> iteration was performed).
> 
Reasonable. I can easily change this part.

> The main issue is that after applying all the changes the ntb_perf
> driver will get greatly extended with three additional sub-tests and
> thus will lose its coherency. That becomes obvious once patches 2 and
> 3 are applied, which introduce additional client-server semantics and
> imply allocating their own private data. All of that makes the code
> much harder to read and breaks the driver's specialization.
> 
> The latency tests themselves are very useful, though. But it would be
> much better to have them implemented in a separate driver,
> "ntb_latency" or something.
>
The whole 'latency' part relies on 'ntb_perf' infrastructure. Moreover,
the first patch adds only one meaningful function.  Thus separating the
'latency' part will make me copy a lot of code. As a compromise, I can
offer to put latency-related code into a separate .c file but leave the
whole test in a single module. That should increase readability and
eliminate code duplication.
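
For illustration, the split I have in mind would look roughly like this
(a sketch only; the file name and function names are placeholders, not
the actual patch):

/* ntb_perf.h -- hypothetical declarations shared between the two .c files */
#ifndef NTB_PERF_H
#define NTB_PERF_H

#include <linux/ntb.h>

struct perf_ctx;	/* defined in ntb_perf.c, reused by ntb_perf_lat.c */

/* Entry points the core test code would call into the latency part */
int perf_lat_init(struct perf_ctx *perf);
void perf_lat_clear(struct perf_ctx *perf);
int perf_lat_run(struct perf_ctx *perf, int pidx);

#endif /* NTB_PERF_H */

Both objects would then be linked into the single ntb_perf.ko module,
as before.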

> I am very sorry for spilling this out at this stage. I should have
> done it on v1 or v2. Anyway, it's up to the driver/subsystem
> maintainers (Dave, Jon) to decide whether the suggested update is
> suitable despite all my thoughts.
> 
Let's call for a third opinion.


-- 
Regards,
  Alexander

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 0/3] ntb_perf: add new 'latency' test set
  2022-06-22  9:44           ` Alexander Fomichev
@ 2022-06-22 20:36             ` Serge Semin
  2022-08-09 15:43               ` Jon Mason
  0 siblings, 1 reply; 12+ messages in thread
From: Serge Semin @ 2022-06-22 20:36 UTC (permalink / raw)
  To: Alexander Fomichev
  Cc: Jon Mason, Dave Jiang, ntb, linux, Allen Hubbe, Guo Zhengkui,
	Alexander Fomichev

On Wed, Jun 22, 2022 at 12:44:57PM +0300, Alexander Fomichev wrote:
> Hi Serge,
> 
> Thank you for the very detailed comments.
> 
> On Tue, Jun 21, 2022 at 05:05:37PM +0300, Serge Semin wrote:
> > 
> > Please also note, there is a special test script which relies on the
> > semantics of the particular test drivers:
> > tools/testing/selftests/ntb/ntb_test.sh
> > It looks like the suggested patches don't change the already defined
> > ntb_perf DebugFS interface, but they may still cause the script to fail
> > if the latency measured on a particular system turns out to be too high.
> > So should we have the latency part in a separate driver, the script
> > won't be affected by it. If required, the script could be updated
> > accordingly, taking the new driver's specifics into account.
> >

> As described in the cover letter, the resulting test is fully
> backward compatible. I mean that if we don't fiddle with the new sysfs
> entries, then no latency measurements are performed, and the test works
> as it did before the patch set.

sysfs? Did you mean debugfs?
Anyway, the DebugFS interface will indeed be backward compatible, but
functionally the performance test won't be the same. AFAICS, at the
very least the burst-latency test is executed by default together with
the standard performance test. It may cause ntb_test.sh regressions
(the test will fail if the latency is too high) on systems which
haven't had any problems before.

> Also, I plan to make changes to "ntb_test.sh" script accordingly, after
> this patch set is merged.

Good. It will be much easier to do should you have the latency test
implemented as a separate driver.

> 
> > Regarding the tests implementation: as I see it, failing the latency
> > measurements if they're performed with too few retries isn't a good
> > idea. Alexander, you said that normally performing 1000 retries is
> > enough to get the latency with good precision, but the test driver
> > returns an error if the number of retries is less than 20. So what
> > happens between 20 and 1000? The tests pass, but the results aren't
> > accurate, or what? If so, then why doesn't the test fail in the case
> > of 30 iterations too? IMO as long as you don't define a strict
> > accuracy criterion, the failure condition shouldn't be determined by
> > the number of iterations. So if I were you, I would execute the
> > latency tests for the specified "lat_time_ms" duration and print a
> > warning if the number of iterations turned out to be too low (100,
> > 200?), most likely making the results inaccurate, but still calculate
> > the latency from the collected numbers (even if only a single
> > iteration was performed).
> > 
> Reasonable. I can easily change this part.
> 
> > The main issue is that after applying all the changes the ntb_perf
> > driver will get greatly extended with three additional sub-tests and
> > thus will lose its coherency. That becomes obvious once patches 2 and
> > 3 are applied, which introduce additional client-server semantics and
> > imply allocating their own private data. All of that makes the code
> > much harder to read and breaks the driver's specialization.
> > 
> > The latency tests themselves are very useful, though. But it would be
> > much better to have them implemented in a separate driver,
> > "ntb_latency" or something.
> >

> The whole 'latency' part relies on 'ntb_perf' infrastructure.

Yeah, it's very handy, isn't it? =)
Originally it was created in such a way that the perf-test would be
portable across all the supported NTB HW types: local- and peer-based
MW xlate address setup (Intel/AMD/Switchtec NTB HW vs IDT NTB HW),
Scratchpad- and Message-capable devices (Intel/AMD/Switchtec NTB HW vs
IDT NTB HW). My idea was to provide a reference of how a portable NTB
application could be designed. (It took quite some time to debug that
part on the Intel/AMD hardware since none of the current maintainers
had access to one at that moment.) As a next step I was going to
move the communication part of the ntb_perf driver to a separate
kernel module as a communication library (which could be used for
basic inter-domain communications on top of the DB+Spad/MSG
interface), but alas I didn't find the time to work on it.

> Moreover,
> the first patch adds only one meaningful function.  Thus separating the
> 'latency' part will make me copy a lot of code. As a compromise, I can
> offer to put latency-related code into a separate .c file but leave the
> whole test in a single module. That should increase readability and
> eliminate code duplication.

It would be much better if you detached the infrastructure part into
a separate module, which could afterwards be used by the ntb_perf and
your ntb_latency drivers.

It would be best if you created it as a kernel library with a
well-defined interface, which could be used not only by the test
drivers but by any kernel application (most importantly by the NTB
transport driver) for basic communications (like MW xlate address
exchange, portable MW xlate address setup, etc.).
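
To give a rough idea of the kind of interface I mean, something like
the following (purely a sketch; none of these symbols exist in the
kernel today):

/* Hypothetical "ntb_comm" helper library interface -- a sketch only */
#include <linux/types.h>
#include <linux/ntb.h>

struct ntb_comm;	/* opaque per-device communication context */

/* Set up a basic DB+Spad/MSG channel on top of an NTB device */
struct ntb_comm *ntb_comm_create(struct ntb_dev *ntb);
void ntb_comm_destroy(struct ntb_comm *comm);

/* Exchange and program the MW translation addresses with a peer */
int ntb_comm_mw_xlat_exchange(struct ntb_comm *comm, int pidx, int widx,
			      dma_addr_t addr, resource_size_t size);

/* Simple inter-domain messaging used for the exchange above */
int ntb_comm_send(struct ntb_comm *comm, int pidx, u32 msg);
int ntb_comm_recv(struct ntb_comm *comm, int *pidx, u32 *msg);

Both the test drivers and the NTB transport driver could then use such
a library instead of each re-implementing the handshake.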

> 
> > I am very sorry for spilling this out at this stage. I should have
> > done it on v1 or v2. Anyway, it's up to the driver/subsystem
> > maintainers (Dave, Jon) to decide whether the suggested update is
> > suitable despite all my thoughts.
> > 

> Let's call for a third opinion.

Ok.

-Sergey

> 
> 
> -- 
> Regards,
>   Alexander

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 0/3] ntb_perf: add new 'latency' test set
  2022-06-22 20:36             ` Serge Semin
@ 2022-08-09 15:43               ` Jon Mason
  0 siblings, 0 replies; 12+ messages in thread
From: Jon Mason @ 2022-08-09 15:43 UTC (permalink / raw)
  To: Serge Semin
  Cc: Alexander Fomichev, Dave Jiang, ntb, linux, Allen Hubbe,
	Guo Zhengkui, Alexander Fomichev

On Wed, Jun 22, 2022 at 4:36 PM Serge Semin <fancer.lancer@gmail.com> wrote:
>
> On Wed, Jun 22, 2022 at 12:44:57PM +0300, Alexander Fomichev wrote:
> > Hi Serge,
> >
> > Thank you for the very detailed comments.
> >
> > On Tue, Jun 21, 2022 at 05:05:37PM +0300, Serge Semin wrote:
> > >
> > > Please also note, there is a special test script which relies on the
> > > semantics of the particular test drivers:
> > > tools/testing/selftests/ntb/ntb_test.sh
> > > It looks like the suggested patches don't change the already defined
> > > ntb_perf DebugFS interface, but they may still cause the script to fail
> > > if the latency measured on a particular system turns out to be too high.
> > > So should we have the latency part in a separate driver, the script
> > > won't be affected by it. If required, the script could be updated
> > > accordingly, taking the new driver's specifics into account.
> > >
>
> > As described in the cover letter, the resulting test is fully
> > backward compatible. I mean that if we don't fiddle with the new sysfs
> > entries, then no latency measurements are performed, and the test works
> > as it did before the patch set.
>
> sysfs? Did you mean debugfs?
> Anyway, the DebugFS interface will indeed be backward compatible, but
> functionally the performance test won't be the same. AFAICS, at the
> very least the burst-latency test is executed by default together with
> the standard performance test. It may cause ntb_test.sh regressions
> (the test will fail if the latency is too high) on systems which
> haven't had any problems before.
>
> > Also, I plan to make changes to "ntb_test.sh" script accordingly, after
> > this patch set is merged.
>
> Good. It will be much easier to do should you have the latency test
> implemented as a separate driver.
>
> >
> > > Regarding the tests implementation: as I see it, failing the latency
> > > measurements if they're performed with too few retries isn't a good
> > > idea. Alexander, you said that normally performing 1000 retries is
> > > enough to get the latency with good precision, but the test driver
> > > returns an error if the number of retries is less than 20. So what
> > > happens between 20 and 1000? The tests pass, but the results aren't
> > > accurate, or what? If so, then why doesn't the test fail in the case
> > > of 30 iterations too? IMO as long as you don't define a strict
> > > accuracy criterion, the failure condition shouldn't be determined by
> > > the number of iterations. So if I were you, I would execute the
> > > latency tests for the specified "lat_time_ms" duration and print a
> > > warning if the number of iterations turned out to be too low (100,
> > > 200?), most likely making the results inaccurate, but still calculate
> > > the latency from the collected numbers (even if only a single
> > > iteration was performed).
> > >
> > Reasonable. I can easily change this part.
> >
> > > The main issue is that after applying all the changes the ntb_perf
> > > driver will get greatly extended with three additional sub-tests and
> > > thus will lose its coherency. That becomes obvious once patches 2 and
> > > 3 are applied, which introduce additional client-server semantics and
> > > imply allocating their own private data. All of that makes the code
> > > much harder to read and breaks the driver's specialization.
> > >
> > > The latency tests themselves are very useful, though. But it would be
> > > much better to have them implemented in a separate driver,
> > > "ntb_latency" or something.
> > >
>
> > The whole 'latency' part relies on 'ntb_perf' infrastructure.
>
> Yeah, it's very handy, isn't it? =)
> Originally it was created in such a way that the perf-test would be
> portable across all the supported NTB HW types: local- and peer-based
> MW xlate address setup (Intel/AMD/Switchtec NTB HW vs IDT NTB HW),
> Scratchpad- and Message-capable devices (Intel/AMD/Switchtec NTB HW vs
> IDT NTB HW). My idea was to provide a reference of how a portable NTB
> application could be designed. (It took quite some time to debug that
> part on the Intel/AMD hardware since none of the current maintainers
> had access to one at that moment.) As a next step I was going to
> move the communication part of the ntb_perf driver to a separate
> kernel module as a communication library (which could be used for
> basic inter-domain communications on top of the DB+Spad/MSG
> interface), but alas I didn't find the time to work on it.
>
> > Moreover,
> > the first patch adds only one meaningful function.  Thus separating the
> > 'latency' part will make me copy a lot of code. As a compromise, I can
> > offer to put latency-related code into a separate .c file but leave the
> > whole test in a single module. That should increase readability and
> > eliminate code duplication.
>
> It would be much better if you detached the infrastructure part into
> a separate module, which could afterwards be used by the ntb_perf and
> your ntb_latency drivers.
>
> It would be best if you created it as a kernel library with a
> well-defined interface, which could be used not only by the test
> drivers but by any kernel application (most importantly by the NTB
> transport driver) for basic communications (like MW xlate address
> exchange, portable MW xlate address setup, etc.).
>
> >
> > > I am very sorry for spilling this out at this stage. I should have
> > > done it on v1 or v2. Anyway, it's up to the driver/subsystem
> > > maintainers (Dave, Jon) to decide whether the suggested update is
> > > suitable despite all my thoughts.
> > >
>
> > Let's call for a third opinion.
>
> Ok.
>
> -Sergey

The merge window is closing soon. I'm going to drop the v3 that I
currently have and let this cook for a little longer.  Let's do a v4
with all of the feedback that has been agreed on, and then we can take
a look again.

Thanks,
Jon

> >
> >
> > --
> > Regards,
> >   Alexander

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-08-09 15:43 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-13 19:37 [PATCH v3 0/3] ntb_perf: add new 'latency' test set Alexander Fomichev
2022-05-13 19:37 ` [PATCH v3 1/3] ntb_perf: extend with burst latency measurement Alexander Fomichev
2022-05-13 19:37 ` [PATCH v3 2/3] ntb_perf: extend with poll " Alexander Fomichev
2022-05-13 19:37 ` [PATCH v3 3/3] ntb_perf: extend with doorbell " Alexander Fomichev
2022-05-23 18:38 ` [PATCH v3 0/3] ntb_perf: add new 'latency' test set Dave Jiang
2022-05-25  8:58   ` Serge Semin
2022-06-20 10:20     ` Alexander Fomichev
2022-06-20 14:42       ` Jon Mason
2022-06-21 14:05         ` Serge Semin
2022-06-22  9:44           ` Alexander Fomichev
2022-06-22 20:36             ` Serge Semin
2022-08-09 15:43               ` Jon Mason

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).