* [PATCH v2 0/5] ntb perf, ntb tool and ntb-hw improvements
@ 2020-03-10 20:54 Sanjay R Mehta
  2020-03-10 20:54 ` [PATCH v2 1/5] ntb_perf: refactor code for CPU and DMA transfers Sanjay R Mehta
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Sanjay R Mehta @ 2020-03-10 20:54 UTC (permalink / raw)
  To: jdmason, dave.jiang, allenbh, arindam.nath, logang, Shyam-sundar.S-k
  Cc: linux-ntb, linux-kernel, Sanjay R Mehta

This patch series modifies the ntb_perf code to
have separate functions for CPU and DMA transfers.
It also adds handling for the -EAGAIN error code so
that we retry sending commands later.

Fixes have been made to the ntb_perf, ntb_tool and
ntb_hw* files.

The patches are based on Linus' tree:

git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

v2: Incorporated improvements suggested by Logan Gunthorpe

Arindam Nath (2):
  ntb_perf: refactor code for CPU and DMA transfers
  ntb_perf: send command in response to EAGAIN

Logan Gunthorpe (1):
  ntb: hw: remove the code that sets the DMA mask

Sanjay R Mehta (2):
  ntb_perf: pass correct struct device to dma_alloc_coherent
  ntb_tool: pass correct struct device to dma_alloc_coherent

 drivers/ntb/hw/amd/ntb_hw_amd.c    |   4 -
 drivers/ntb/hw/idt/ntb_hw_idt.c    |   6 --
 drivers/ntb/hw/intel/ntb_hw_gen1.c |   4 -
 drivers/ntb/test/ntb_perf.c        | 167 +++++++++++++++++++++++++++----------
 drivers/ntb/test/ntb_tool.c        |   6 +-
 5 files changed, 126 insertions(+), 61 deletions(-)

-- 
2.7.4



* [PATCH v2 1/5] ntb_perf: refactor code for CPU and DMA transfers
  2020-03-10 20:54 [PATCH v2 0/5] ntb perf, ntb tool and ntb-hw improvements Sanjay R Mehta
@ 2020-03-10 20:54 ` Sanjay R Mehta
  2020-03-10 21:21   ` Logan Gunthorpe
  2020-03-10 20:54 ` [PATCH v2 2/5] ntb_perf: send command in response to EAGAIN Sanjay R Mehta
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Sanjay R Mehta @ 2020-03-10 20:54 UTC (permalink / raw)
  To: jdmason, dave.jiang, allenbh, arindam.nath, logang, Shyam-sundar.S-k
  Cc: linux-ntb, linux-kernel, Sanjay R Mehta

From: Arindam Nath <arindam.nath@amd.com>

This patch creates separate functions to handle
CPU and DMA transfers. Since CPU transfers use
memcpy and DMA transfers use the dmaengine APIs,
these changes not only provide a logical
separation between the two, but also make the
difference in how the two are handled clearly
visible.

In the case of DMA, we transfer from system
memory to the memory window (MW) of the NTB,
which is an MMIO region, so we should not use
dma_map_page() to map the MW. The correct way to
map an MMIO region is dma_map_resource(), so the
code is modified accordingly.

dma_map_resource() expects the physical address
of the region to be mapped for DMA, so we add a
new field, outbuf_phys_addr, to struct perf_peer,
and another field, outbuf_dma_addr, to store the
corresponding mapped address returned by the API.

Since the MW is contiguous, rather than mapping
it chunk-by-chunk, we map the entire MW before
the actual DMA transfers happen. Then for each
chunk, we simply pass an offset into the mapped
region and DMA to that region. The MW is later
unmapped in perf_clear_test().

The above means that we now need different
function parameters for the CPU and DMA cases.
For CPU transfers, we simply need the CPU virtual
addresses for memcpy, but for DMA we need a
dma_addr_t, which differs from the CPU physical
address depending on whether an IOMMU is enabled.
Thus we now have two separate functions,
perf_copy_chunk_cpu() and perf_copy_chunk_dma(),
to take care of the above.
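
For illustration only (not part of this patch): a minimal
sketch of the map-once/offset-per-chunk scheme described
above, using the fields added here. The helper names are
made up for the example; only the DMA-API calls are real.

static int perf_map_peer_mw(struct perf_peer *peer, struct device *dma_dev)
{
	/* Map the whole outbound MW (an MMIO region) once, up front */
	peer->outbuf_dma_addr = dma_map_resource(dma_dev,
						 peer->outbuf_phys_addr,
						 peer->outbuf_size,
						 DMA_FROM_DEVICE, 0);
	if (dma_mapping_error(dma_dev, peer->outbuf_dma_addr))
		return -EIO;

	return 0;
}

static void perf_unmap_peer_mw(struct perf_peer *peer, struct device *dma_dev)
{
	/* Release the single mapping, e.g. from perf_clear_test() */
	dma_unmap_resource(dma_dev, peer->outbuf_dma_addr,
			   peer->outbuf_size, DMA_FROM_DEVICE, 0);
}

Each chunk then simply DMAs to peer->outbuf_dma_addr plus an
offset, with no per-chunk mapping needed.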

Signed-off-by: Arindam Nath <arindam.nath@amd.com>
Signed-off-by: Sanjay R Mehta <sanju.mehta@amd.com>
---
 drivers/ntb/test/ntb_perf.c | 141 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 105 insertions(+), 36 deletions(-)

diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
index e9b7c2d..6d16628 100644
--- a/drivers/ntb/test/ntb_perf.c
+++ b/drivers/ntb/test/ntb_perf.c
@@ -149,6 +149,8 @@ struct perf_peer {
 	u64 outbuf_xlat;
 	resource_size_t outbuf_size;
 	void __iomem *outbuf;
+	phys_addr_t outbuf_phys_addr;
+	dma_addr_t outbuf_dma_addr;
 
 	/* Inbound MW params */
 	dma_addr_t inbuf_xlat;
@@ -775,26 +777,24 @@ static void perf_dma_copy_callback(void *data)
 	wake_up(&pthr->dma_wait);
 }
 
-static int perf_copy_chunk(struct perf_thread *pthr,
-			   void __iomem *dst, void *src, size_t len)
+static int perf_copy_chunk_cpu(struct perf_thread *pthr,
+			       void __iomem *dst, void *src, size_t len)
+{
+	memcpy_toio(dst, src, len);
+
+	return likely(atomic_read(&pthr->perf->tsync) > 0) ? 0 : -EINTR;
+}
+
+static int perf_copy_chunk_dma(struct perf_thread *pthr,
+			       dma_addr_t dst, void *src, size_t len)
 {
 	struct dma_async_tx_descriptor *tx;
 	struct dmaengine_unmap_data *unmap;
 	struct device *dma_dev;
 	int try = 0, ret = 0;
 
-	if (!use_dma) {
-		memcpy_toio(dst, src, len);
-		goto ret_check_tsync;
-	}
-
 	dma_dev = pthr->dma_chan->device->dev;
-
-	if (!is_dma_copy_aligned(pthr->dma_chan->device, offset_in_page(src),
-				 offset_in_page(dst), len))
-		return -EIO;
-
-	unmap = dmaengine_get_unmap_data(dma_dev, 2, GFP_NOWAIT);
+	unmap = dmaengine_get_unmap_data(dma_dev, 1, GFP_NOWAIT);
 	if (!unmap)
 		return -ENOMEM;
 
@@ -807,17 +807,12 @@ static int perf_copy_chunk(struct perf_thread *pthr,
 	}
 	unmap->to_cnt = 1;
 
-	unmap->addr[1] = dma_map_page(dma_dev, virt_to_page(dst),
-		offset_in_page(dst), len, DMA_FROM_DEVICE);
-	if (dma_mapping_error(dma_dev, unmap->addr[1])) {
-		ret = -EIO;
-		goto err_free_resource;
-	}
-	unmap->from_cnt = 1;
-
 	do {
-		tx = dmaengine_prep_dma_memcpy(pthr->dma_chan, unmap->addr[1],
-			unmap->addr[0], len, DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
+		tx = dmaengine_prep_dma_memcpy(pthr->dma_chan, dst,
+					       unmap->addr[0], len,
+					       DMA_PREP_INTERRUPT |
+					       DMA_CTRL_ACK);
+
 		if (!tx)
 			msleep(DMA_MDELAY);
 	} while (!tx && (try++ < DMA_TRIES));
@@ -833,22 +828,16 @@ static int perf_copy_chunk(struct perf_thread *pthr,
 
 	ret = dma_submit_error(dmaengine_submit(tx));
 	if (ret) {
-		dmaengine_unmap_put(unmap);
 		goto err_free_resource;
 	}
 
-	dmaengine_unmap_put(unmap);
-
 	atomic_inc(&pthr->dma_sync);
 	dma_async_issue_pending(pthr->dma_chan);
 
-ret_check_tsync:
-	return likely(atomic_read(&pthr->perf->tsync) > 0) ? 0 : -EINTR;
-
 err_free_resource:
 	dmaengine_unmap_put(unmap);
 
-	return ret;
+	return likely(atomic_read(&pthr->perf->tsync) > 0) ? ret : -EINTR;
 }
 
 static bool perf_dma_filter(struct dma_chan *chan, void *data)
@@ -893,7 +882,7 @@ static int perf_init_test(struct perf_thread *pthr)
 	return 0;
 }
 
-static int perf_run_test(struct perf_thread *pthr)
+static int perf_run_test_cpu(struct perf_thread *pthr)
 {
 	struct perf_peer *peer = pthr->perf->test_peer;
 	struct perf_ctx *perf = pthr->perf;
@@ -914,7 +903,7 @@ static int perf_run_test(struct perf_thread *pthr)
 
 	/* Copied field is cleared on test launch stage */
 	while (pthr->copied < total_size) {
-		ret = perf_copy_chunk(pthr, flt_dst, flt_src, chunk_size);
+		ret = perf_copy_chunk_cpu(pthr, flt_dst, flt_src, chunk_size);
 		if (ret) {
 			dev_err(&perf->ntb->dev, "%d: Got error %d on test\n",
 				pthr->tidx, ret);
@@ -937,6 +926,74 @@ static int perf_run_test(struct perf_thread *pthr)
 	return 0;
 }
 
+static int perf_run_test_dma(struct perf_thread *pthr)
+{
+	struct perf_peer *peer = pthr->perf->test_peer;
+	struct perf_ctx *perf = pthr->perf;
+	struct device *dma_dev;
+	dma_addr_t flt_dst, bnd_dst;
+	u64 total_size, chunk_size;
+	void *flt_src;
+	int ret = 0;
+
+	total_size = 1ULL << total_order;
+	chunk_size = 1ULL << chunk_order;
+	chunk_size = min_t(u64, peer->outbuf_size, chunk_size);
+
+	/* Map MW for DMA */
+	dma_dev = pthr->dma_chan->device->dev;
+	peer->outbuf_dma_addr = dma_map_resource(dma_dev,
+						 peer->outbuf_phys_addr,
+						 peer->outbuf_size,
+						 DMA_FROM_DEVICE, 0);
+	if (dma_mapping_error(dma_dev, peer->outbuf_dma_addr)) {
+		dma_unmap_resource(dma_dev, peer->outbuf_dma_addr,
+				   peer->outbuf_size, DMA_FROM_DEVICE, 0);
+		return -EIO;
+	}
+
+	flt_src = pthr->src;
+	bnd_dst = peer->outbuf_dma_addr + peer->outbuf_size;
+	flt_dst = peer->outbuf_dma_addr;
+
+	pthr->duration = ktime_get();
+	/* Copied field is cleared on test launch stage */
+	while (pthr->copied < total_size) {
+		ret = perf_copy_chunk_dma(pthr, flt_dst, flt_src, chunk_size);
+		if (ret) {
+			dev_err(&perf->ntb->dev, "%d: Got error %d on test\n",
+				pthr->tidx, ret);
+			return ret;
+		}
+
+		pthr->copied += chunk_size;
+
+		flt_dst += chunk_size;
+		flt_src += chunk_size;
+		if (flt_dst >= bnd_dst || flt_dst < peer->outbuf_dma_addr) {
+			flt_dst = peer->outbuf_dma_addr;
+			flt_src = pthr->src;
+		}
+
+		/* Give up CPU to give a chance for other threads to use it */
+		schedule();
+	}
+
+	return 0;
+}
+
+static int perf_run_test(struct perf_thread *pthr)
+{
+	int ret = 0;
+
+	if (!use_dma)
+		ret = perf_run_test_cpu(pthr);
+	else
+		ret = perf_run_test_dma(pthr);
+
+	return ret;
+}
+
 static int perf_sync_test(struct perf_thread *pthr)
 {
 	struct perf_ctx *perf = pthr->perf;
@@ -969,6 +1026,8 @@ static int perf_sync_test(struct perf_thread *pthr)
 static void perf_clear_test(struct perf_thread *pthr)
 {
 	struct perf_ctx *perf = pthr->perf;
+	struct perf_peer *peer = pthr->perf->test_peer;
+	struct device *dma_dev;
 
 	if (!use_dma)
 		goto no_dma_notify;
@@ -978,6 +1037,10 @@ static void perf_clear_test(struct perf_thread *pthr)
 	 * We call it anyway just to be sure of the transfers completion.
 	 */
 	(void)dmaengine_terminate_sync(pthr->dma_chan);
+	/* Un-map MW */
+	dma_dev = pthr->dma_chan->device->dev;
+	dma_unmap_resource(dma_dev, peer->outbuf_dma_addr, peer->outbuf_size,
+			   DMA_FROM_DEVICE, 0);
 
 	dma_release_channel(pthr->dma_chan);
 
@@ -1383,10 +1446,16 @@ static int perf_setup_peer_mw(struct perf_peer *peer)
 	if (ret)
 		return ret;
 
-	peer->outbuf = devm_ioremap_wc(&perf->ntb->dev, phys_addr,
-					peer->outbuf_size);
-	if (!peer->outbuf)
-		return -ENOMEM;
+	if (use_dma) {
+		/* For DMA to/from MW */
+		peer->outbuf_phys_addr = phys_addr;
+	} else {
+		/* For CPU read(from)/write(to) MW */
+		peer->outbuf = devm_ioremap_wc(&perf->ntb->dev, phys_addr,
+					       peer->outbuf_size);
+		if (!peer->outbuf)
+			return -ENOMEM;
+	}
 
 	if (max_mw_size && peer->outbuf_size > max_mw_size) {
 		peer->outbuf_size = max_mw_size;
-- 
2.7.4



* [PATCH v2 2/5] ntb_perf: send command in response to EAGAIN
  2020-03-10 20:54 [PATCH v2 0/5] ntb perf, ntb tool and ntb-hw improvements Sanjay R Mehta
  2020-03-10 20:54 ` [PATCH v2 1/5] ntb_perf: refactor code for CPU and DMA transfers Sanjay R Mehta
@ 2020-03-10 20:54 ` Sanjay R Mehta
  2020-03-10 21:31   ` Logan Gunthorpe
  2020-03-10 20:54 ` [PATCH v2 3/5] ntb_perf: pass correct struct device to dma_alloc_coherent Sanjay R Mehta
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Sanjay R Mehta @ 2020-03-10 20:54 UTC (permalink / raw)
  To: jdmason, dave.jiang, allenbh, arindam.nath, logang, Shyam-sundar.S-k
  Cc: linux-ntb, linux-kernel, Sanjay R Mehta

From: Arindam Nath <arindam.nath@amd.com>

perf_spad_cmd_send() and perf_msg_cmd_send() return
-EAGAIN after trying to send commands for a maximum
of MSG_TRIES retries, but currently there is no
handling for this error. These functions are invoked
from perf_service_work() through function pointers,
so simply calling them once is not enough; we need
to make sure they are invoked again in case of
-EAGAIN. Since the peer status bits were cleared
before calling these functions, we set the same
status bits again before re-queueing the work for a
later invocation. This way we won't go ahead and
initialize the XLAT registers with wrong values in
case sending the very first command itself fails.

Signed-off-by: Arindam Nath <arindam.nath@amd.com>
Signed-off-by: Sanjay R Mehta <sanju.mehta@amd.com>
---
 drivers/ntb/test/ntb_perf.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
index 6d16628..9068e42 100644
--- a/drivers/ntb/test/ntb_perf.c
+++ b/drivers/ntb/test/ntb_perf.c
@@ -625,14 +625,24 @@ static void perf_service_work(struct work_struct *work)
 {
 	struct perf_peer *peer = to_peer_service(work);
 
-	if (test_and_clear_bit(PERF_CMD_SSIZE, &peer->sts))
-		perf_cmd_send(peer, PERF_CMD_SSIZE, peer->outbuf_size);
+	if (test_and_clear_bit(PERF_CMD_SSIZE, &peer->sts)) {
+		if (perf_cmd_send(peer, PERF_CMD_SSIZE, peer->outbuf_size)
+		    == -EAGAIN) {
+			set_bit(PERF_CMD_SSIZE, &peer->sts);
+			(void)queue_work(system_highpri_wq, &peer->service);
+		}
+	}
 
 	if (test_and_clear_bit(PERF_CMD_RSIZE, &peer->sts))
 		perf_setup_inbuf(peer);
 
-	if (test_and_clear_bit(PERF_CMD_SXLAT, &peer->sts))
-		perf_cmd_send(peer, PERF_CMD_SXLAT, peer->inbuf_xlat);
+	if (test_and_clear_bit(PERF_CMD_SXLAT, &peer->sts)) {
+		if (perf_cmd_send(peer, PERF_CMD_SXLAT, peer->inbuf_xlat)
+		    == -EAGAIN) {
+			set_bit(PERF_CMD_SXLAT, &peer->sts);
+			(void)queue_work(system_highpri_wq, &peer->service);
+		}
+	}
 
 	if (test_and_clear_bit(PERF_CMD_RXLAT, &peer->sts))
 		perf_setup_outbuf(peer);
-- 
2.7.4



* [PATCH v2 3/5] ntb_perf: pass correct struct device to dma_alloc_coherent
  2020-03-10 20:54 [PATCH v2 0/5] ntb perf, ntb tool and ntb-hw improvements Sanjay R Mehta
  2020-03-10 20:54 ` [PATCH v2 1/5] ntb_perf: refactor code for CPU and DMA transfers Sanjay R Mehta
  2020-03-10 20:54 ` [PATCH v2 2/5] ntb_perf: send command in response to EAGAIN Sanjay R Mehta
@ 2020-03-10 20:54 ` Sanjay R Mehta
  2020-03-10 20:54 ` [PATCH v2 4/5] ntb_tool: " Sanjay R Mehta
  2020-03-10 20:54 ` [PATCH v2 5/5] ntb: hw: remove the code that sets the DMA mask Sanjay R Mehta
  4 siblings, 0 replies; 12+ messages in thread
From: Sanjay R Mehta @ 2020-03-10 20:54 UTC (permalink / raw)
  To: jdmason, dave.jiang, allenbh, arindam.nath, logang, Shyam-sundar.S-k
  Cc: linux-ntb, linux-kernel, Sanjay R Mehta

Currently, ntb->dev is passed to the dma_alloc_coherent
and dma_free_coherent calls, and the returned dma_addr_t
is the CPU physical address. This works fine as long as
the IOMMU is disabled. But when the IOMMU is enabled, we
need to make sure that an IOVA is returned in dma_addr_t.
The correct way to achieve this is to change the first
parameter of dma_alloc_coherent() to ntb->pdev->dev
instead.

Fixes: 5648e56 ("NTB: ntb_perf: Add full multi-port NTB API support")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Sanjay R Mehta <sanju.mehta@amd.com>
Signed-off-by: Arindam Nath <arindam.nath@amd.com>
---
 drivers/ntb/test/ntb_perf.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
index 9068e42..c193d62 100644
--- a/drivers/ntb/test/ntb_perf.c
+++ b/drivers/ntb/test/ntb_perf.c
@@ -558,7 +558,7 @@ static void perf_free_inbuf(struct perf_peer *peer)
 		return;
 
 	(void)ntb_mw_clear_trans(peer->perf->ntb, peer->pidx, peer->gidx);
-	dma_free_coherent(&peer->perf->ntb->dev, peer->inbuf_size,
+	dma_free_coherent(&peer->perf->ntb->pdev->dev, peer->inbuf_size,
 			  peer->inbuf, peer->inbuf_xlat);
 	peer->inbuf = NULL;
 }
@@ -587,8 +587,9 @@ static int perf_setup_inbuf(struct perf_peer *peer)
 
 	perf_free_inbuf(peer);
 
-	peer->inbuf = dma_alloc_coherent(&perf->ntb->dev, peer->inbuf_size,
-					 &peer->inbuf_xlat, GFP_KERNEL);
+	peer->inbuf = dma_alloc_coherent(&perf->ntb->pdev->dev,
+					 peer->inbuf_size, &peer->inbuf_xlat,
+					 GFP_KERNEL);
 	if (!peer->inbuf) {
 		dev_err(&perf->ntb->dev, "Failed to alloc inbuf of %pa\n",
 			&peer->inbuf_size);
@@ -1596,4 +1597,3 @@ static void __exit perf_exit(void)
 	destroy_workqueue(perf_wq);
 }
 module_exit(perf_exit);
-
-- 
2.7.4



* [PATCH v2 4/5] ntb_tool: pass correct struct device to dma_alloc_coherent
  2020-03-10 20:54 [PATCH v2 0/5] ntb perf, ntb tool and ntb-hw improvements Sanjay R Mehta
                   ` (2 preceding siblings ...)
  2020-03-10 20:54 ` [PATCH v2 3/5] ntb_perf: pass correct struct device to dma_alloc_coherent Sanjay R Mehta
@ 2020-03-10 20:54 ` Sanjay R Mehta
  2020-03-10 20:54 ` [PATCH v2 5/5] ntb: hw: remove the code that sets the DMA mask Sanjay R Mehta
  4 siblings, 0 replies; 12+ messages in thread
From: Sanjay R Mehta @ 2020-03-10 20:54 UTC (permalink / raw)
  To: jdmason, dave.jiang, allenbh, arindam.nath, logang, Shyam-sundar.S-k
  Cc: linux-ntb, linux-kernel, Sanjay R Mehta

Currently, ntb->dev is passed to the dma_alloc_coherent
and dma_free_coherent calls, and the returned dma_addr_t
is the CPU physical address. This works fine as long as
the IOMMU is disabled. But when the IOMMU is enabled, we
need to make sure that an IOVA is returned in dma_addr_t.
The correct way to achieve this is to change the first
parameter of dma_alloc_coherent() to ntb->pdev->dev
instead.

Fixes: 7f46c8b3a552 ("NTB: ntb_tool: Add full multi-port NTB API support")
Signed-off-by: Sanjay R Mehta <sanju.mehta@amd.com>
Signed-off-by: Arindam Nath <arindam.nath@amd.com>
---
 drivers/ntb/test/ntb_tool.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/ntb/test/ntb_tool.c b/drivers/ntb/test/ntb_tool.c
index d592c0f..025747c 100644
--- a/drivers/ntb/test/ntb_tool.c
+++ b/drivers/ntb/test/ntb_tool.c
@@ -590,7 +590,7 @@ static int tool_setup_mw(struct tool_ctx *tc, int pidx, int widx,
 	inmw->size = min_t(resource_size_t, req_size, size);
 	inmw->size = round_up(inmw->size, addr_align);
 	inmw->size = round_up(inmw->size, size_align);
-	inmw->mm_base = dma_alloc_coherent(&tc->ntb->dev, inmw->size,
+	inmw->mm_base = dma_alloc_coherent(&tc->ntb->pdev->dev, inmw->size,
 					   &inmw->dma_base, GFP_KERNEL);
 	if (!inmw->mm_base)
 		return -ENOMEM;
@@ -612,7 +612,7 @@ static int tool_setup_mw(struct tool_ctx *tc, int pidx, int widx,
 	return 0;
 
 err_free_dma:
-	dma_free_coherent(&tc->ntb->dev, inmw->size, inmw->mm_base,
+	dma_free_coherent(&tc->ntb->pdev->dev, inmw->size, inmw->mm_base,
 			  inmw->dma_base);
 	inmw->mm_base = NULL;
 	inmw->dma_base = 0;
@@ -629,7 +629,7 @@ static void tool_free_mw(struct tool_ctx *tc, int pidx, int widx)
 
 	if (inmw->mm_base != NULL) {
 		ntb_mw_clear_trans(tc->ntb, pidx, widx);
-		dma_free_coherent(&tc->ntb->dev, inmw->size,
+		dma_free_coherent(&tc->ntb->pdev->dev, inmw->size,
 				  inmw->mm_base, inmw->dma_base);
 	}
 
-- 
2.7.4



* [PATCH v2 5/5] ntb: hw: remove the code that sets the DMA mask
  2020-03-10 20:54 [PATCH v2 0/5] ntb perf, ntb tool and ntb-hw improvements Sanjay R Mehta
                   ` (3 preceding siblings ...)
  2020-03-10 20:54 ` [PATCH v2 4/5] ntb_tool: " Sanjay R Mehta
@ 2020-03-10 20:54 ` Sanjay R Mehta
  4 siblings, 0 replies; 12+ messages in thread
From: Sanjay R Mehta @ 2020-03-10 20:54 UTC (permalink / raw)
  To: jdmason, dave.jiang, allenbh, arindam.nath, logang, Shyam-sundar.S-k
  Cc: linux-ntb, linux-kernel, Sanjay R Mehta

From: Logan Gunthorpe <logang@deltatee.com>

This patch removes the code that coerces the DMA mask of
the struct ntb device, as it no longer makes sense to do
this: with the rest of this series, DMA mappings are done
against the underlying PCI device (or the DMA engine's
device), so a mask set on the ntb device is unused.

Fixes: 7f46c8b3a552 ("NTB: ntb_tool: Add full multi-port NTB API support")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Tested-by: Alexander Fomichev <fomichev.ru@gmail.com>
Signed-off-by: Sanjay R Mehta <sanju.mehta@amd.com>
---
 drivers/ntb/hw/amd/ntb_hw_amd.c    | 4 ----
 drivers/ntb/hw/idt/ntb_hw_idt.c    | 6 ------
 drivers/ntb/hw/intel/ntb_hw_gen1.c | 4 ----
 3 files changed, 14 deletions(-)

diff --git a/drivers/ntb/hw/amd/ntb_hw_amd.c b/drivers/ntb/hw/amd/ntb_hw_amd.c
index e52b300..a3ae59a 100644
--- a/drivers/ntb/hw/amd/ntb_hw_amd.c
+++ b/drivers/ntb/hw/amd/ntb_hw_amd.c
@@ -1020,10 +1020,6 @@ static int amd_ntb_init_pci(struct amd_ntb_dev *ndev,
 			goto err_dma_mask;
 		dev_warn(&pdev->dev, "Cannot DMA consistent highmem\n");
 	}
-	rc = dma_coerce_mask_and_coherent(&ndev->ntb.dev,
-					  dma_get_mask(&pdev->dev));
-	if (rc)
-		goto err_dma_mask;
 
 	ndev->self_mmio = pci_iomap(pdev, 0, 0);
 	if (!ndev->self_mmio) {
diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index dcf2346..a86600d 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -2660,12 +2660,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
 		dev_warn(&pdev->dev,
 			"Cannot set consistent DMA highmem bit mask\n");
 	}
-	ret = dma_coerce_mask_and_coherent(&ndev->ntb.dev,
-					   dma_get_mask(&pdev->dev));
-	if (ret != 0) {
-		dev_err(&pdev->dev, "Failed to set NTB device DMA bit mask\n");
-		return ret;
-	}
 
 	/*
 	 * Enable the device advanced error reporting. It's not critical to
diff --git a/drivers/ntb/hw/intel/ntb_hw_gen1.c b/drivers/ntb/hw/intel/ntb_hw_gen1.c
index bb57ec2..e053012 100644
--- a/drivers/ntb/hw/intel/ntb_hw_gen1.c
+++ b/drivers/ntb/hw/intel/ntb_hw_gen1.c
@@ -1783,10 +1783,6 @@ static int intel_ntb_init_pci(struct intel_ntb_dev *ndev, struct pci_dev *pdev)
 			goto err_dma_mask;
 		dev_warn(&pdev->dev, "Cannot DMA consistent highmem\n");
 	}
-	rc = dma_coerce_mask_and_coherent(&ndev->ntb.dev,
-					  dma_get_mask(&pdev->dev));
-	if (rc)
-		goto err_dma_mask;
 
 	ndev->self_mmio = pci_iomap(pdev, 0, 0);
 	if (!ndev->self_mmio) {
-- 
2.7.4



* Re: [PATCH v2 1/5] ntb_perf: refactor code for CPU and DMA transfers
  2020-03-10 20:54 ` [PATCH v2 1/5] ntb_perf: refactor code for CPU and DMA transfers Sanjay R Mehta
@ 2020-03-10 21:21   ` Logan Gunthorpe
  2020-03-11 17:44     ` Nath, Arindam
  0 siblings, 1 reply; 12+ messages in thread
From: Logan Gunthorpe @ 2020-03-10 21:21 UTC (permalink / raw)
  To: Sanjay R Mehta, jdmason, dave.jiang, allenbh, arindam.nath,
	Shyam-sundar.S-k
  Cc: linux-ntb, linux-kernel



On 2020-03-10 2:54 p.m., Sanjay R Mehta wrote:
> From: Arindam Nath <arindam.nath@amd.com>
> 
> This patch creates separate function to handle CPU
> and DMA transfers. Since CPU transfers use memcopy
> and DMA transfers use dmaengine APIs, these changes
> not only allow logical separation between the two,
> but also allows someone to clearly see the difference
> in the way the two are handled.
> 
> In the case of DMA, we DMA from system memory to the
> memory window(MW) of NTB, which is a MMIO region, we
> should not use dma_map_page() for mapping MW. The
> correct way to map a MMIO region is to use
> dma_map_resource(), so the code is modified
> accordingly.
> 
> dma_map_resource() expects physical address of the
> region to be mapped for DMA, we add a new field,
> outbuf_phys_addr, to struct perf_peer, and also
> another field, outbuf_dma_addr, to store the
> corresponding mapped address returned by the API.
> 
> Since the MW is contiguous, rather than mapping
> chunk-by-chunk, we map the entire MW before the
> actual DMA transfer happens. Then for each chunk,
> we simply pass offset into the mapped region and
> DMA to that region. Then later, we unmap the MW
> during perf_clear_test().
> 
> The above means that now we need to have different
> function parameters to deal with in the case of
> CPU and DMA transfers. In the case of CPU transfers,
> we simply need the CPU virtual addresses for memcopy,
> but in the case of DMA, we need dma_addr_t, which
> will be different from CPU physical address depending
> on whether IOMMU is enabled or not. Thus we now
> have two separate functions, perf_copy_chunk_cpu(),
> and perf_copy_chunk_dma() to take care of above
> consideration.
> 
> Signed-off-by: Arindam Nath <arindam.nath@amd.com>
> Signed-off-by: Sanjay R Mehta <sanju.mehta@amd.com>
> ---
>  drivers/ntb/test/ntb_perf.c | 141 +++++++++++++++++++++++++++++++++-----------
>  1 file changed, 105 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
> index e9b7c2d..6d16628 100644
> --- a/drivers/ntb/test/ntb_perf.c
> +++ b/drivers/ntb/test/ntb_perf.c
> @@ -149,6 +149,8 @@ struct perf_peer {
>  	u64 outbuf_xlat;
>  	resource_size_t outbuf_size;
>  	void __iomem *outbuf;
> +	phys_addr_t outbuf_phys_addr;
> +	dma_addr_t outbuf_dma_addr;
>  
>  	/* Inbound MW params */
>  	dma_addr_t inbuf_xlat;
> @@ -775,26 +777,24 @@ static void perf_dma_copy_callback(void *data)
>  	wake_up(&pthr->dma_wait);
>  }
>  
> -static int perf_copy_chunk(struct perf_thread *pthr,
> -			   void __iomem *dst, void *src, size_t len)
> +static int perf_copy_chunk_cpu(struct perf_thread *pthr,
> +			       void __iomem *dst, void *src, size_t len)
> +{
> +	memcpy_toio(dst, src, len);
> +
> +	return likely(atomic_read(&pthr->perf->tsync) > 0) ? 0 : -EINTR;
> +}
> +
> +static int perf_copy_chunk_dma(struct perf_thread *pthr,
> +			       dma_addr_t dst, void *src, size_t len)
>  {
>  	struct dma_async_tx_descriptor *tx;
>  	struct dmaengine_unmap_data *unmap;
>  	struct device *dma_dev;
>  	int try = 0, ret = 0;
>  
> -	if (!use_dma) {
> -		memcpy_toio(dst, src, len);
> -		goto ret_check_tsync;
> -	}
> -
>  	dma_dev = pthr->dma_chan->device->dev;
> -
> -	if (!is_dma_copy_aligned(pthr->dma_chan->device, offset_in_page(src),
> -				 offset_in_page(dst), len))
> -		return -EIO;

Can you please split this patch into multiple patches? It is hard to
review, and part of the reason this code is such a mess is that we
merged large patches with a bunch of different changes rolled into one,
many of which didn't get sufficient reviewer attention.

Patches that refactor things shouldn't be making functional changes
(like adding dma_map_resource()).


> -static int perf_run_test(struct perf_thread *pthr)
> +static int perf_run_test_cpu(struct perf_thread *pthr)
>  {
>  	struct perf_peer *peer = pthr->perf->test_peer;
>  	struct perf_ctx *perf = pthr->perf;
> @@ -914,7 +903,7 @@ static int perf_run_test(struct perf_thread *pthr)
>  
>  	/* Copied field is cleared on test launch stage */
>  	while (pthr->copied < total_size) {
> -		ret = perf_copy_chunk(pthr, flt_dst, flt_src, chunk_size);
> +		ret = perf_copy_chunk_cpu(pthr, flt_dst, flt_src, chunk_size);
>  		if (ret) {
>  			dev_err(&perf->ntb->dev, "%d: Got error %d on test\n",
>  				pthr->tidx, ret);
> @@ -937,6 +926,74 @@ static int perf_run_test(struct perf_thread *pthr)
>  	return 0;
>  }
>  
> +static int perf_run_test_dma(struct perf_thread *pthr)
> +{
> +	struct perf_peer *peer = pthr->perf->test_peer;
> +	struct perf_ctx *perf = pthr->perf;
> +	struct device *dma_dev;
> +	dma_addr_t flt_dst, bnd_dst;
> +	u64 total_size, chunk_size;
> +	void *flt_src;
> +	int ret = 0;
> +
> +	total_size = 1ULL << total_order;
> +	chunk_size = 1ULL << chunk_order;
> +	chunk_size = min_t(u64, peer->outbuf_size, chunk_size);
> +
> +	/* Map MW for DMA */
> +	dma_dev = pthr->dma_chan->device->dev;
> +	peer->outbuf_dma_addr = dma_map_resource(dma_dev,
> +						 peer->outbuf_phys_addr,
> +						 peer->outbuf_size,
> +						 DMA_FROM_DEVICE, 0);
> +	if (dma_mapping_error(dma_dev, peer->outbuf_dma_addr)) {
> +		dma_unmap_resource(dma_dev, peer->outbuf_dma_addr,
> +				   peer->outbuf_size, DMA_FROM_DEVICE, 0);
> +		return -EIO;
> +	}
> +
> +	flt_src = pthr->src;
> +	bnd_dst = peer->outbuf_dma_addr + peer->outbuf_size;
> +	flt_dst = peer->outbuf_dma_addr;
> +
> +	pthr->duration = ktime_get();
> +	/* Copied field is cleared on test launch stage */
> +	while (pthr->copied < total_size) {
> +		ret = perf_copy_chunk_dma(pthr, flt_dst, flt_src, chunk_size);
> +		if (ret) {
> +			dev_err(&perf->ntb->dev, "%d: Got error %d on test\n",
> +				pthr->tidx, ret);
> +			return ret;
> +		}
> +

Honestly, this doesn't seem like a good approach to me. Duplicating the
majority of the perf_run_test() function makes the code more
complicated and harder to maintain.

You should be able to just selectively call dma_map_resource() in
perf_run_test(), or even in perf_setup_peer_mw(), without needing to add
so much extra duplicate code.
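
A rough sketch of that suggestion, for illustration only (the exact
placement and names are assumptions, not a worked-out patch): keep a
single copy routine that branches on use_dma, with the MW mapped once
elsewhere, so perf_run_test() stays untouched.

static int perf_copy_chunk(struct perf_thread *pthr, struct perf_peer *peer,
			   u64 dst_off, void *src, size_t len)
{
	if (!use_dma) {
		/* CPU path: write through the ioremap'ed outbuf */
		memcpy_toio(peer->outbuf + dst_off, src, len);
		return likely(atomic_read(&pthr->perf->tsync) > 0) ? 0 : -EINTR;
	}

	/*
	 * DMA path: the MW was already mapped once with
	 * dma_map_resource() (e.g. in perf_init_test()), so the
	 * destination is just an offset into that mapping.
	 */
	return perf_copy_chunk_dma(pthr, peer->outbuf_dma_addr + dst_off,
				   src, len);
}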

Logan


* Re: [PATCH v2 2/5] ntb_perf: send command in response to EAGAIN
  2020-03-10 20:54 ` [PATCH v2 2/5] ntb_perf: send command in response to EAGAIN Sanjay R Mehta
@ 2020-03-10 21:31   ` Logan Gunthorpe
  2020-03-11 18:11     ` Nath, Arindam
  0 siblings, 1 reply; 12+ messages in thread
From: Logan Gunthorpe @ 2020-03-10 21:31 UTC (permalink / raw)
  To: Sanjay R Mehta, jdmason, dave.jiang, allenbh, arindam.nath,
	Shyam-sundar.S-k
  Cc: linux-ntb, linux-kernel



On 2020-03-10 2:54 p.m., Sanjay R Mehta wrote:
> From: Arindam Nath <arindam.nath@amd.com>
> 
> perf_spad_cmd_send() and perf_msg_cmd_send() return
> -EAGAIN after trying to send commands for a maximum
> of MSG_TRIES re-tries. But currently there is no
> handling for this error. These functions are invoked
> from perf_service_work() through function pointers,
> so rather than simply call these functions is not
> enough. We need to make sure to invoke them again in
> case of -EAGAIN. Since peer status bits were cleared
> before calling these functions, we set the same status
> bits before queueing the work again for later invocation.
> This way we simply won't go ahead and initialize the
> XLAT registers wrongfully in case sending the very first
> command itself fails.

So what happens if there's an actual non-recoverable error that causes
perf_msg_cmd_send() to fail? Are you proposing it just requeues
high-priority work forever?

I never really reviewed this stuff properly, but it looks like it has a
bunch of problems. Using the high-priority workqueue for some
low-priority setup work seems wrong, at the very least. The spad and msg
send loops also look like they have a bunch of inter-host race condition
problems. Yikes.

Logan



> Signed-off-by: Arindam Nath <arindam.nath@amd.com>
> Signed-off-by: Sanjay R Mehta <sanju.mehta@amd.com>
> ---
>  drivers/ntb/test/ntb_perf.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
> index 6d16628..9068e42 100644
> --- a/drivers/ntb/test/ntb_perf.c
> +++ b/drivers/ntb/test/ntb_perf.c
> @@ -625,14 +625,24 @@ static void perf_service_work(struct work_struct *work)
>  {
>  	struct perf_peer *peer = to_peer_service(work);
>  
> -	if (test_and_clear_bit(PERF_CMD_SSIZE, &peer->sts))
> -		perf_cmd_send(peer, PERF_CMD_SSIZE, peer->outbuf_size);
> +	if (test_and_clear_bit(PERF_CMD_SSIZE, &peer->sts)) {
> +		if (perf_cmd_send(peer, PERF_CMD_SSIZE, peer->outbuf_size)
> +		    == -EAGAIN) {
> +			set_bit(PERF_CMD_SSIZE, &peer->sts);
> +			(void)queue_work(system_highpri_wq, &peer->service);
> +		}
> +	}
>  
>  	if (test_and_clear_bit(PERF_CMD_RSIZE, &peer->sts))
>  		perf_setup_inbuf(peer);
>  
> -	if (test_and_clear_bit(PERF_CMD_SXLAT, &peer->sts))
> -		perf_cmd_send(peer, PERF_CMD_SXLAT, peer->inbuf_xlat);
> +	if (test_and_clear_bit(PERF_CMD_SXLAT, &peer->sts)) {
> +		if (perf_cmd_send(peer, PERF_CMD_SXLAT, peer->inbuf_xlat)
> +		    == -EAGAIN) {
> +			set_bit(PERF_CMD_SXLAT, &peer->sts);
> +			(void)queue_work(system_highpri_wq, &peer->service);
> +		}
> +	}
>  
>  	if (test_and_clear_bit(PERF_CMD_RXLAT, &peer->sts))
>  		perf_setup_outbuf(peer);
> 


* RE: [PATCH v2 1/5] ntb_perf: refactor code for CPU and DMA transfers
  2020-03-10 21:21   ` Logan Gunthorpe
@ 2020-03-11 17:44     ` Nath, Arindam
  0 siblings, 0 replies; 12+ messages in thread
From: Nath, Arindam @ 2020-03-11 17:44 UTC (permalink / raw)
  To: Logan Gunthorpe, Mehta, Sanju, jdmason, dave.jiang, allenbh, S-k,
	Shyam-sundar
  Cc: linux-ntb, linux-kernel

> -----Original Message-----
> From: Logan Gunthorpe <logang@deltatee.com>
> Sent: Wednesday, March 11, 2020 02:51
> To: Mehta, Sanju <Sanju.Mehta@amd.com>; jdmason@kudzu.us;
> dave.jiang@intel.com; allenbh@gmail.com; Nath, Arindam
> <Arindam.Nath@amd.com>; S-k, Shyam-sundar <Shyam-sundar.S-
> k@amd.com>
> Cc: linux-ntb@googlegroups.com; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2 1/5] ntb_perf: refactor code for CPU and DMA
> transfers
> 
> 
> 
> On 2020-03-10 2:54 p.m., Sanjay R Mehta wrote:
> > From: Arindam Nath <arindam.nath@amd.com>
> >
> > This patch creates separate function to handle CPU
> > and DMA transfers. Since CPU transfers use memcopy
> > and DMA transfers use dmaengine APIs, these changes
> > not only allow logical separation between the two,
> > but also allows someone to clearly see the difference
> > in the way the two are handled.
> >
> > In the case of DMA, we DMA from system memory to the
> > memory window(MW) of NTB, which is a MMIO region, we
> > should not use dma_map_page() for mapping MW. The
> > correct way to map a MMIO region is to use
> > dma_map_resource(), so the code is modified
> > accordingly.
> >
> > dma_map_resource() expects physical address of the
> > region to be mapped for DMA, we add a new field,
> > outbuf_phys_addr, to struct perf_peer, and also
> > another field, outbuf_dma_addr, to store the
> > corresponding mapped address returned by the API.
> >
> > Since the MW is contiguous, rather than mapping
> > chunk-by-chunk, we map the entire MW before the
> > actual DMA transfer happens. Then for each chunk,
> > we simply pass offset into the mapped region and
> > DMA to that region. Then later, we unmap the MW
> > during perf_clear_test().
> >
> > The above means that now we need to have different
> > function parameters to deal with in the case of
> > CPU and DMA transfers. In the case of CPU transfers,
> > we simply need the CPU virtual addresses for memcopy,
> > but in the case of DMA, we need dma_addr_t, which
> > will be different from CPU physical address depending
> > on whether IOMMU is enabled or not. Thus we now
> > have two separate functions, perf_copy_chunk_cpu(),
> > and perf_copy_chunk_dma() to take care of above
> > consideration.
> >
> > Signed-off-by: Arindam Nath <arindam.nath@amd.com>
> > Signed-off-by: Sanjay R Mehta <sanju.mehta@amd.com>
> > ---
> >  drivers/ntb/test/ntb_perf.c | 141
> +++++++++++++++++++++++++++++++++-----------
> >  1 file changed, 105 insertions(+), 36 deletions(-)
> >
> > diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
> > index e9b7c2d..6d16628 100644
> > --- a/drivers/ntb/test/ntb_perf.c
> > +++ b/drivers/ntb/test/ntb_perf.c
> > @@ -149,6 +149,8 @@ struct perf_peer {
> >  	u64 outbuf_xlat;
> >  	resource_size_t outbuf_size;
> >  	void __iomem *outbuf;
> > +	phys_addr_t outbuf_phys_addr;
> > +	dma_addr_t outbuf_dma_addr;
> >
> >  	/* Inbound MW params */
> >  	dma_addr_t inbuf_xlat;
> > @@ -775,26 +777,24 @@ static void perf_dma_copy_callback(void *data)
> >  	wake_up(&pthr->dma_wait);
> >  }
> >
> > -static int perf_copy_chunk(struct perf_thread *pthr,
> > -			   void __iomem *dst, void *src, size_t len)
> > +static int perf_copy_chunk_cpu(struct perf_thread *pthr,
> > +			       void __iomem *dst, void *src, size_t len)
> > +{
> > +	memcpy_toio(dst, src, len);
> > +
> > +	return likely(atomic_read(&pthr->perf->tsync) > 0) ? 0 : -EINTR;
> > +}
> > +
> > +static int perf_copy_chunk_dma(struct perf_thread *pthr,
> > +			       dma_addr_t dst, void *src, size_t len)
> >  {
> >  	struct dma_async_tx_descriptor *tx;
> >  	struct dmaengine_unmap_data *unmap;
> >  	struct device *dma_dev;
> >  	int try = 0, ret = 0;
> >
> > -	if (!use_dma) {
> > -		memcpy_toio(dst, src, len);
> > -		goto ret_check_tsync;
> > -	}
> > -
> >  	dma_dev = pthr->dma_chan->device->dev;
> > -
> > -	if (!is_dma_copy_aligned(pthr->dma_chan->device,
> offset_in_page(src),
> > -				 offset_in_page(dst), len))
> > -		return -EIO;
> 
> Can you please split this patch into multiple patches? It is hard to
> review and part of the reason this code is such a mess is because we
> merged large patches with a bunch of different changes rolled into one,
> many of which didn't get sufficient reviewer attention.
> 
> Patches that refactor things shouldn't be making functional changes
> (like adding dma_map_resources()).

We will split the patch into multiple patches in the next version.

> 
> 
> > -static int perf_run_test(struct perf_thread *pthr)
> > +static int perf_run_test_cpu(struct perf_thread *pthr)
> >  {
> >  	struct perf_peer *peer = pthr->perf->test_peer;
> >  	struct perf_ctx *perf = pthr->perf;
> > @@ -914,7 +903,7 @@ static int perf_run_test(struct perf_thread *pthr)
> >
> >  	/* Copied field is cleared on test launch stage */
> >  	while (pthr->copied < total_size) {
> > -		ret = perf_copy_chunk(pthr, flt_dst, flt_src, chunk_size);
> > +		ret = perf_copy_chunk_cpu(pthr, flt_dst, flt_src,
> chunk_size);
> >  		if (ret) {
> >  			dev_err(&perf->ntb->dev, "%d: Got error %d on
> test\n",
> >  				pthr->tidx, ret);
> > @@ -937,6 +926,74 @@ static int perf_run_test(struct perf_thread *pthr)
> >  	return 0;
> >  }
> >
> > +static int perf_run_test_dma(struct perf_thread *pthr)
> > +{
> > +	struct perf_peer *peer = pthr->perf->test_peer;
> > +	struct perf_ctx *perf = pthr->perf;
> > +	struct device *dma_dev;
> > +	dma_addr_t flt_dst, bnd_dst;
> > +	u64 total_size, chunk_size;
> > +	void *flt_src;
> > +	int ret = 0;
> > +
> > +	total_size = 1ULL << total_order;
> > +	chunk_size = 1ULL << chunk_order;
> > +	chunk_size = min_t(u64, peer->outbuf_size, chunk_size);
> > +
> > +	/* Map MW for DMA */
> > +	dma_dev = pthr->dma_chan->device->dev;
> > +	peer->outbuf_dma_addr = dma_map_resource(dma_dev,
> > +						 peer->outbuf_phys_addr,
> > +						 peer->outbuf_size,
> > +						 DMA_FROM_DEVICE, 0);
> > +	if (dma_mapping_error(dma_dev, peer->outbuf_dma_addr)) {
> > +		dma_unmap_resource(dma_dev, peer->outbuf_dma_addr,
> > +				   peer->outbuf_size, DMA_FROM_DEVICE,
> 0);
> > +		return -EIO;
> > +	}
> > +
> > +	flt_src = pthr->src;
> > +	bnd_dst = peer->outbuf_dma_addr + peer->outbuf_size;
> > +	flt_dst = peer->outbuf_dma_addr;
> > +
> > +	pthr->duration = ktime_get();
> > +	/* Copied field is cleared on test launch stage */
> > +	while (pthr->copied < total_size) {
> > +		ret = perf_copy_chunk_dma(pthr, flt_dst, flt_src,
> chunk_size);
> > +		if (ret) {
> > +			dev_err(&perf->ntb->dev, "%d: Got error %d on
> test\n",
> > +				pthr->tidx, ret);
> > +			return ret;
> > +		}
> > +
> 
> Honestly, this doesn't seem like a good approach to me. Duplicating the
> majority of the perf_run_test() function is making the code more
> complicated and harder to maintain.
> 
> You should be able to just selectively call dma_map_resources() in
> perf_run_test(), or even in perf_setup_peer_mw() without needing to add
> so much extra duplicate code.

Will be taken care of in the next version. Thanks for the suggestions.

Arindam

> 
> Logan


* RE: [PATCH v2 2/5] ntb_perf: send command in response to EAGAIN
  2020-03-10 21:31   ` Logan Gunthorpe
@ 2020-03-11 18:11     ` Nath, Arindam
  2020-03-11 18:47       ` Logan Gunthorpe
  0 siblings, 1 reply; 12+ messages in thread
From: Nath, Arindam @ 2020-03-11 18:11 UTC (permalink / raw)
  To: Logan Gunthorpe, Mehta, Sanju, jdmason, dave.jiang, allenbh, S-k,
	Shyam-sundar
  Cc: linux-ntb, linux-kernel

> -----Original Message-----
> From: Logan Gunthorpe <logang@deltatee.com>
> Sent: Wednesday, March 11, 2020 03:01
> To: Mehta, Sanju <Sanju.Mehta@amd.com>; jdmason@kudzu.us;
> dave.jiang@intel.com; allenbh@gmail.com; Nath, Arindam
> <Arindam.Nath@amd.com>; S-k, Shyam-sundar <Shyam-sundar.S-
> k@amd.com>
> Cc: linux-ntb@googlegroups.com; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2 2/5] ntb_perf: send command in response to EAGAIN
> 
> 
> 
> On 2020-03-10 2:54 p.m., Sanjay R Mehta wrote:
> > From: Arindam Nath <arindam.nath@amd.com>
> >
> > perf_spad_cmd_send() and perf_msg_cmd_send() return
> > -EAGAIN after trying to send commands for a maximum
> > of MSG_TRIES re-tries. But currently there is no
> > handling for this error. These functions are invoked
> > from perf_service_work() through function pointers,
> > so rather than simply call these functions is not
> > enough. We need to make sure to invoke them again in
> > case of -EAGAIN. Since peer status bits were cleared
> > before calling these functions, we set the same status
> > bits before queueing the work again for later invocation.
> > This way we simply won't go ahead and initialize the
> > XLAT registers wrongfully in case sending the very first
> > command itself fails.
> 
> So what happens if there's an actual non-recoverable error that causes
> perf_msg_cmd_send() to fail? Are you proposing it just requeues high
> priority work forever?

The intent of the patch is to handle -EAGAIN, since the error code is
an indication that we need to try again later. Currently there is a very
small time frame during which ntb_perf must be loaded on both sides
(primary and secondary) to have the XLAT registers configured correctly.
Failing that, the code will still fall through without properly initializing
the XLAT registers, and there is no indication of this until we actually
try to perform 'echo 0 > /sys/kernel/debug/.../run'.

With the changes proposed in this patch, we do not have to depend
on whether the drivers at both ends are loaded within a fixed time
window. We can simply load the driver on one side, and at a later
time load the driver on the other, and the XLAT registers will still
be set up correctly.

Looking at perf_spad_cmd_send() and perf_msg_cmd_send(), if the
concern is that ntb_peer_spad_read()/ntb_msg_read_sts() fail because
of some non-recoverable error and we still schedule high-priority
work, that is a valid concern. But isn't it still better than simply falling
through and initializing the XLAT registers with incorrect values?

> 
> I never really reviewed this stuff properly but it looks like it has a
> bunch of problems. Using the high priority work queue for some low
> priority setup work seems wrong, at the very least. The spad and msg
> send loops also look like they have a bunch of inter-host race condition
> problems as well. Yikes.

I am not very sure what the design considerations were behind using
a high-priority work queue, but perhaps we can all have a discussion
on this.

Arindam

> 
> Logan
> 
> 
> 
> > Signed-off-by: Arindam Nath <arindam.nath@amd.com>
> > Signed-off-by: Sanjay R Mehta <sanju.mehta@amd.com>
> > ---
> >  drivers/ntb/test/ntb_perf.c | 18 ++++++++++++++----
> >  1 file changed, 14 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
> > index 6d16628..9068e42 100644
> > --- a/drivers/ntb/test/ntb_perf.c
> > +++ b/drivers/ntb/test/ntb_perf.c
> > @@ -625,14 +625,24 @@ static void perf_service_work(struct work_struct
> *work)
> >  {
> >  	struct perf_peer *peer = to_peer_service(work);
> >
> > -	if (test_and_clear_bit(PERF_CMD_SSIZE, &peer->sts))
> > -		perf_cmd_send(peer, PERF_CMD_SSIZE, peer-
> >outbuf_size);
> > +	if (test_and_clear_bit(PERF_CMD_SSIZE, &peer->sts)) {
> > +		if (perf_cmd_send(peer, PERF_CMD_SSIZE, peer-
> >outbuf_size)
> > +		    == -EAGAIN) {
> > +			set_bit(PERF_CMD_SSIZE, &peer->sts);
> > +			(void)queue_work(system_highpri_wq, &peer-
> >service);
> > +		}
> > +	}
> >
> >  	if (test_and_clear_bit(PERF_CMD_RSIZE, &peer->sts))
> >  		perf_setup_inbuf(peer);
> >
> > -	if (test_and_clear_bit(PERF_CMD_SXLAT, &peer->sts))
> > -		perf_cmd_send(peer, PERF_CMD_SXLAT, peer->inbuf_xlat);
> > +	if (test_and_clear_bit(PERF_CMD_SXLAT, &peer->sts)) {
> > +		if (perf_cmd_send(peer, PERF_CMD_SXLAT, peer-
> >inbuf_xlat)
> > +		    == -EAGAIN) {
> > +			set_bit(PERF_CMD_SXLAT, &peer->sts);
> > +			(void)queue_work(system_highpri_wq, &peer-
> >service);
> > +		}
> > +	}
> >
> >  	if (test_and_clear_bit(PERF_CMD_RXLAT, &peer->sts))
> >  		perf_setup_outbuf(peer);
> >


* Re: [PATCH v2 2/5] ntb_perf: send command in response to EAGAIN
  2020-03-11 18:11     ` Nath, Arindam
@ 2020-03-11 18:47       ` Logan Gunthorpe
  2020-03-11 18:58         ` Nath, Arindam
  0 siblings, 1 reply; 12+ messages in thread
From: Logan Gunthorpe @ 2020-03-11 18:47 UTC (permalink / raw)
  To: Nath, Arindam, Mehta, Sanju, jdmason, dave.jiang, allenbh, S-k,
	Shyam-sundar
  Cc: linux-ntb, linux-kernel



On 2020-03-11 12:11 p.m., Nath, Arindam wrote:
>> -----Original Message-----
>> From: Logan Gunthorpe <logang@deltatee.com>
>> Sent: Wednesday, March 11, 2020 03:01
>> To: Mehta, Sanju <Sanju.Mehta@amd.com>; jdmason@kudzu.us;
>> dave.jiang@intel.com; allenbh@gmail.com; Nath, Arindam
>> <Arindam.Nath@amd.com>; S-k, Shyam-sundar <Shyam-sundar.S-
>> k@amd.com>
>> Cc: linux-ntb@googlegroups.com; linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH v2 2/5] ntb_perf: send command in response to EAGAIN
>>
>>
>>
>> On 2020-03-10 2:54 p.m., Sanjay R Mehta wrote:
>>> From: Arindam Nath <arindam.nath@amd.com>
>>>
>>> perf_spad_cmd_send() and perf_msg_cmd_send() return
>>> -EAGAIN after trying to send commands for a maximum
>>> of MSG_TRIES re-tries. But currently there is no
>>> handling for this error. These functions are invoked
>>> from perf_service_work() through function pointers,
>>> so rather than simply call these functions is not
>>> enough. We need to make sure to invoke them again in
>>> case of -EAGAIN. Since peer status bits were cleared
>>> before calling these functions, we set the same status
>>> bits before queueing the work again for later invocation.
>>> This way we simply won't go ahead and initialize the
>>> XLAT registers wrongfully in case sending the very first
>>> command itself fails.
>>
>> So what happens if there's an actual non-recoverable error that causes
>> perf_msg_cmd_send() to fail? Are you proposing it just requeues high
>> priority work forever?
> 
> The intent of the patch is to handle -EAGAIN, since the error code is
> an indication that we need to try again later. Currently there is a very
> small time frame during which ntb_perf should be loaded on both sides
> (primary and secondary) to have XLAT registers configured correctly.
> Failing that the code will still fall through without properly initializing the
> XLAT registers and there is no indication of that either until we have
> actually tried to perform 'echo 0 > /sys/kernel/debug/.../run'.
> 
> With the changes proposed in this patch, we do not have to depend
> on whether the drivers at both ends are loaded within a fixed time
> duration. So we can simply load the driver at one side, and at a later
> time load the driver on the other, and still the XLAT registers would
> be set up correctly.
> 
> Looking at perf_spad_cmd_send() and perf_msg_cmd_send(), if the
> concern is that ntb_peer_spad_read()/ntb_msg_read_sts() fail because
> of some non-recoverable error and we still schedule a high priority
> work, that is a valid concern. But isn't it still better than simply falling
> through and initializing XLAT register with incorrect values?

I don't think it's ever acceptable to get into an infinite loop,
especially when you're running on the system's high-priority workqueue...

At the very least, schedule a delayed work item to try again in some
number of seconds or something. Essentially just have more retries,
perhaps with longer delays in between.
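
A rough sketch of that kind of bounded, delayed retry, for illustration
only (PERF_CMD_RETRIES, retry_cnt and converting peer->service to a
struct delayed_work are assumptions, not existing ntb_perf code):

#define PERF_CMD_RETRIES	10

static void perf_cmd_retry(struct perf_peer *peer, int cmd)
{
	if (peer->retry_cnt >= PERF_CMD_RETRIES) {
		dev_err(&peer->perf->ntb->dev,
			"CMD %d not acknowledged by peer, giving up\n", cmd);
		return;
	}

	/* Re-arm the command bit and back off: 100ms, 200ms, 400ms, ... */
	set_bit(cmd, &peer->sts);
	queue_delayed_work(system_wq, &peer->service,
			   msecs_to_jiffies(100) << peer->retry_cnt++);
}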

Falling through and continuing with the wrong values is certainly wrong.
I didn't notice that. If an error occurs, it shouldn't continue; it
should print an error to dmesg and stop.

> 
>>
>> I never really reviewed this stuff properly but it looks like it has a
>> bunch of problems. Using the high priority work queue for some low
>> priority setup work seems wrong, at the very least. The spad and msg
>> send loops also look like they have a bunch of inter-host race condition
>> problems as well. Yikes.
> 
> I am not very sure what the design considerations were when having
> a high priority work queue, but perhaps we can all have a discussion
> on this.

I'd change it. Seems completely wrong to me.

Logan


* RE: [PATCH v2 2/5] ntb_perf: send command in response to EAGAIN
  2020-03-11 18:47       ` Logan Gunthorpe
@ 2020-03-11 18:58         ` Nath, Arindam
  0 siblings, 0 replies; 12+ messages in thread
From: Nath, Arindam @ 2020-03-11 18:58 UTC (permalink / raw)
  To: Logan Gunthorpe, Mehta, Sanju, jdmason, dave.jiang, allenbh, S-k,
	Shyam-sundar
  Cc: linux-ntb, linux-kernel

> -----Original Message-----
> From: Logan Gunthorpe <logang@deltatee.com>
> Sent: Thursday, March 12, 2020 00:17
> To: Nath, Arindam <Arindam.Nath@amd.com>; Mehta, Sanju
> <Sanju.Mehta@amd.com>; jdmason@kudzu.us; dave.jiang@intel.com;
> allenbh@gmail.com; S-k, Shyam-sundar <Shyam-sundar.S-k@amd.com>
> Cc: linux-ntb@googlegroups.com; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2 2/5] ntb_perf: send command in response to EAGAIN
> 
> 
> 
> On 2020-03-11 12:11 p.m., Nath, Arindam wrote:
> >> -----Original Message-----
> >> From: Logan Gunthorpe <logang@deltatee.com>
> >> Sent: Wednesday, March 11, 2020 03:01
> >> To: Mehta, Sanju <Sanju.Mehta@amd.com>; jdmason@kudzu.us;
> >> dave.jiang@intel.com; allenbh@gmail.com; Nath, Arindam
> >> <Arindam.Nath@amd.com>; S-k, Shyam-sundar <Shyam-sundar.S-
> >> k@amd.com>
> >> Cc: linux-ntb@googlegroups.com; linux-kernel@vger.kernel.org
> >> Subject: Re: [PATCH v2 2/5] ntb_perf: send command in response to
> EAGAIN
> >>
> >>
> >>
> >> On 2020-03-10 2:54 p.m., Sanjay R Mehta wrote:
> >>> From: Arindam Nath <arindam.nath@amd.com>
> >>>
> >>> perf_spad_cmd_send() and perf_msg_cmd_send() return
> >>> -EAGAIN after trying to send commands for a maximum
> >>> of MSG_TRIES re-tries. But currently there is no
> >>> handling for this error. These functions are invoked
> >>> from perf_service_work() through function pointers,
> >>> so rather than simply call these functions is not
> >>> enough. We need to make sure to invoke them again in
> >>> case of -EAGAIN. Since peer status bits were cleared
> >>> before calling these functions, we set the same status
> >>> bits before queueing the work again for later invocation.
> >>> This way we simply won't go ahead and initialize the
> >>> XLAT registers wrongfully in case sending the very first
> >>> command itself fails.
> >>
> >> So what happens if there's an actual non-recoverable error that causes
> >> perf_msg_cmd_send() to fail? Are you proposing it just requeues high
> >> priority work forever?
> >
> > The intent of the patch is to handle -EAGAIN, since the error code is
> > an indication that we need to try again later. Currently there is a very
> > small time frame during which ntb_perf should be loaded on both sides
> > (primary and secondary) to have XLAT registers configured correctly.
> > Failing that the code will still fall through without properly initializing the
> > XLAT registers and there is no indication of that either until we have
> > actually tried to perform 'echo 0 > /sys/kernel/debug/.../run'.
> >
> > With the changes proposed in this patch, we do not have to depend
> > on whether the drivers at both ends are loaded within a fixed time
> > duration. So we can simply load the driver at one side, and at a later
> > time load the driver on the other, and still the XLAT registers would
> > be set up correctly.
> >
> > Looking at perf_spad_cmd_send() and perf_msg_cmd_send(), if the
> > concern is that ntb_peer_spad_read()/ntb_msg_read_sts() fail because
> > of some non-recoverable error and we still schedule a high priority
> > work, that is a valid concern. But isn't it still better than simply falling
> > through and initializing XLAT register with incorrect values?
> 
> I don't think it's ever acceptable to get into an infinite loop.
> Especially when you're running on the system's high priority work queue...
> 
> At the very least schedule a delayed work item to try again in some
> number of seconds or something. Essentially just have more retires,
> perhaps with longer delays in between.

Sounds like a good idea. Thanks for the suggestion.

Arindam

> 
> Falling through and continuing with the wrong values is certainly wrong.
> I didn't notice that. If an error occurs, it shouldn't continue, it
> should print an error to dmesg and stop.
> 
> >
> >>
> >> I never really reviewed this stuff properly but it looks like it has a
> >> bunch of problems. Using the high priority work queue for some low
> >> priority setup work seems wrong, at the very least. The spad and msg
> >> send loops also look like they have a bunch of inter-host race condition
> >> problems as well. Yikes.
> >
> > I am not very sure what the design considerations were when having
> > a high priority work queue, but perhaps we can all have a discussion
> > on this.
> 
> I'd change it. Seems completely wrong to me.
> 
> Logan
