All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] media: stk1160: allocate urb buffs with the DMA noncontiguous API
@ 2022-01-11  6:55 Dafna Hirschfeld
  2022-01-11  6:55 ` [PATCH 1/3] media: stk1160: fix number of buffers in case not all buffers are created Dafna Hirschfeld
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Dafna Hirschfeld @ 2022-01-11  6:55 UTC (permalink / raw)
  To: ezequiel
  Cc: linux-media, laurent.pinchart, hverkuil, ribalda, tfiga,
	senozhatsky, hch, Dafna Hirschfeld, kernel

Replace urb buffers allocation to use the DMA
noncontiguous API. Similarly to a commit sent to uvc: [1]
This improves performance on Arm.

The series was tested on rock-pi4 (Arm64) and shows about 10x
improvements:

NON_CONTIGUOUS
uS/qty: 9231963/106048 sec: 9231.963472 avg: 87.054 min: 0.291 max: 216.423 (uS)
raw decode speed: 9.217 Gbits/s
bytes 9.971904900 G
bytes/urb: 100290

COHERENT
uS/qty: 92423219/98904 sec: 92423.219509 avg: 934.474 min: 0.583 max: 2196.583 (uS)
raw decode speed: 787.698 Mbits/s
bytes 8.510250828 G
bytes/urb: 92010

On x86 there is no significant difference:

NON_CONTIGUOUS
uS/qty: 3203891/50096 sec: 3203.891724 avg: 63.955 min: 0.141 max: 11696.636 (uS)
raw decode speed: 23.783 Gbits/s
bytes 8.932543668 G
bytes/urb: 190084

COHERENT
uS/qty: 3081680/50112 sec: 3081.680438 avg: 61.495 min: 0.160 max: 482.099 (uS)
raw decode speed: 21.557 Gbits/s
bytes 7.786260844 G
bytes/urb: 165677

[1] https://lkml.org/lkml/2021/3/12/1506


Dafna Hirschfeld (3):
  media: stk1160: fix number of buffers in case not all buffers are
    created
  media: stk1160: move transfer_buffer and urb to same struct
    'stk1160_urb'
  media: stk1160: use dma_alloc_noncontiguous API

 drivers/media/usb/stk1160/stk1160-v4l.c   |   5 +-
 drivers/media/usb/stk1160/stk1160-video.c | 137 +++++++++++-----------
 drivers/media/usb/stk1160/stk1160.h       |  21 +++-
 3 files changed, 91 insertions(+), 72 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] media: stk1160: fix number of buffers in case not all buffers are created
  2022-01-11  6:55 [PATCH 0/3] media: stk1160: allocate urb buffs with the DMA noncontiguous API Dafna Hirschfeld
@ 2022-01-11  6:55 ` Dafna Hirschfeld
  2022-01-11 12:21   ` Ezequiel Garcia
  2022-01-11  6:55 ` [PATCH 2/3] media: stk1160: move transfer_buffer and urb to same struct 'stk1160_urb' Dafna Hirschfeld
  2022-01-11  6:55 ` [PATCH 3/3] media: stk1160: use dma_alloc_noncontiguous API Dafna Hirschfeld
  2 siblings, 1 reply; 16+ messages in thread
From: Dafna Hirschfeld @ 2022-01-11  6:55 UTC (permalink / raw)
  To: ezequiel
  Cc: linux-media, laurent.pinchart, hverkuil, ribalda, tfiga,
	senozhatsky, hch, Dafna Hirschfeld, kernel

In case we fail to allocate a transfer_buffer then we
break the buffers creation loop and update the number of
buffers to the number of successfully allocated which should
be 'i' and not 'i - 1'

Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
---
 drivers/media/usb/stk1160/stk1160-video.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/media/usb/stk1160/stk1160-video.c b/drivers/media/usb/stk1160/stk1160-video.c
index 202b084f65a2..91bd6adccdd1 100644
--- a/drivers/media/usb/stk1160/stk1160-video.c
+++ b/drivers/media/usb/stk1160/stk1160-video.c
@@ -511,9 +511,9 @@ int stk1160_alloc_isoc(struct stk1160 *dev)
 	usb_free_urb(dev->isoc_ctl.urb[i]);
 	dev->isoc_ctl.urb[i] = NULL;
 
-	stk1160_warn("%d urbs allocated. Trying to continue...\n", i - 1);
+	stk1160_warn("%d urbs allocated. Trying to continue...\n", i);
 
-	dev->isoc_ctl.num_bufs = i - 1;
+	dev->isoc_ctl.num_bufs = i;
 
 	return 0;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/3] media: stk1160: move transfer_buffer and urb to same struct 'stk1160_urb'
  2022-01-11  6:55 [PATCH 0/3] media: stk1160: allocate urb buffs with the DMA noncontiguous API Dafna Hirschfeld
  2022-01-11  6:55 ` [PATCH 1/3] media: stk1160: fix number of buffers in case not all buffers are created Dafna Hirschfeld
@ 2022-01-11  6:55 ` Dafna Hirschfeld
  2022-01-11  8:10   ` Christoph Hellwig
  2022-01-11 12:29   ` Ezequiel Garcia
  2022-01-11  6:55 ` [PATCH 3/3] media: stk1160: use dma_alloc_noncontiguous API Dafna Hirschfeld
  2 siblings, 2 replies; 16+ messages in thread
From: Dafna Hirschfeld @ 2022-01-11  6:55 UTC (permalink / raw)
  To: ezequiel
  Cc: linux-media, laurent.pinchart, hverkuil, ribalda, tfiga,
	senozhatsky, hch, Dafna Hirschfeld, kernel

Instead of having two separated arrays, one for the urbs and
one for their buffers, have one array of a struct containing both.
In addition, the array is just 16 pointers, no need to dynamically
allocate it.

Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
---
 drivers/media/usb/stk1160/stk1160-v4l.c   |  2 +-
 drivers/media/usb/stk1160/stk1160-video.c | 50 ++++++++---------------
 drivers/media/usb/stk1160/stk1160.h       | 11 ++---
 3 files changed, 23 insertions(+), 40 deletions(-)

diff --git a/drivers/media/usb/stk1160/stk1160-v4l.c b/drivers/media/usb/stk1160/stk1160-v4l.c
index 6a4eb616d516..a06030451db4 100644
--- a/drivers/media/usb/stk1160/stk1160-v4l.c
+++ b/drivers/media/usb/stk1160/stk1160-v4l.c
@@ -232,7 +232,7 @@ static int stk1160_start_streaming(struct stk1160 *dev)
 
 	/* submit urbs and enables IRQ */
 	for (i = 0; i < dev->isoc_ctl.num_bufs; i++) {
-		rc = usb_submit_urb(dev->isoc_ctl.urb[i], GFP_KERNEL);
+		rc = usb_submit_urb(dev->isoc_ctl.stk1160_urb[i].urb, GFP_KERNEL);
 		if (rc) {
 			stk1160_err("cannot submit urb[%d] (%d)\n", i, rc);
 			goto out_uninit;
diff --git a/drivers/media/usb/stk1160/stk1160-video.c b/drivers/media/usb/stk1160/stk1160-video.c
index 91bd6adccdd1..4194d31b53bb 100644
--- a/drivers/media/usb/stk1160/stk1160-video.c
+++ b/drivers/media/usb/stk1160/stk1160-video.c
@@ -347,7 +347,7 @@ void stk1160_cancel_isoc(struct stk1160 *dev)
 		 * We don't care for NULL pointer since
 		 * usb_kill_urb allows it.
 		 */
-		usb_kill_urb(dev->isoc_ctl.urb[i]);
+		usb_kill_urb(dev->isoc_ctl.stk1160_urb[i].urb);
 	}
 
 	stk1160_dbg("all urbs killed\n");
@@ -366,30 +366,25 @@ void stk1160_free_isoc(struct stk1160 *dev)
 
 	for (i = 0; i < num_bufs; i++) {
 
-		urb = dev->isoc_ctl.urb[i];
+		urb = dev->isoc_ctl.stk1160_urb[i].urb;
 		if (urb) {
 
-			if (dev->isoc_ctl.transfer_buffer[i]) {
+			if (dev->isoc_ctl.stk1160_urb[i].transfer_buffer) {
 #ifndef CONFIG_DMA_NONCOHERENT
 				usb_free_coherent(dev->udev,
 					urb->transfer_buffer_length,
-					dev->isoc_ctl.transfer_buffer[i],
+					dev->isoc_ctl.stk1160_urb[i].transfer_buffer,
 					urb->transfer_dma);
 #else
-				kfree(dev->isoc_ctl.transfer_buffer[i]);
+				kfree(dev->isoc_ctl.stk1160_urb[i].transfer_buffer);
 #endif
 			}
 			usb_free_urb(urb);
-			dev->isoc_ctl.urb[i] = NULL;
+			dev->isoc_ctl.stk1160_urb[i].urb = NULL;
 		}
-		dev->isoc_ctl.transfer_buffer[i] = NULL;
+		dev->isoc_ctl.stk1160_urb[i].transfer_buffer = NULL;
 	}
 
-	kfree(dev->isoc_ctl.urb);
-	kfree(dev->isoc_ctl.transfer_buffer);
-
-	dev->isoc_ctl.urb = NULL;
-	dev->isoc_ctl.transfer_buffer = NULL;
 	dev->isoc_ctl.num_bufs = 0;
 
 	stk1160_dbg("all urb buffers freed\n");
@@ -429,19 +424,6 @@ int stk1160_alloc_isoc(struct stk1160 *dev)
 
 	dev->isoc_ctl.buf = NULL;
 	dev->isoc_ctl.max_pkt_size = dev->max_pkt_size;
-	dev->isoc_ctl.urb = kcalloc(num_bufs, sizeof(void *), GFP_KERNEL);
-	if (!dev->isoc_ctl.urb) {
-		stk1160_err("out of memory for urb array\n");
-		return -ENOMEM;
-	}
-
-	dev->isoc_ctl.transfer_buffer = kcalloc(num_bufs, sizeof(void *),
-						GFP_KERNEL);
-	if (!dev->isoc_ctl.transfer_buffer) {
-		stk1160_err("out of memory for usb transfers\n");
-		kfree(dev->isoc_ctl.urb);
-		return -ENOMEM;
-	}
 
 	/* allocate urbs and transfer buffers */
 	for (i = 0; i < num_bufs; i++) {
@@ -449,15 +431,15 @@ int stk1160_alloc_isoc(struct stk1160 *dev)
 		urb = usb_alloc_urb(max_packets, GFP_KERNEL);
 		if (!urb)
 			goto free_i_bufs;
-		dev->isoc_ctl.urb[i] = urb;
+		dev->isoc_ctl.stk1160_urb[i].urb = urb;
 
 #ifndef CONFIG_DMA_NONCOHERENT
-		dev->isoc_ctl.transfer_buffer[i] = usb_alloc_coherent(dev->udev,
-			sb_size, GFP_KERNEL, &urb->transfer_dma);
+		dev->isoc_ctl.stk1160_urb[i].transfer_buffer =
+			usb_alloc_coherent(dev->udev, sb_size, GFP_KERNEL, &urb->transfer_dma);
 #else
-		dev->isoc_ctl.transfer_buffer[i] = kmalloc(sb_size, GFP_KERNEL);
+		dev->isoc_ctl.stk1160_urb[i].transfer_buffer = kmalloc(sb_size, GFP_KERNEL);
 #endif
-		if (!dev->isoc_ctl.transfer_buffer[i]) {
+		if (!dev->isoc_ctl.stk1160_urb[i].transfer_buffer) {
 			stk1160_err("cannot alloc %d bytes for tx[%d] buffer\n",
 				sb_size, i);
 
@@ -466,14 +448,14 @@ int stk1160_alloc_isoc(struct stk1160 *dev)
 				goto free_i_bufs;
 			goto nomore_tx_bufs;
 		}
-		memset(dev->isoc_ctl.transfer_buffer[i], 0, sb_size);
+		memset(dev->isoc_ctl.stk1160_urb[i].transfer_buffer, 0, sb_size);
 
 		/*
 		 * FIXME: Where can I get the endpoint?
 		 */
 		urb->dev = dev->udev;
 		urb->pipe = usb_rcvisocpipe(dev->udev, STK1160_EP_VIDEO);
-		urb->transfer_buffer = dev->isoc_ctl.transfer_buffer[i];
+		urb->transfer_buffer = dev->isoc_ctl.stk1160_urb[i].transfer_buffer;
 		urb->transfer_buffer_length = sb_size;
 		urb->complete = stk1160_isoc_irq;
 		urb->context = dev;
@@ -508,8 +490,8 @@ int stk1160_alloc_isoc(struct stk1160 *dev)
 	 * enough to work fine, so we just free the extra urb,
 	 * store the allocated count and keep going, fingers crossed!
 	 */
-	usb_free_urb(dev->isoc_ctl.urb[i]);
-	dev->isoc_ctl.urb[i] = NULL;
+	usb_free_urb(dev->isoc_ctl.stk1160_urb[i].urb);
+	dev->isoc_ctl.stk1160_urb[i].urb = NULL;
 
 	stk1160_warn("%d urbs allocated. Trying to continue...\n", i);
 
diff --git a/drivers/media/usb/stk1160/stk1160.h b/drivers/media/usb/stk1160/stk1160.h
index a31ea1c80f25..1ffca1343d88 100644
--- a/drivers/media/usb/stk1160/stk1160.h
+++ b/drivers/media/usb/stk1160/stk1160.h
@@ -84,6 +84,11 @@ struct stk1160_buffer {
 	unsigned int pos;		/* current pos inside buffer */
 };
 
+struct stk1160_urb {
+	struct urb *urb;
+	char *transfer_buffer;
+};
+
 struct stk1160_isoc_ctl {
 	/* max packet size of isoc transaction */
 	int max_pkt_size;
@@ -91,11 +96,7 @@ struct stk1160_isoc_ctl {
 	/* number of allocated urbs */
 	int num_bufs;
 
-	/* urb for isoc transfers */
-	struct urb **urb;
-
-	/* transfer buffers for isoc transfer */
-	char **transfer_buffer;
+	struct stk1160_urb stk1160_urb[STK1160_NUM_BUFS];
 
 	/* current buffer */
 	struct stk1160_buffer *buf;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/3] media: stk1160: use dma_alloc_noncontiguous API
  2022-01-11  6:55 [PATCH 0/3] media: stk1160: allocate urb buffs with the DMA noncontiguous API Dafna Hirschfeld
  2022-01-11  6:55 ` [PATCH 1/3] media: stk1160: fix number of buffers in case not all buffers are created Dafna Hirschfeld
  2022-01-11  6:55 ` [PATCH 2/3] media: stk1160: move transfer_buffer and urb to same struct 'stk1160_urb' Dafna Hirschfeld
@ 2022-01-11  6:55 ` Dafna Hirschfeld
  2022-01-11  8:13   ` Christoph Hellwig
  2022-01-11 12:38   ` Ezequiel Garcia
  2 siblings, 2 replies; 16+ messages in thread
From: Dafna Hirschfeld @ 2022-01-11  6:55 UTC (permalink / raw)
  To: ezequiel
  Cc: linux-media, laurent.pinchart, hverkuil, ribalda, tfiga,
	senozhatsky, hch, Dafna Hirschfeld, kernel

Replace the urb buffers allocation to
use the noncontiguous API. This improve performance
on Arm.
The noncontiguous API require handling
synchronization.
This commit is similar to the one sent to
uvc: [1]

[1] https://lkml.org/lkml/2021/3/12/1506

Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
---
 drivers/media/usb/stk1160/stk1160-v4l.c   |   3 +
 drivers/media/usb/stk1160/stk1160-video.c | 109 +++++++++++++---------
 drivers/media/usb/stk1160/stk1160.h       |  10 ++
 3 files changed, 79 insertions(+), 43 deletions(-)

diff --git a/drivers/media/usb/stk1160/stk1160-v4l.c b/drivers/media/usb/stk1160/stk1160-v4l.c
index a06030451db4..31c34734425b 100644
--- a/drivers/media/usb/stk1160/stk1160-v4l.c
+++ b/drivers/media/usb/stk1160/stk1160-v4l.c
@@ -232,6 +232,9 @@ static int stk1160_start_streaming(struct stk1160 *dev)
 
 	/* submit urbs and enables IRQ */
 	for (i = 0; i < dev->isoc_ctl.num_bufs; i++) {
+		struct stk1160_urb *stk_urb = &dev->isoc_ctl.stk1160_urb[i];
+
+		dma_sync_sgtable_for_device(stk1160_get_dmadev(dev), stk_urb->sgt, DMA_FROM_DEVICE);
 		rc = usb_submit_urb(dev->isoc_ctl.stk1160_urb[i].urb, GFP_KERNEL);
 		if (rc) {
 			stk1160_err("cannot submit urb[%d] (%d)\n", i, rc);
diff --git a/drivers/media/usb/stk1160/stk1160-video.c b/drivers/media/usb/stk1160/stk1160-video.c
index 4194d31b53bb..fdc0de2af4c0 100644
--- a/drivers/media/usb/stk1160/stk1160-video.c
+++ b/drivers/media/usb/stk1160/stk1160-video.c
@@ -295,7 +295,8 @@ static void stk1160_process_isoc(struct stk1160 *dev, struct urb *urb)
 static void stk1160_isoc_irq(struct urb *urb)
 {
 	int i, rc;
-	struct stk1160 *dev = urb->context;
+	struct stk1160_urb *stk_urb = urb->context;
+	struct stk1160 *dev = stk_urb->dev;
 
 	switch (urb->status) {
 	case 0:
@@ -310,6 +311,9 @@ static void stk1160_isoc_irq(struct urb *urb)
 		return;
 	}
 
+	dma_sync_sgtable_for_cpu(stk1160_get_dmadev(dev), stk_urb->sgt, DMA_FROM_DEVICE);
+	invalidate_kernel_vmap_range(stk_urb->transfer_buffer, urb->transfer_buffer_length);
+
 	stk1160_process_isoc(dev, urb);
 
 	/* Reset urb buffers */
@@ -318,6 +322,7 @@ static void stk1160_isoc_irq(struct urb *urb)
 		urb->iso_frame_desc[i].actual_length = 0;
 	}
 
+	dma_sync_sgtable_for_device(stk1160_get_dmadev(dev), stk_urb->sgt, DMA_FROM_DEVICE);
 	rc = usb_submit_urb(urb, GFP_ATOMIC);
 	if (rc)
 		stk1160_err("urb re-submit failed (%d)\n", rc);
@@ -353,37 +358,34 @@ void stk1160_cancel_isoc(struct stk1160 *dev)
 	stk1160_dbg("all urbs killed\n");
 }
 
+static void stk_free_urb_buffer(struct stk1160 *dev, struct stk1160_urb *stk_urb)
+{
+	struct device *dma_dev = stk1160_get_dmadev(dev);
+
+	dma_vunmap_noncontiguous(dma_dev, stk_urb->transfer_buffer);
+	dma_free_noncontiguous(dma_dev, stk_urb->urb->transfer_buffer_length, stk_urb->sgt,
+			       DMA_FROM_DEVICE);
+	usb_free_urb(stk_urb->urb);
+
+	stk_urb->transfer_buffer = NULL;
+	stk_urb->sgt = NULL;
+	stk_urb->urb = NULL;
+	stk_urb->dev = NULL;
+	stk_urb->dma = 0;
+}
+
 /*
  * Releases urb and transfer buffers
  * Obviusly, associated urb must be killed before releasing it.
  */
 void stk1160_free_isoc(struct stk1160 *dev)
 {
-	struct urb *urb;
 	int i, num_bufs = dev->isoc_ctl.num_bufs;
 
 	stk1160_dbg("freeing %d urb buffers...\n", num_bufs);
 
-	for (i = 0; i < num_bufs; i++) {
-
-		urb = dev->isoc_ctl.stk1160_urb[i].urb;
-		if (urb) {
-
-			if (dev->isoc_ctl.stk1160_urb[i].transfer_buffer) {
-#ifndef CONFIG_DMA_NONCOHERENT
-				usb_free_coherent(dev->udev,
-					urb->transfer_buffer_length,
-					dev->isoc_ctl.stk1160_urb[i].transfer_buffer,
-					urb->transfer_dma);
-#else
-				kfree(dev->isoc_ctl.stk1160_urb[i].transfer_buffer);
-#endif
-			}
-			usb_free_urb(urb);
-			dev->isoc_ctl.stk1160_urb[i].urb = NULL;
-		}
-		dev->isoc_ctl.stk1160_urb[i].transfer_buffer = NULL;
-	}
+	for (i = 0; i < num_bufs; i++)
+		stk_free_urb_buffer(dev, &dev->isoc_ctl.stk1160_urb[i]);
 
 	dev->isoc_ctl.num_bufs = 0;
 
@@ -400,6 +402,39 @@ void stk1160_uninit_isoc(struct stk1160 *dev)
 	stk1160_free_isoc(dev);
 }
 
+static int stk1160_fill_urb(struct stk1160 *dev, struct stk1160_urb *stk_urb, int sb_size,
+			    int max_packets)
+{
+	struct device *dma_dev = stk1160_get_dmadev(dev);
+
+	stk_urb->urb = usb_alloc_urb(max_packets, GFP_KERNEL);
+	if (!stk_urb->urb)
+		return -ENOMEM;
+	stk_urb->sgt = dma_alloc_noncontiguous(dma_dev, sb_size, DMA_FROM_DEVICE, GFP_KERNEL, 0);
+
+	/*
+	 * If the buffer allocation failed, we exit but return 0 since we allow driver
+	 * working with less buffers
+	 */
+	if (!stk_urb->sgt)
+		goto free_urb;
+
+	stk_urb->transfer_buffer = dma_vmap_noncontiguous(dma_dev, sb_size, stk_urb->sgt);
+	if (!stk_urb->transfer_buffer)
+		goto free_sgt;
+
+	stk_urb->dma = stk_urb->sgt->sgl->dma_address;
+	stk_urb->dev = dev;
+	return 0;
+free_sgt:
+	dma_free_noncontiguous(dma_dev, sb_size, stk_urb->sgt, DMA_FROM_DEVICE);
+	stk_urb->sgt = NULL;
+free_urb:
+	usb_free_urb(stk_urb->urb);
+	stk_urb->urb = NULL;
+
+	return 0;
+}
 /*
  * Allocate URBs
  */
@@ -407,6 +442,7 @@ int stk1160_alloc_isoc(struct stk1160 *dev)
 {
 	struct urb *urb;
 	int i, j, k, sb_size, max_packets, num_bufs;
+	int ret;
 
 	/*
 	 * It may be necessary to release isoc here,
@@ -428,21 +464,13 @@ int stk1160_alloc_isoc(struct stk1160 *dev)
 	/* allocate urbs and transfer buffers */
 	for (i = 0; i < num_bufs; i++) {
 
-		urb = usb_alloc_urb(max_packets, GFP_KERNEL);
-		if (!urb)
+		ret = stk1160_fill_urb(dev, &dev->isoc_ctl.stk1160_urb[i], sb_size, max_packets);
+		if (ret)
 			goto free_i_bufs;
-		dev->isoc_ctl.stk1160_urb[i].urb = urb;
-
-#ifndef CONFIG_DMA_NONCOHERENT
-		dev->isoc_ctl.stk1160_urb[i].transfer_buffer =
-			usb_alloc_coherent(dev->udev, sb_size, GFP_KERNEL, &urb->transfer_dma);
-#else
-		dev->isoc_ctl.stk1160_urb[i].transfer_buffer = kmalloc(sb_size, GFP_KERNEL);
-#endif
-		if (!dev->isoc_ctl.stk1160_urb[i].transfer_buffer) {
-			stk1160_err("cannot alloc %d bytes for tx[%d] buffer\n",
-				sb_size, i);
 
+		urb = dev->isoc_ctl.stk1160_urb[i].urb;
+
+		if (!urb) {
 			/* Not enough transfer buffers, so just give up */
 			if (i < STK1160_MIN_BUFS)
 				goto free_i_bufs;
@@ -458,15 +486,12 @@ int stk1160_alloc_isoc(struct stk1160 *dev)
 		urb->transfer_buffer = dev->isoc_ctl.stk1160_urb[i].transfer_buffer;
 		urb->transfer_buffer_length = sb_size;
 		urb->complete = stk1160_isoc_irq;
-		urb->context = dev;
+		urb->context = &dev->isoc_ctl.stk1160_urb[i];
 		urb->interval = 1;
 		urb->start_frame = 0;
 		urb->number_of_packets = max_packets;
-#ifndef CONFIG_DMA_NONCOHERENT
 		urb->transfer_flags = URB_ISO_ASAP | URB_NO_TRANSFER_DMA_MAP;
-#else
-		urb->transfer_flags = URB_ISO_ASAP;
-#endif
+		urb->transfer_dma = dev->isoc_ctl.stk1160_urb[i].dma;
 
 		k = 0;
 		for (j = 0; j < max_packets; j++) {
@@ -490,8 +515,6 @@ int stk1160_alloc_isoc(struct stk1160 *dev)
 	 * enough to work fine, so we just free the extra urb,
 	 * store the allocated count and keep going, fingers crossed!
 	 */
-	usb_free_urb(dev->isoc_ctl.stk1160_urb[i].urb);
-	dev->isoc_ctl.stk1160_urb[i].urb = NULL;
 
 	stk1160_warn("%d urbs allocated. Trying to continue...\n", i);
 
@@ -501,7 +524,7 @@ int stk1160_alloc_isoc(struct stk1160 *dev)
 
 free_i_bufs:
 	/* Save the allocated buffers so far, so we can properly free them */
-	dev->isoc_ctl.num_bufs = i+1;
+	dev->isoc_ctl.num_bufs = i;
 	stk1160_free_isoc(dev);
 	return -ENOMEM;
 }
diff --git a/drivers/media/usb/stk1160/stk1160.h b/drivers/media/usb/stk1160/stk1160.h
index 1ffca1343d88..52bea7815ae5 100644
--- a/drivers/media/usb/stk1160/stk1160.h
+++ b/drivers/media/usb/stk1160/stk1160.h
@@ -16,6 +16,8 @@
 #include <media/videobuf2-v4l2.h>
 #include <media/v4l2-device.h>
 #include <media/v4l2-ctrls.h>
+#include <linux/usb.h>
+#include <linux/usb/hcd.h>
 
 #define STK1160_VERSION		"0.9.5"
 #define STK1160_VERSION_NUM	0x000905
@@ -87,6 +89,9 @@ struct stk1160_buffer {
 struct stk1160_urb {
 	struct urb *urb;
 	char *transfer_buffer;
+	struct sg_table *sgt;
+	struct stk1160 *dev;
+	dma_addr_t dma;
 };
 
 struct stk1160_isoc_ctl {
@@ -190,3 +195,8 @@ void stk1160_select_input(struct stk1160 *dev);
 
 /* Provided by stk1160-ac97.c */
 void stk1160_ac97_setup(struct stk1160 *dev);
+
+static inline struct device *stk1160_get_dmadev(struct stk1160 *dev)
+{
+	return bus_to_hcd(dev->udev->bus)->self.sysdev;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] media: stk1160: move transfer_buffer and urb to same struct 'stk1160_urb'
  2022-01-11  6:55 ` [PATCH 2/3] media: stk1160: move transfer_buffer and urb to same struct 'stk1160_urb' Dafna Hirschfeld
@ 2022-01-11  8:10   ` Christoph Hellwig
  2022-01-11 12:29   ` Ezequiel Garcia
  1 sibling, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2022-01-11  8:10 UTC (permalink / raw)
  To: Dafna Hirschfeld
  Cc: ezequiel, linux-media, laurent.pinchart, hverkuil, ribalda,
	tfiga, senozhatsky, hch, kernel

The overly long lines make this pretty unreadable.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] media: stk1160: use dma_alloc_noncontiguous API
  2022-01-11  6:55 ` [PATCH 3/3] media: stk1160: use dma_alloc_noncontiguous API Dafna Hirschfeld
@ 2022-01-11  8:13   ` Christoph Hellwig
  2022-01-11  8:50     ` Dafna Hirschfeld
  2022-01-11 12:38   ` Ezequiel Garcia
  1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2022-01-11  8:13 UTC (permalink / raw)
  To: Dafna Hirschfeld
  Cc: ezequiel, linux-media, laurent.pinchart, hverkuil, ribalda,
	tfiga, senozhatsky, hch, kernel

On Tue, Jan 11, 2022 at 08:55:05AM +0200, Dafna Hirschfeld wrote:
> Replace the urb buffers allocation to
> use the noncontiguous API. This improve performance
> on Arm.
> The noncontiguous API require handling
> synchronization.
> This commit is similar to the one sent to
> uvc: [1]

Strange formatting.  I'd flow this as:

Replace the urb buffers allocation to use the noncontiguous API.  This
improve performance on ARM plattform (XXX: insert why here)
The noncontiguous API requires the driver to handle synchronization.
This commit is similar to this one for the uvc driver:

    https://lkml.org/lkml/2021/3/12/1506

> @@ -310,6 +311,9 @@ static void stk1160_isoc_irq(struct urb *urb)
>  		return;
>  	}
>  
> +	dma_sync_sgtable_for_cpu(stk1160_get_dmadev(dev), stk_urb->sgt, DMA_FROM_DEVICE);
> +	invalidate_kernel_vmap_range(stk_urb->transfer_buffer, urb->transfer_buffer_length);

Besisdes the unreadably long lines, I'd invalidate the vmap range before
the direct mapping range.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] media: stk1160: use dma_alloc_noncontiguous API
  2022-01-11  8:13   ` Christoph Hellwig
@ 2022-01-11  8:50     ` Dafna Hirschfeld
  2022-01-11  8:52       ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Dafna Hirschfeld @ 2022-01-11  8:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: ezequiel, linux-media, laurent.pinchart, hverkuil, ribalda,
	tfiga, senozhatsky, kernel



On 11.01.22 10:13, Christoph Hellwig wrote:
> On Tue, Jan 11, 2022 at 08:55:05AM +0200, Dafna Hirschfeld wrote:
>> Replace the urb buffers allocation to
>> use the noncontiguous API. This improve performance
>> on Arm.
>> The noncontiguous API require handling
>> synchronization.
>> This commit is similar to the one sent to
>> uvc: [1]
> 
> Strange formatting.  I'd flow this as:
> 
> Replace the urb buffers allocation to use the noncontiguous API.  This
> improve performance on ARM plattform (XXX: insert why here)
> The noncontiguous API requires the driver to handle synchronization.
> This commit is similar to this one for the uvc driver:
> 
>      https://lkml.org/lkml/2021/3/12/1506

I'll reformulate and explain.

> 
>> @@ -310,6 +311,9 @@ static void stk1160_isoc_irq(struct urb *urb)
>>   		return;
>>   	}
>>   
>> +	dma_sync_sgtable_for_cpu(stk1160_get_dmadev(dev), stk_urb->sgt, DMA_FROM_DEVICE);
>> +	invalidate_kernel_vmap_range(stk_urb->transfer_buffer, urb->transfer_buffer_length);
> 
> Besisdes the unreadably long lines, I'd invalidate the vmap range before
> the direct mapping range.

I'll send v2 with shorter lines. (the official limit is now 100 char which I still follow).
I can send v2 with limit of 80.

You mean you would call "invalidate_kernel_vmap_range" before "dma_sync_sgtable_for_cpu" ?

Thanks,
Dafna

> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] media: stk1160: use dma_alloc_noncontiguous API
  2022-01-11  8:50     ` Dafna Hirschfeld
@ 2022-01-11  8:52       ` Christoph Hellwig
  2022-01-11  8:55         ` Dafna Hirschfeld
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2022-01-11  8:52 UTC (permalink / raw)
  To: Dafna Hirschfeld
  Cc: Christoph Hellwig, ezequiel, linux-media, laurent.pinchart,
	hverkuil, ribalda, tfiga, senozhatsky, kernel

On Tue, Jan 11, 2022 at 10:50:50AM +0200, Dafna Hirschfeld wrote:
> I'll send v2 with shorter lines. (the official limit is now 100 char which I still follow).

No.  It is 80 lines with an exception to go over it if it sigificantly
improves readability.

> You mean you would call "invalidate_kernel_vmap_range" before "dma_sync_sgtable_for_cpu" ?

Yes.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] media: stk1160: use dma_alloc_noncontiguous API
  2022-01-11  8:52       ` Christoph Hellwig
@ 2022-01-11  8:55         ` Dafna Hirschfeld
  2022-01-11  8:59           ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Dafna Hirschfeld @ 2022-01-11  8:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: ezequiel, linux-media, laurent.pinchart, hverkuil, ribalda,
	tfiga, senozhatsky, kernel



On 11.01.22 10:52, Christoph Hellwig wrote:
> On Tue, Jan 11, 2022 at 10:50:50AM +0200, Dafna Hirschfeld wrote:
>> I'll send v2 with shorter lines. (the official limit is now 100 char which I still follow).
> 
> No.  It is 80 lines with an exception to go over it if it sigificantly
> improves readability.

Ok, didn't know that.

> 
>> You mean you would call "invalidate_kernel_vmap_range" before "dma_sync_sgtable_for_cpu" ?
> 
> Yes.

Could you explain why?

> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] media: stk1160: use dma_alloc_noncontiguous API
  2022-01-11  8:55         ` Dafna Hirschfeld
@ 2022-01-11  8:59           ` Christoph Hellwig
  2022-01-12  9:25             ` Sergey Senozhatsky
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2022-01-11  8:59 UTC (permalink / raw)
  To: Dafna Hirschfeld
  Cc: Christoph Hellwig, ezequiel, linux-media, laurent.pinchart,
	hverkuil, ribalda, tfiga, senozhatsky, kernel

On Tue, Jan 11, 2022 at 10:55:47AM +0200, Dafna Hirschfeld wrote:
>
>
> On 11.01.22 10:52, Christoph Hellwig wrote:
>> On Tue, Jan 11, 2022 at 10:50:50AM +0200, Dafna Hirschfeld wrote:
>>> I'll send v2 with shorter lines. (the official limit is now 100 char which I still follow).
>>
>> No.  It is 80 lines with an exception to go over it if it sigificantly
>> improves readability.
>
> Ok, didn't know that.
>
>>
>>> You mean you would call "invalidate_kernel_vmap_range" before "dma_sync_sgtable_for_cpu" ?
>>
>> Yes.
>
> Could you explain why?

the vmap range is the one actually use for cpu access and thus most
prone for speculation, so I'd invalidate it first.  It probably does
not matter to much, but that order looks more sensible.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] media: stk1160: fix number of buffers in case not all buffers are created
  2022-01-11  6:55 ` [PATCH 1/3] media: stk1160: fix number of buffers in case not all buffers are created Dafna Hirschfeld
@ 2022-01-11 12:21   ` Ezequiel Garcia
  0 siblings, 0 replies; 16+ messages in thread
From: Ezequiel Garcia @ 2022-01-11 12:21 UTC (permalink / raw)
  To: Dafna Hirschfeld
  Cc: linux-media, laurent.pinchart, hverkuil, ribalda, tfiga,
	senozhatsky, hch, kernel

On Tue, Jan 11, 2022 at 08:55:03AM +0200, Dafna Hirschfeld wrote:
> In case we fail to allocate a transfer_buffer then we
> break the buffers creation loop and update the number of
> buffers to the number of successfully allocated which should
> be 'i' and not 'i - 1'
> 
> Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>

Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>

> ---
>  drivers/media/usb/stk1160/stk1160-video.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/media/usb/stk1160/stk1160-video.c b/drivers/media/usb/stk1160/stk1160-video.c
> index 202b084f65a2..91bd6adccdd1 100644
> --- a/drivers/media/usb/stk1160/stk1160-video.c
> +++ b/drivers/media/usb/stk1160/stk1160-video.c
> @@ -511,9 +511,9 @@ int stk1160_alloc_isoc(struct stk1160 *dev)
>  	usb_free_urb(dev->isoc_ctl.urb[i]);
>  	dev->isoc_ctl.urb[i] = NULL;
>  
> -	stk1160_warn("%d urbs allocated. Trying to continue...\n", i - 1);
> +	stk1160_warn("%d urbs allocated. Trying to continue...\n", i);
>  
> -	dev->isoc_ctl.num_bufs = i - 1;
> +	dev->isoc_ctl.num_bufs = i;
>  
>  	return 0;
>  
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] media: stk1160: move transfer_buffer and urb to same struct 'stk1160_urb'
  2022-01-11  6:55 ` [PATCH 2/3] media: stk1160: move transfer_buffer and urb to same struct 'stk1160_urb' Dafna Hirschfeld
  2022-01-11  8:10   ` Christoph Hellwig
@ 2022-01-11 12:29   ` Ezequiel Garcia
  1 sibling, 0 replies; 16+ messages in thread
From: Ezequiel Garcia @ 2022-01-11 12:29 UTC (permalink / raw)
  To: Dafna Hirschfeld
  Cc: linux-media, laurent.pinchart, hverkuil, ribalda, tfiga,
	senozhatsky, hch, kernel

Hi Dafna,

On Tue, Jan 11, 2022 at 08:55:04AM +0200, Dafna Hirschfeld wrote:
> Instead of having two separated arrays, one for the urbs and
> one for their buffers, have one array of a struct containing both.
> In addition, the array is just 16 pointers, no need to dynamically
> allocate it.
> 
> Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
> ---
>  drivers/media/usb/stk1160/stk1160-v4l.c   |  2 +-
>  drivers/media/usb/stk1160/stk1160-video.c | 50 ++++++++---------------
>  drivers/media/usb/stk1160/stk1160.h       | 11 ++---
>  3 files changed, 23 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/media/usb/stk1160/stk1160-v4l.c b/drivers/media/usb/stk1160/stk1160-v4l.c
> index 6a4eb616d516..a06030451db4 100644
> --- a/drivers/media/usb/stk1160/stk1160-v4l.c
> +++ b/drivers/media/usb/stk1160/stk1160-v4l.c
> @@ -232,7 +232,7 @@ static int stk1160_start_streaming(struct stk1160 *dev)
>  
>  	/* submit urbs and enables IRQ */
>  	for (i = 0; i < dev->isoc_ctl.num_bufs; i++) {
> -		rc = usb_submit_urb(dev->isoc_ctl.urb[i], GFP_KERNEL);
> +		rc = usb_submit_urb(dev->isoc_ctl.stk1160_urb[i].urb, GFP_KERNEL);

The "stk1160_urb" field is a member of struct stk1160_isoc_ctl,
so it's not really needed to name it "stk1160_urb", you could just
name it "urb_ctl" or something like that.

I think the result should be saner:

		rc = usb_submit_urb(dev->isoc_ctl.urb_ctl[i].urb,
				    GFP_KERNEL);

Also, please make sure you run some regression tests with this patch alone
applied (without 3/3). It's easy to make a mistake on this kind of cleanups.

Thanks,
Ezequiel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] media: stk1160: use dma_alloc_noncontiguous API
  2022-01-11  6:55 ` [PATCH 3/3] media: stk1160: use dma_alloc_noncontiguous API Dafna Hirschfeld
  2022-01-11  8:13   ` Christoph Hellwig
@ 2022-01-11 12:38   ` Ezequiel Garcia
  2022-01-24 14:43     ` Dafna Hirschfeld
  1 sibling, 1 reply; 16+ messages in thread
From: Ezequiel Garcia @ 2022-01-11 12:38 UTC (permalink / raw)
  To: Dafna Hirschfeld
  Cc: linux-media, laurent.pinchart, hverkuil, ribalda, tfiga,
	senozhatsky, hch, kernel

Hi Dafna,

Very nice work.

I specially like all the testing that you mentioned in the cover-letter.

Back in the day, there were a few users trying to use STK1160 (Easycap)
with Beaglebone boards, without success due to lack of USB throughput.

If anything else, I think it's good to see additional noncontiguous API users.

On Tue, Jan 11, 2022 at 08:55:05AM +0200, Dafna Hirschfeld wrote:
> Replace the urb buffers allocation to
> use the noncontiguous API. This improve performance
> on Arm.
> The noncontiguous API require handling
> synchronization.
> This commit is similar to the one sent to
> uvc: [1]
> 
> [1] https://lkml.org/lkml/2021/3/12/1506
> 

This commit description needs lots of attention. In particular,
please add the test results here.

> Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
> ---
>  drivers/media/usb/stk1160/stk1160-v4l.c   |   3 +
>  drivers/media/usb/stk1160/stk1160-video.c | 109 +++++++++++++---------
>  drivers/media/usb/stk1160/stk1160.h       |  10 ++
>  3 files changed, 79 insertions(+), 43 deletions(-)
> 
[..]
> @@ -501,7 +524,7 @@ int stk1160_alloc_isoc(struct stk1160 *dev)
>  
>  free_i_bufs:
>  	/* Save the allocated buffers so far, so we can properly free them */
> -	dev->isoc_ctl.num_bufs = i+1;
> +	dev->isoc_ctl.num_bufs = i;

This looks like a separate fix, similar to 1/3 ?

>  	stk1160_free_isoc(dev);
>  	return -ENOMEM;
>  }
> diff --git a/drivers/media/usb/stk1160/stk1160.h b/drivers/media/usb/stk1160/stk1160.h
> index 1ffca1343d88..52bea7815ae5 100644
> --- a/drivers/media/usb/stk1160/stk1160.h
> +++ b/drivers/media/usb/stk1160/stk1160.h
> @@ -16,6 +16,8 @@
>  #include <media/videobuf2-v4l2.h>
>  #include <media/v4l2-device.h>
>  #include <media/v4l2-ctrls.h>
> +#include <linux/usb.h>
> +#include <linux/usb/hcd.h>
>  
>  #define STK1160_VERSION		"0.9.5"
>  #define STK1160_VERSION_NUM	0x000905
> @@ -87,6 +89,9 @@ struct stk1160_buffer {
>  struct stk1160_urb {
>  	struct urb *urb;
>  	char *transfer_buffer;
> +	struct sg_table *sgt;
> +	struct stk1160 *dev;
> +	dma_addr_t dma;
>  };
>  
>  struct stk1160_isoc_ctl {
> @@ -190,3 +195,8 @@ void stk1160_select_input(struct stk1160 *dev);
>  
>  /* Provided by stk1160-ac97.c */
>  void stk1160_ac97_setup(struct stk1160 *dev);
> +
> +static inline struct device *stk1160_get_dmadev(struct stk1160 *dev)
> +{
> +	return bus_to_hcd(dev->udev->bus)->self.sysdev;

This function looks truly horrible, is there no other way to get the device ?

I suppose this function return something different than (struct stk1160*)dev->dev ?

Thanks,
Ezequiel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] media: stk1160: use dma_alloc_noncontiguous API
  2022-01-11  8:59           ` Christoph Hellwig
@ 2022-01-12  9:25             ` Sergey Senozhatsky
  0 siblings, 0 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2022-01-12  9:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dafna Hirschfeld, ezequiel, linux-media, laurent.pinchart,
	hverkuil, ribalda, tfiga, senozhatsky, kernel

On (22/01/11 09:59), Christoph Hellwig wrote:
[..]
> >>> You mean you would call "invalidate_kernel_vmap_range" before "dma_sync_sgtable_for_cpu" ?
> >>
> >> Yes.
> >
> > Could you explain why?
> 
> the vmap range is the one actually use for cpu access and thus most
> prone for speculation, so I'd invalidate it first.  It probably does
> not matter to much, but that order looks more sensible.

Shall we do the same in videobuf2?

----

diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index 7c4096e62173..0e3f264122af 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -132,12 +132,12 @@ static void vb2_dc_prepare(void *buf_priv)
 	if (!buf->non_coherent_mem)
 		return;
 
-	/* For both USERPTR and non-coherent MMAP */
-	dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
-
 	/* Non-coherent MMAP only */
 	if (buf->vaddr)
 		flush_kernel_vmap_range(buf->vaddr, buf->size);
+
+	/* For both USERPTR and non-coherent MMAP */
+	dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
 }
 
 static void vb2_dc_finish(void *buf_priv)
@@ -152,12 +152,12 @@ static void vb2_dc_finish(void *buf_priv)
 	if (!buf->non_coherent_mem)
 		return;
 
-	/* For both USERPTR and non-coherent MMAP */
-	dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
-
 	/* Non-coherent MMAP only */
 	if (buf->vaddr)
 		invalidate_kernel_vmap_range(buf->vaddr, buf->size);
+
+	/* For both USERPTR and non-coherent MMAP */
+	dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
 }
 
 /*********************************************/

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] media: stk1160: use dma_alloc_noncontiguous API
  2022-01-11 12:38   ` Ezequiel Garcia
@ 2022-01-24 14:43     ` Dafna Hirschfeld
  2022-01-24 16:02       ` Dafna Hirschfeld
  0 siblings, 1 reply; 16+ messages in thread
From: Dafna Hirschfeld @ 2022-01-24 14:43 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: linux-media, laurent.pinchart, hverkuil, ribalda, tfiga,
	senozhatsky, hch, kernel



On 11.01.22 14:38, Ezequiel Garcia wrote:
> Hi Dafna,
> 
> Very nice work.
> 
> I specially like all the testing that you mentioned in the cover-letter.
> 
> Back in the day, there were a few users trying to use STK1160 (Easycap)
> with Beaglebone boards, without success due to lack of USB throughput.

Hi, what do you mean by "USB throughput" ?
I see that the FPS does not change with this patchset and stands on about 25 frames/sec
So this patchset only improve cpu time.

> 
> If anything else, I think it's good to see additional noncontiguous API users.
> 
> On Tue, Jan 11, 2022 at 08:55:05AM +0200, Dafna Hirschfeld wrote:
>> Replace the urb buffers allocation to
>> use the noncontiguous API. This improve performance
>> on Arm.
>> The noncontiguous API require handling
>> synchronization.
>> This commit is similar to the one sent to
>> uvc: [1]
>>
>> [1] https://lkml.org/lkml/2021/3/12/1506
>>
> 
> This commit description needs lots of attention. In particular,
> please add the test results here.
> 
>> Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
>> ---
>>   drivers/media/usb/stk1160/stk1160-v4l.c   |   3 +
>>   drivers/media/usb/stk1160/stk1160-video.c | 109 +++++++++++++---------
>>   drivers/media/usb/stk1160/stk1160.h       |  10 ++
>>   3 files changed, 79 insertions(+), 43 deletions(-)
>>
> [..]
>> @@ -501,7 +524,7 @@ int stk1160_alloc_isoc(struct stk1160 *dev)
>>   
>>   free_i_bufs:
>>   	/* Save the allocated buffers so far, so we can properly free them */
>> -	dev->isoc_ctl.num_bufs = i+1;
>> +	dev->isoc_ctl.num_bufs = i;
> 
> This looks like a separate fix, similar to 1/3 ?
> 
>>   	stk1160_free_isoc(dev);
>>   	return -ENOMEM;
>>   }
>> diff --git a/drivers/media/usb/stk1160/stk1160.h b/drivers/media/usb/stk1160/stk1160.h
>> index 1ffca1343d88..52bea7815ae5 100644
>> --- a/drivers/media/usb/stk1160/stk1160.h
>> +++ b/drivers/media/usb/stk1160/stk1160.h
>> @@ -16,6 +16,8 @@
>>   #include <media/videobuf2-v4l2.h>
>>   #include <media/v4l2-device.h>
>>   #include <media/v4l2-ctrls.h>
>> +#include <linux/usb.h>
>> +#include <linux/usb/hcd.h>
>>   
>>   #define STK1160_VERSION		"0.9.5"
>>   #define STK1160_VERSION_NUM	0x000905
>> @@ -87,6 +89,9 @@ struct stk1160_buffer {
>>   struct stk1160_urb {
>>   	struct urb *urb;
>>   	char *transfer_buffer;
>> +	struct sg_table *sgt;
>> +	struct stk1160 *dev;
>> +	dma_addr_t dma;
>>   };
>>   
>>   struct stk1160_isoc_ctl {
>> @@ -190,3 +195,8 @@ void stk1160_select_input(struct stk1160 *dev);
>>   
>>   /* Provided by stk1160-ac97.c */
>>   void stk1160_ac97_setup(struct stk1160 *dev);
>> +
>> +static inline struct device *stk1160_get_dmadev(struct stk1160 *dev)
>> +{
>> +	return bus_to_hcd(dev->udev->bus)->self.sysdev;
> 
> This function looks truly horrible, is there no other way to get the device ?
> 
> I suppose this function return something different than (struct stk1160*)dev->dev ?

I remember I tried using this device and got an error that it is not cable of doing dma
allocations or something like that. The function "stk1160_get_dmadev" is a copy-paste from the
uvc equivalent:

static inline struct device *uvc_stream_to_dmadev(struct uvc_streaming *stream)
{
         return bus_to_hcd(stream->dev->udev->bus)->self.sysdev;
}

Thanks,
Dafna

> 
> Thanks,
> Ezequiel
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] media: stk1160: use dma_alloc_noncontiguous API
  2022-01-24 14:43     ` Dafna Hirschfeld
@ 2022-01-24 16:02       ` Dafna Hirschfeld
  0 siblings, 0 replies; 16+ messages in thread
From: Dafna Hirschfeld @ 2022-01-24 16:02 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: linux-media, laurent.pinchart, hverkuil, ribalda, tfiga,
	senozhatsky, hch, kernel



On 24.01.22 16:43, Dafna Hirschfeld wrote:
> 
> 
> On 11.01.22 14:38, Ezequiel Garcia wrote:
>> Hi Dafna,
>>
>> Very nice work.
>>
>> I specially like all the testing that you mentioned in the cover-letter.
>>
>> Back in the day, there were a few users trying to use STK1160 (Easycap)
>> with Beaglebone boards, without success due to lack of USB throughput.
> 
> Hi, what do you mean by "USB throughput" ?
> I see that the FPS does not change with this patchset and stands on about 25 frames/sec
> So this patchset only improve cpu time.
> 
>>
>> If anything else, I think it's good to see additional noncontiguous API users.
>>
>> On Tue, Jan 11, 2022 at 08:55:05AM +0200, Dafna Hirschfeld wrote:
>>> Replace the urb buffers allocation to
>>> use the noncontiguous API. This improve performance
>>> on Arm.
>>> The noncontiguous API require handling
>>> synchronization.
>>> This commit is similar to the one sent to
>>> uvc: [1]
>>>
>>> [1] https://lkml.org/lkml/2021/3/12/1506
>>>
>>
>> This commit description needs lots of attention. In particular,
>> please add the test results here.
>>
>>> Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
>>> ---
>>>   drivers/media/usb/stk1160/stk1160-v4l.c   |   3 +
>>>   drivers/media/usb/stk1160/stk1160-video.c | 109 +++++++++++++---------
>>>   drivers/media/usb/stk1160/stk1160.h       |  10 ++
>>>   3 files changed, 79 insertions(+), 43 deletions(-)
>>>
>> [..]
>>> @@ -501,7 +524,7 @@ int stk1160_alloc_isoc(struct stk1160 *dev)
>>>   free_i_bufs:
>>>       /* Save the allocated buffers so far, so we can properly free them */
>>> -    dev->isoc_ctl.num_bufs = i+1;
>>> +    dev->isoc_ctl.num_bufs = i;
>>
>> This looks like a separate fix, similar to 1/3 ?
>>
>>>       stk1160_free_isoc(dev);
>>>       return -ENOMEM;
>>>   }
>>> diff --git a/drivers/media/usb/stk1160/stk1160.h b/drivers/media/usb/stk1160/stk1160.h
>>> index 1ffca1343d88..52bea7815ae5 100644
>>> --- a/drivers/media/usb/stk1160/stk1160.h
>>> +++ b/drivers/media/usb/stk1160/stk1160.h
>>> @@ -16,6 +16,8 @@
>>>   #include <media/videobuf2-v4l2.h>
>>>   #include <media/v4l2-device.h>
>>>   #include <media/v4l2-ctrls.h>
>>> +#include <linux/usb.h>
>>> +#include <linux/usb/hcd.h>
>>>   #define STK1160_VERSION        "0.9.5"
>>>   #define STK1160_VERSION_NUM    0x000905
>>> @@ -87,6 +89,9 @@ struct stk1160_buffer {
>>>   struct stk1160_urb {
>>>       struct urb *urb;
>>>       char *transfer_buffer;
>>> +    struct sg_table *sgt;
>>> +    struct stk1160 *dev;
>>> +    dma_addr_t dma;
>>>   };
>>>   struct stk1160_isoc_ctl {
>>> @@ -190,3 +195,8 @@ void stk1160_select_input(struct stk1160 *dev);
>>>   /* Provided by stk1160-ac97.c */
>>>   void stk1160_ac97_setup(struct stk1160 *dev);
>>> +
>>> +static inline struct device *stk1160_get_dmadev(struct stk1160 *dev)
>>> +{
>>> +    return bus_to_hcd(dev->udev->bus)->self.sysdev;
>>
>> This function looks truly horrible, is there no other way to get the device ?
>>
>> I suppose this function return something different than (struct stk1160*)dev->dev ?
> 
> I remember I tried using this device and got an error that it is not cable of doing dma
> allocations or something like that. The function "stk1160_get_dmadev" is a copy-paste from the
> uvc equivalent:
> 
> static inline struct device *uvc_stream_to_dmadev(struct uvc_streaming *stream)
> {
>          return bus_to_hcd(stream->dev->udev->bus)->self.sysdev;
> }
> 
> Thanks,
> Dafna

There are 3 'struct device' related, when printing their names with:

         pr_err("dmadev device is %s\n", dev_name(stk1160_get_dmadev(dev)));
         pr_err("udev->dev device is %s\n", dev_name(&dev->udev->dev));
         pr_err("dev device is %s\n", dev_name(dev->dev));

I get:

[  107.770870] dmadev device is fe900000.usb
[  107.771253] udev->dev device is 7-1
[  107.771561] dev device is 7-1:1.0

If I try to use 'dev->dev' instead of the above stk1160_get_dmadev,
streaming fails and I get two warnings. The first one is originated from:

https://elixir.bootlin.com/linux/v5.16-rc4/source/kernel/dma/mapping.c#L546

Which is:
WARN_ON_ONCE(!dev->coherent_dma_mask)

I wonder if that warning is required in our case since we remove the alloc_coherent usage.

The second warning in videobuf2-core.c:1612 seems to be a bug in the driver itself.
I'll look at that.


[   65.554406] ------------[ cut here ]------------
[   65.554815] WARNING: CPU: 1 PID: 593 at /home/dafna/git/easycap_media_tree/kernel/dma/mapping.c:546 __dma_alloc_pages+0x78/0xc8
[   65.555836] Modules linked in: snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi snd_soc_hdmi_codec dw_hdmi_i2s_audio saa7115 stk1160 videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc crct10dif_ce panfrost snd_soc_simple_card snd_soc_audio_graph_card snd_soc_spdif_tx snd_soc_simple_card_utils gpu_sched phy_rockchip_pcie snd_soc_rockchip_i2s rockchipdrm analogix_dp dw_mipi_dsi dw_hdmi cec drm_kms_helper drm rtc_rk808 rockchip_saradc industrialio_triggered_buffer kfifo_buf rockchip_thermal pcie_rockchip_host ip_tables x_tables ipv6
[   65.560265] CPU: 1 PID: 593 Comm: v4l2src0:src Not tainted 5.16.0-rc4-62408-g32447129cb30-dirty #14
[   65.561269] Hardware name: Radxa ROCK Pi 4B (DT)
[   65.561676] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   65.562288] pc : __dma_alloc_pages+0x78/0xc8
[   65.562849] lr : dma_alloc_noncontiguous+0x128/0x1a8
[   65.563288] sp : ffff800012bc39b0
[   65.563579] x29: ffff800012bc39b0 x28: 0000000000000000 x27: 0000000000030000
[   65.564213] x26: ffff00000908a950 x25: ffff0000018af830 x24: 0000000000410080
[   65.564843] x23: 0000000000000002 x22: ffff000005c88090 x21: 0000000000000cc0
[   65.565473] x20: ffff0000018af830 x19: 0000000000030000 x18: ffffffffffffffff
[   65.566103] x17: 000000040044ffff x16: 00400032b5503510 x15: ffff800011323f78
[   65.566735] x14: 0000000000000000 x13: 302e313a312d3720 x12: 7369206563697665
[   65.567366] x11: 0000000005f5e0ff x10: 000000000000005d x9 : 00000000ffffffd0
[   65.567997] x8 : ffff000005c880a0 x7 : 0000000000000000 x6 : ffff800012bc3750
[   65.568628] x5 : ffff800011b671f8 x4 : 0000000000000cc0 x3 : 0000000000000002
[   65.569259] x2 : ffff000005c88090 x1 : 0000000000030000 x0 : 0000000000000000
[   65.569891] Call trace:
[   65.570110]  __dma_alloc_pages+0x78/0xc8
[   65.570459]  dma_alloc_noncontiguous+0x128/0x1a8
[   65.570868]  stk1160_alloc_isoc+0xf0/0x2b0 [stk1160]
[   65.571318]  start_streaming+0xfc/0x250 [stk1160]
[   65.571740]  vb2_start_streaming+0x6c/0x160 [videobuf2_common]
[   65.572273]  vb2_core_streamon+0x17c/0x1a8 [videobuf2_common]
[   65.572794]  vb2_streamon+0x54/0x88 [videobuf2_v4l2]
[   65.573244]  vb2_ioctl_streamon+0x54/0x60 [videobuf2_v4l2]
[   65.573736]  v4l_streamon+0x3c/0x50 [videodev]
[   65.574195]  __video_do_ioctl+0x1a4/0x428 [videodev]
[   65.574684]  video_usercopy+0x320/0x828 [videodev]
[   65.575158]  video_ioctl2+0x3c/0x58 [videodev]
[   65.575603]  v4l2_ioctl+0x60/0x90 [videodev]
[   65.576031]  __arm64_sys_ioctl+0xa8/0xe0
[   65.576381]  invoke_syscall+0x54/0x118
[   65.576718]  el0_svc_common.constprop.3+0x84/0x100
[   65.577144]  do_el0_svc+0x34/0xa0
[   65.577441]  el0_svc+0x1c/0x50
[   65.577716]  el0t_64_sync_handler+0x88/0xb0
[   65.578086]  el0t_64_sync+0x16c/0x170
[   65.578413] ---[ end trace 578e0ba07742170c ]---
[   65.583222] ------------[ cut here ]------------
[   65.583633] WARNING: CPU: 5 PID: 593 at /home/dafna/git/easycap_media_tree/drivers/media/common/videobuf2/videobuf2-core.c:1612 vb2_start_streaming+0xd4/0x160 [videobuf2_common]
[   65.585027] Modules linked in: snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi snd_soc_hdmi_codec dw_hdmi_i2s_audio saa7115 stk1160 videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc crct10dif_ce panfrost snd_soc_simple_card snd_soc_audio_graph_card snd_soc_spdif_tx snd_soc_simple_card_utils gpu_sched phy_rockchip_pcie snd_soc_rockchip_i2s rockchipdrm analogix_dp dw_mipi_dsi dw_hdmi cec drm_kms_helper drm rtc_rk808 rockchip_saradc industrialio_triggered_buffer kfifo_buf rockchip_thermal pcie_rockchip_host ip_tables x_tables ipv6
[   65.589383] CPU: 5 PID: 593 Comm: v4l2src0:src Tainted: G        W         5.16.0-rc4-62408-g32447129cb30-dirty #14
[   65.590293] Hardware name: Radxa ROCK Pi 4B (DT)
[   65.590696] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   65.591304] pc : vb2_start_streaming+0xd4/0x160 [videobuf2_common]
[   65.591850] lr : vb2_start_streaming+0x6c/0x160 [videobuf2_common]
[   65.592395] sp : ffff800012bc3ad0
[   65.592685] x29: ffff800012bc3ad0 x28: 0000000000000000 x27: ffff800012bc3cd8
[   65.593312] x26: 0000000000000000 x25: ffff00000d8a7800 x24: 0000000040045612
[   65.593938] x23: ffff800011323000 x22: ffff800012bc3cd8 x21: ffff00000908a8b0
[   65.594562] x20: ffff00000908a8c8 x19: 00000000fffffff4 x18: ffffffffffffffff
[   65.595188] x17: 000000040044ffff x16: 00400034b5503510 x15: ffff800011323f78
[   65.595813] x14: ffff000013163886 x13: ffff000013163885 x12: 00000000000002ce
[   65.596439] x11: 0000000000000028 x10: 0000000000000001 x9 : 0000000000000228
[   65.597064] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefeff726c5e78
[   65.597690] x5 : ffff800012bc3990 x4 : 0000000000000000 x3 : ffff000009a34880
[   65.598315] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000007cd99f0
[   65.598940] Call trace:
[   65.599155]  vb2_start_streaming+0xd4/0x160 [videobuf2_common]
[   65.599672]  vb2_core_streamon+0x17c/0x1a8 [videobuf2_common]
[   65.600179]  vb2_streamon+0x54/0x88 [videobuf2_v4l2]
[   65.600619]  vb2_ioctl_streamon+0x54/0x60 [videobuf2_v4l2]
[   65.601103]  v4l_streamon+0x3c/0x50 [videodev]
[   65.601521]  __video_do_ioctl+0x1a4/0x428 [videodev]
[   65.601977]  video_usercopy+0x320/0x828 [videodev]
[   65.602419]  video_ioctl2+0x3c/0x58 [videodev]
[   65.602830]  v4l2_ioctl+0x60/0x90 [videodev]
[   65.603227]  __arm64_sys_ioctl+0xa8/0xe0
[   65.603576]  invoke_syscall+0x54/0x118
[   65.603911]  el0_svc_common.constprop.3+0x84/0x100
[   65.604332]  do_el0_svc+0x34/0xa0
[   65.604625]  el0_svc+0x1c/0x50
[   65.604897]  el0t_64_sync_handler+0x88/0xb0
[   65.605264]  el0t_64_sync+0x16c/0x170
[   65.605587] ---[ end trace 578e0ba07742170d ]---

Thanks,
Dafna

> 
>>
>> Thanks,
>> Ezequiel
>>
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-01-24 16:02 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-11  6:55 [PATCH 0/3] media: stk1160: allocate urb buffs with the DMA noncontiguous API Dafna Hirschfeld
2022-01-11  6:55 ` [PATCH 1/3] media: stk1160: fix number of buffers in case not all buffers are created Dafna Hirschfeld
2022-01-11 12:21   ` Ezequiel Garcia
2022-01-11  6:55 ` [PATCH 2/3] media: stk1160: move transfer_buffer and urb to same struct 'stk1160_urb' Dafna Hirschfeld
2022-01-11  8:10   ` Christoph Hellwig
2022-01-11 12:29   ` Ezequiel Garcia
2022-01-11  6:55 ` [PATCH 3/3] media: stk1160: use dma_alloc_noncontiguous API Dafna Hirschfeld
2022-01-11  8:13   ` Christoph Hellwig
2022-01-11  8:50     ` Dafna Hirschfeld
2022-01-11  8:52       ` Christoph Hellwig
2022-01-11  8:55         ` Dafna Hirschfeld
2022-01-11  8:59           ` Christoph Hellwig
2022-01-12  9:25             ` Sergey Senozhatsky
2022-01-11 12:38   ` Ezequiel Garcia
2022-01-24 14:43     ` Dafna Hirschfeld
2022-01-24 16:02       ` Dafna Hirschfeld

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.