linux-media.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] media: uvcvideo: Use streaming DMA APIs to transfer buffers
@ 2019-08-02 13:12 Shik Chen
  2019-08-27  2:03 ` Tomasz Figa
  2019-09-28  3:33 ` Nicolas Boichat
  0 siblings, 2 replies; 8+ messages in thread
From: Shik Chen @ 2019-08-02 13:12 UTC (permalink / raw)
  To: linux-media
  Cc: Tomasz Figa, notify, Keiichi Watanabe, Ricky Liang, Shik Chen,
	Mauro Carvalho Chehab, Laurent Pinchart, linux-kernel

Similar to the commit 1161db6776bd ("media: usb: pwc: Don't use coherent
DMA buffers for ISO transfer") [1] for the pwc driver. Use streaming DMA
APIs to transfer buffers and sync them explicitly, because accessing
buffers allocated by usb_alloc_coherent() is slow on platforms without
hardware coherent DMA.

Tested on x86-64 (Intel Celeron 4305U) and armv7l (Rockchip RK3288) with
Logitech Brio 4K camera at 1920x1080 using the WebRTC sample site [3].

|                  |  URB (us)  | Decode (Gbps) | CPU (%) |
|------------------|------------|---------------|---------|
| x86-64 coherent  |  53 +-  20 |          50.6 |    0.24 |
| x86-64 streaming |  55 +-  19 |          50.1 |    0.25 |
| armv7l coherent  | 342 +- 379 |           1.8 |    2.16 |
| armv7l streaming |  99 +-  98 |          11.0 |    0.36 |

The performance characteristics are improved a lot on armv7l, and
remained (almost) unchanged on x86-64. The code used for measurement can
be found at [2].

[1] https://git.kernel.org/torvalds/c/1161db6776bd
[2] https://crrev.com/c/1729133
[3] https://webrtc.github.io/samples/src/content/getusermedia/resolution/

Signed-off-by: Shik Chen <shik@chromium.org>
---
The allocated buffer could be as large as 768K when streaming 4K video.
Ideally we should use some generic helper to allocate non-coherent
memory in a more efficient way, such as https://lwn.net/Articles/774429/
("make the non-consistent DMA allocator more userful").

But I cannot find any existing helper so a simple kzalloc() is used in
this patch. The logic to figure out the DMA addressable GFP flags is
similar to __dma_direct_alloc_pages() without the optimistic retries:
https://elixir.bootlin.com/linux/v5.2.5/source/kernel/dma/direct.c#L96

 drivers/media/usb/uvc/uvc_video.c | 65 +++++++++++++++++++++----------
 1 file changed, 45 insertions(+), 20 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
index 8fa77a81dd7f2c..962c35478896c4 100644
--- a/drivers/media/usb/uvc/uvc_video.c
+++ b/drivers/media/usb/uvc/uvc_video.c
@@ -1539,6 +1539,8 @@ static void uvc_video_complete(struct urb *urb)
 	 * Process the URB headers, and optionally queue expensive memcpy tasks
 	 * to be deferred to a work queue.
 	 */
+	dma_sync_single_for_cpu(&urb->dev->dev, urb->transfer_dma,
+				urb->transfer_buffer_length, DMA_FROM_DEVICE);
 	stream->decode(uvc_urb, buf, buf_meta);
 
 	/* If no async work is needed, resubmit the URB immediately. */
@@ -1565,18 +1567,51 @@ static void uvc_free_urb_buffers(struct uvc_streaming *stream)
 		if (!uvc_urb->buffer)
 			continue;
 
-#ifndef CONFIG_DMA_NONCOHERENT
-		usb_free_coherent(stream->dev->udev, stream->urb_size,
-				  uvc_urb->buffer, uvc_urb->dma);
-#else
+		dma_unmap_single(&stream->dev->udev->dev, uvc_urb->dma,
+				 stream->urb_size, DMA_FROM_DEVICE);
 		kfree(uvc_urb->buffer);
-#endif
-		uvc_urb->buffer = NULL;
 	}
 
 	stream->urb_size = 0;
 }
 
+static gfp_t uvc_alloc_gfp_flags(struct device *dev)
+{
+	u64 mask = dma_get_mask(dev);
+
+	if (dev->bus_dma_mask)
+		mask &= dev->bus_dma_mask;
+
+	if (mask < DMA_BIT_MASK(32) && IS_ENABLED(CONFIG_ZONE_DMA))
+		return GFP_DMA;
+
+	if (mask < DMA_BIT_MASK(64)) {
+		if (IS_ENABLED(CONFIG_ZONE_DMA32))
+			return GFP_DMA32;
+		if (IS_ENABLED(CONFIG_ZONE_DMA))
+			return GFP_DMA;
+	}
+
+	return 0;
+}
+
+static char *uvc_alloc_urb_buffer(struct device *dev, size_t size,
+				  gfp_t gfp_flags, dma_addr_t *dma_handle)
+{
+	void *buffer = kzalloc(size, gfp_flags | uvc_alloc_gfp_flags(dev));
+
+	if (!buffer)
+		return NULL;
+
+	*dma_handle = dma_map_single(dev, buffer, size, DMA_FROM_DEVICE);
+	if (dma_mapping_error(dev, *dma_handle)) {
+		kfree(buffer);
+		return NULL;
+	}
+
+	return buffer;
+}
+
 /*
  * Allocate transfer buffers. This function can be called with buffers
  * already allocated when resuming from suspend, in which case it will
@@ -1607,18 +1642,14 @@ static int uvc_alloc_urb_buffers(struct uvc_streaming *stream,
 
 	/* Retry allocations until one succeed. */
 	for (; npackets > 1; npackets /= 2) {
+		stream->urb_size = psize * npackets;
+
 		for (i = 0; i < UVC_URBS; ++i) {
 			struct uvc_urb *uvc_urb = &stream->uvc_urb[i];
 
-			stream->urb_size = psize * npackets;
-#ifndef CONFIG_DMA_NONCOHERENT
-			uvc_urb->buffer = usb_alloc_coherent(
-				stream->dev->udev, stream->urb_size,
+			uvc_urb->buffer = uvc_alloc_urb_buffer(
+				&stream->dev->udev->dev, stream->urb_size,
 				gfp_flags | __GFP_NOWARN, &uvc_urb->dma);
-#else
-			uvc_urb->buffer =
-			    kmalloc(stream->urb_size, gfp_flags | __GFP_NOWARN);
-#endif
 			if (!uvc_urb->buffer) {
 				uvc_free_urb_buffers(stream);
 				break;
@@ -1728,12 +1759,8 @@ static int uvc_init_video_isoc(struct uvc_streaming *stream,
 		urb->context = uvc_urb;
 		urb->pipe = usb_rcvisocpipe(stream->dev->udev,
 				ep->desc.bEndpointAddress);
-#ifndef CONFIG_DMA_NONCOHERENT
 		urb->transfer_flags = URB_ISO_ASAP | URB_NO_TRANSFER_DMA_MAP;
 		urb->transfer_dma = uvc_urb->dma;
-#else
-		urb->transfer_flags = URB_ISO_ASAP;
-#endif
 		urb->interval = ep->desc.bInterval;
 		urb->transfer_buffer = uvc_urb->buffer;
 		urb->complete = uvc_video_complete;
@@ -1793,10 +1820,8 @@ static int uvc_init_video_bulk(struct uvc_streaming *stream,
 
 		usb_fill_bulk_urb(urb, stream->dev->udev, pipe,	uvc_urb->buffer,
 				  size, uvc_video_complete, uvc_urb);
-#ifndef CONFIG_DMA_NONCOHERENT
 		urb->transfer_flags = URB_NO_TRANSFER_DMA_MAP;
 		urb->transfer_dma = uvc_urb->dma;
-#endif
 
 		uvc_urb->urb = urb;
 	}
-- 
2.22.0.770.g0f2c4a37fd-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] media: uvcvideo: Use streaming DMA APIs to transfer buffers
  2019-08-02 13:12 [PATCH] media: uvcvideo: Use streaming DMA APIs to transfer buffers Shik Chen
@ 2019-08-27  2:03 ` Tomasz Figa
  2019-09-28  3:33 ` Nicolas Boichat
  1 sibling, 0 replies; 8+ messages in thread
From: Tomasz Figa @ 2019-08-27  2:03 UTC (permalink / raw)
  To: Linux Media Mailing List, Laurent Pinchart, Kieran Bingham
  Cc: notify, Shik Chen, Keiichi Watanabe, Ricky Liang,
	Mauro Carvalho Chehab, Linux Kernel Mailing List

On Fri, Aug 2, 2019 at 10:12 PM Shik Chen <shik@chromium.org> wrote:
>
> Similar to the commit 1161db6776bd ("media: usb: pwc: Don't use coherent
> DMA buffers for ISO transfer") [1] for the pwc driver. Use streaming DMA
> APIs to transfer buffers and sync them explicitly, because accessing
> buffers allocated by usb_alloc_coherent() is slow on platforms without
> hardware coherent DMA.
>
> Tested on x86-64 (Intel Celeron 4305U) and armv7l (Rockchip RK3288) with
> Logitech Brio 4K camera at 1920x1080 using the WebRTC sample site [3].
>
> |                  |  URB (us)  | Decode (Gbps) | CPU (%) |
> |------------------|------------|---------------|---------|
> | x86-64 coherent  |  53 +-  20 |          50.6 |    0.24 |
> | x86-64 streaming |  55 +-  19 |          50.1 |    0.25 |
> | armv7l coherent  | 342 +- 379 |           1.8 |    2.16 |
> | armv7l streaming |  99 +-  98 |          11.0 |    0.36 |
>
> The performance characteristics are improved a lot on armv7l, and
> remained (almost) unchanged on x86-64. The code used for measurement can
> be found at [2].
>
> [1] https://git.kernel.org/torvalds/c/1161db6776bd
> [2] https://crrev.com/c/1729133
> [3] https://webrtc.github.io/samples/src/content/getusermedia/resolution/
>
> Signed-off-by: Shik Chen <shik@chromium.org>
> ---
> The allocated buffer could be as large as 768K when streaming 4K video.
> Ideally we should use some generic helper to allocate non-coherent
> memory in a more efficient way, such as https://lwn.net/Articles/774429/
> ("make the non-consistent DMA allocator more userful").
>
> But I cannot find any existing helper so a simple kzalloc() is used in
> this patch. The logic to figure out the DMA addressable GFP flags is
> similar to __dma_direct_alloc_pages() without the optimistic retries:
> https://elixir.bootlin.com/linux/v5.2.5/source/kernel/dma/direct.c#L96
>
>  drivers/media/usb/uvc/uvc_video.c | 65 +++++++++++++++++++++----------
>  1 file changed, 45 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
> index 8fa77a81dd7f2c..962c35478896c4 100644
> --- a/drivers/media/usb/uvc/uvc_video.c
> +++ b/drivers/media/usb/uvc/uvc_video.c
> @@ -1539,6 +1539,8 @@ static void uvc_video_complete(struct urb *urb)
>          * Process the URB headers, and optionally queue expensive memcpy tasks
>          * to be deferred to a work queue.
>          */
> +       dma_sync_single_for_cpu(&urb->dev->dev, urb->transfer_dma,
> +                               urb->transfer_buffer_length, DMA_FROM_DEVICE);
>         stream->decode(uvc_urb, buf, buf_meta);
>
>         /* If no async work is needed, resubmit the URB immediately. */
> @@ -1565,18 +1567,51 @@ static void uvc_free_urb_buffers(struct uvc_streaming *stream)
>                 if (!uvc_urb->buffer)
>                         continue;
>
> -#ifndef CONFIG_DMA_NONCOHERENT
> -               usb_free_coherent(stream->dev->udev, stream->urb_size,
> -                                 uvc_urb->buffer, uvc_urb->dma);
> -#else
> +               dma_unmap_single(&stream->dev->udev->dev, uvc_urb->dma,
> +                                stream->urb_size, DMA_FROM_DEVICE);
>                 kfree(uvc_urb->buffer);
> -#endif
> -               uvc_urb->buffer = NULL;
>         }
>
>         stream->urb_size = 0;
>  }
>
> +static gfp_t uvc_alloc_gfp_flags(struct device *dev)
> +{
> +       u64 mask = dma_get_mask(dev);
> +
> +       if (dev->bus_dma_mask)
> +               mask &= dev->bus_dma_mask;
> +
> +       if (mask < DMA_BIT_MASK(32) && IS_ENABLED(CONFIG_ZONE_DMA))
> +               return GFP_DMA;
> +
> +       if (mask < DMA_BIT_MASK(64)) {
> +               if (IS_ENABLED(CONFIG_ZONE_DMA32))
> +                       return GFP_DMA32;
> +               if (IS_ENABLED(CONFIG_ZONE_DMA))
> +                       return GFP_DMA;
> +       }
> +
> +       return 0;
> +}
> +
> +static char *uvc_alloc_urb_buffer(struct device *dev, size_t size,
> +                                 gfp_t gfp_flags, dma_addr_t *dma_handle)
> +{
> +       void *buffer = kzalloc(size, gfp_flags | uvc_alloc_gfp_flags(dev));
> +
> +       if (!buffer)
> +               return NULL;
> +
> +       *dma_handle = dma_map_single(dev, buffer, size, DMA_FROM_DEVICE);
> +       if (dma_mapping_error(dev, *dma_handle)) {
> +               kfree(buffer);
> +               return NULL;
> +       }
> +
> +       return buffer;
> +}
> +
>  /*
>   * Allocate transfer buffers. This function can be called with buffers
>   * already allocated when resuming from suspend, in which case it will
> @@ -1607,18 +1642,14 @@ static int uvc_alloc_urb_buffers(struct uvc_streaming *stream,
>
>         /* Retry allocations until one succeed. */
>         for (; npackets > 1; npackets /= 2) {
> +               stream->urb_size = psize * npackets;
> +
>                 for (i = 0; i < UVC_URBS; ++i) {
>                         struct uvc_urb *uvc_urb = &stream->uvc_urb[i];
>
> -                       stream->urb_size = psize * npackets;
> -#ifndef CONFIG_DMA_NONCOHERENT
> -                       uvc_urb->buffer = usb_alloc_coherent(
> -                               stream->dev->udev, stream->urb_size,
> +                       uvc_urb->buffer = uvc_alloc_urb_buffer(
> +                               &stream->dev->udev->dev, stream->urb_size,
>                                 gfp_flags | __GFP_NOWARN, &uvc_urb->dma);
> -#else
> -                       uvc_urb->buffer =
> -                           kmalloc(stream->urb_size, gfp_flags | __GFP_NOWARN);
> -#endif
>                         if (!uvc_urb->buffer) {
>                                 uvc_free_urb_buffers(stream);
>                                 break;
> @@ -1728,12 +1759,8 @@ static int uvc_init_video_isoc(struct uvc_streaming *stream,
>                 urb->context = uvc_urb;
>                 urb->pipe = usb_rcvisocpipe(stream->dev->udev,
>                                 ep->desc.bEndpointAddress);
> -#ifndef CONFIG_DMA_NONCOHERENT
>                 urb->transfer_flags = URB_ISO_ASAP | URB_NO_TRANSFER_DMA_MAP;
>                 urb->transfer_dma = uvc_urb->dma;
> -#else
> -               urb->transfer_flags = URB_ISO_ASAP;
> -#endif
>                 urb->interval = ep->desc.bInterval;
>                 urb->transfer_buffer = uvc_urb->buffer;
>                 urb->complete = uvc_video_complete;
> @@ -1793,10 +1820,8 @@ static int uvc_init_video_bulk(struct uvc_streaming *stream,
>
>                 usb_fill_bulk_urb(urb, stream->dev->udev, pipe, uvc_urb->buffer,
>                                   size, uvc_video_complete, uvc_urb);
> -#ifndef CONFIG_DMA_NONCOHERENT
>                 urb->transfer_flags = URB_NO_TRANSFER_DMA_MAP;
>                 urb->transfer_dma = uvc_urb->dma;
> -#endif
>
>                 uvc_urb->urb = urb;
>         }
> --
> 2.22.0.770.g0f2c4a37fd-goog
>

Gentle ping.

Also, oops, looks like we forgot to add Kieran on CC. Let me fix it now.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] media: uvcvideo: Use streaming DMA APIs to transfer buffers
  2019-08-02 13:12 [PATCH] media: uvcvideo: Use streaming DMA APIs to transfer buffers Shik Chen
  2019-08-27  2:03 ` Tomasz Figa
@ 2019-09-28  3:33 ` Nicolas Boichat
  2019-09-30  8:23   ` Christoph Hellwig
  1 sibling, 1 reply; 8+ messages in thread
From: Nicolas Boichat @ 2019-09-28  3:33 UTC (permalink / raw)
  To: Shik Chen
  Cc: Linux Media Mailing List, Tomasz Figa, notify, Keiichi Watanabe,
	Ricky Liang, Mauro Carvalho Chehab, Laurent Pinchart, lkml,
	kieran.bingham, Vlastimil Babka, Christoph Lameter,
	Christoph Hellwig

On Fri, Aug 2, 2019 at 9:13 PM Shik Chen <shik@chromium.org> wrote:
>
> Similar to the commit 1161db6776bd ("media: usb: pwc: Don't use coherent
> DMA buffers for ISO transfer") [1] for the pwc driver. Use streaming DMA
> APIs to transfer buffers and sync them explicitly, because accessing
> buffers allocated by usb_alloc_coherent() is slow on platforms without
> hardware coherent DMA.
>
> Tested on x86-64 (Intel Celeron 4305U) and armv7l (Rockchip RK3288) with
> Logitech Brio 4K camera at 1920x1080 using the WebRTC sample site [3].
>
> |                  |  URB (us)  | Decode (Gbps) | CPU (%) |
> |------------------|------------|---------------|---------|
> | x86-64 coherent  |  53 +-  20 |          50.6 |    0.24 |
> | x86-64 streaming |  55 +-  19 |          50.1 |    0.25 |
> | armv7l coherent  | 342 +- 379 |           1.8 |    2.16 |
> | armv7l streaming |  99 +-  98 |          11.0 |    0.36 |
>
> The performance characteristics are improved a lot on armv7l, and
> remained (almost) unchanged on x86-64. The code used for measurement can
> be found at [2].
>
> [1] https://git.kernel.org/torvalds/c/1161db6776bd
> [2] https://crrev.com/c/1729133
> [3] https://webrtc.github.io/samples/src/content/getusermedia/resolution/
>
> Signed-off-by: Shik Chen <shik@chromium.org>
> ---
> The allocated buffer could be as large as 768K when streaming 4K video.
> Ideally we should use some generic helper to allocate non-coherent
> memory in a more efficient way, such as https://lwn.net/Articles/774429/
> ("make the non-consistent DMA allocator more userful").
>
> But I cannot find any existing helper so a simple kzalloc() is used in
> this patch. The logic to figure out the DMA addressable GFP flags is
> similar to __dma_direct_alloc_pages() without the optimistic retries:
> https://elixir.bootlin.com/linux/v5.2.5/source/kernel/dma/direct.c#L96
>
>  drivers/media/usb/uvc/uvc_video.c | 65 +++++++++++++++++++++----------
>  1 file changed, 45 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
> index 8fa77a81dd7f2c..962c35478896c4 100644
> --- a/drivers/media/usb/uvc/uvc_video.c
> +++ b/drivers/media/usb/uvc/uvc_video.c
> @@ -1539,6 +1539,8 @@ static void uvc_video_complete(struct urb *urb)
>          * Process the URB headers, and optionally queue expensive memcpy tasks
>          * to be deferred to a work queue.
>          */
> +       dma_sync_single_for_cpu(&urb->dev->dev, urb->transfer_dma,
> +                               urb->transfer_buffer_length, DMA_FROM_DEVICE);
>         stream->decode(uvc_urb, buf, buf_meta);
>
>         /* If no async work is needed, resubmit the URB immediately. */
> @@ -1565,18 +1567,51 @@ static void uvc_free_urb_buffers(struct uvc_streaming *stream)
>                 if (!uvc_urb->buffer)
>                         continue;
>
> -#ifndef CONFIG_DMA_NONCOHERENT
> -               usb_free_coherent(stream->dev->udev, stream->urb_size,
> -                                 uvc_urb->buffer, uvc_urb->dma);
> -#else
> +               dma_unmap_single(&stream->dev->udev->dev, uvc_urb->dma,
> +                                stream->urb_size, DMA_FROM_DEVICE);
>                 kfree(uvc_urb->buffer);
> -#endif
> -               uvc_urb->buffer = NULL;
>         }
>
>         stream->urb_size = 0;
>  }
>
> +static gfp_t uvc_alloc_gfp_flags(struct device *dev)
> +{
> +       u64 mask = dma_get_mask(dev);
> +
> +       if (dev->bus_dma_mask)
> +               mask &= dev->bus_dma_mask;
> +
> +       if (mask < DMA_BIT_MASK(32) && IS_ENABLED(CONFIG_ZONE_DMA))
> +               return GFP_DMA;
> +
> +       if (mask < DMA_BIT_MASK(64)) {
> +               if (IS_ENABLED(CONFIG_ZONE_DMA32))
> +                       return GFP_DMA32;

We're hitting issues with this on 64-bit ARM platform, where
ZONE_DMA32 is enabled (default), the kmalloc allocation with GFP_DMA32
fails.

There is no slab cache for GFP_DMA32, so those calls are expected to
fail, AFAICT there are no other (working) kmalloc(..., .. | GFP_DMA32)
users in the kernel, so I don't think we want to add a cache. If this
helps, some discussion here why the cache wasn't added:
https://lore.kernel.org/patchwork/patch/1009563/#1198622

This function looks out of place in a high-level driver, but from your
comment above (below ---), I guess we may have to live with that until
the kernel provides a better API?

For the time being, looking at how __dma_direct_alloc_pages works,
could you use alloc_pages_node (or dma_alloc_contiguous?) instead of
kmalloc, that would let you use GFP_DMA32 as needed?

> +               if (IS_ENABLED(CONFIG_ZONE_DMA))
> +                       return GFP_DMA;
> +       }
> +
> +       return 0;
> +}
> +
> +static char *uvc_alloc_urb_buffer(struct device *dev, size_t size,
> +                                 gfp_t gfp_flags, dma_addr_t *dma_handle)
> +{
> +       void *buffer = kzalloc(size, gfp_flags | uvc_alloc_gfp_flags(dev));
> +
> +       if (!buffer)
> +               return NULL;
> +
> +       *dma_handle = dma_map_single(dev, buffer, size, DMA_FROM_DEVICE);
> +       if (dma_mapping_error(dev, *dma_handle)) {
> +               kfree(buffer);
> +               return NULL;
> +       }
> +
> +       return buffer;
> +}
> +
>  /*
>   * Allocate transfer buffers. This function can be called with buffers
>   * already allocated when resuming from suspend, in which case it will
> @@ -1607,18 +1642,14 @@ static int uvc_alloc_urb_buffers(struct uvc_streaming *stream,
>
>         /* Retry allocations until one succeed. */
>         for (; npackets > 1; npackets /= 2) {
> +               stream->urb_size = psize * npackets;
> +
>                 for (i = 0; i < UVC_URBS; ++i) {
>                         struct uvc_urb *uvc_urb = &stream->uvc_urb[i];
>
> -                       stream->urb_size = psize * npackets;
> -#ifndef CONFIG_DMA_NONCOHERENT
> -                       uvc_urb->buffer = usb_alloc_coherent(
> -                               stream->dev->udev, stream->urb_size,
> +                       uvc_urb->buffer = uvc_alloc_urb_buffer(
> +                               &stream->dev->udev->dev, stream->urb_size,
>                                 gfp_flags | __GFP_NOWARN, &uvc_urb->dma);
> -#else
> -                       uvc_urb->buffer =
> -                           kmalloc(stream->urb_size, gfp_flags | __GFP_NOWARN);
> -#endif
>                         if (!uvc_urb->buffer) {
>                                 uvc_free_urb_buffers(stream);
>                                 break;
> @@ -1728,12 +1759,8 @@ static int uvc_init_video_isoc(struct uvc_streaming *stream,
>                 urb->context = uvc_urb;
>                 urb->pipe = usb_rcvisocpipe(stream->dev->udev,
>                                 ep->desc.bEndpointAddress);
> -#ifndef CONFIG_DMA_NONCOHERENT
>                 urb->transfer_flags = URB_ISO_ASAP | URB_NO_TRANSFER_DMA_MAP;
>                 urb->transfer_dma = uvc_urb->dma;
> -#else
> -               urb->transfer_flags = URB_ISO_ASAP;
> -#endif
>                 urb->interval = ep->desc.bInterval;
>                 urb->transfer_buffer = uvc_urb->buffer;
>                 urb->complete = uvc_video_complete;
> @@ -1793,10 +1820,8 @@ static int uvc_init_video_bulk(struct uvc_streaming *stream,
>
>                 usb_fill_bulk_urb(urb, stream->dev->udev, pipe, uvc_urb->buffer,
>                                   size, uvc_video_complete, uvc_urb);
> -#ifndef CONFIG_DMA_NONCOHERENT
>                 urb->transfer_flags = URB_NO_TRANSFER_DMA_MAP;
>                 urb->transfer_dma = uvc_urb->dma;
> -#endif
>
>                 uvc_urb->urb = urb;
>         }
> --
> 2.22.0.770.g0f2c4a37fd-goog
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] media: uvcvideo: Use streaming DMA APIs to transfer buffers
  2019-09-28  3:33 ` Nicolas Boichat
@ 2019-09-30  8:23   ` Christoph Hellwig
  2019-10-01  6:37     ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2019-09-30  8:23 UTC (permalink / raw)
  To: Nicolas Boichat
  Cc: Shik Chen, Linux Media Mailing List, Tomasz Figa, notify,
	Keiichi Watanabe, Ricky Liang, Mauro Carvalho Chehab,
	Laurent Pinchart, lkml, kieran.bingham, Vlastimil Babka,
	Christoph Lameter, Christoph Hellwig

On Sat, Sep 28, 2019 at 11:33:16AM +0800, Nicolas Boichat wrote:
> > +static gfp_t uvc_alloc_gfp_flags(struct device *dev)
> > +{
> > +       u64 mask = dma_get_mask(dev);
> > +
> > +       if (dev->bus_dma_mask)
> > +               mask &= dev->bus_dma_mask;
> > +
> > +       if (mask < DMA_BIT_MASK(32) && IS_ENABLED(CONFIG_ZONE_DMA))
> > +               return GFP_DMA;
> > +
> > +       if (mask < DMA_BIT_MASK(64)) {
> > +               if (IS_ENABLED(CONFIG_ZONE_DMA32))
> > +                       return GFP_DMA32;
> 
> We're hitting issues with this on 64-bit ARM platform, where
> ZONE_DMA32 is enabled (default), the kmalloc allocation with GFP_DMA32
> fails.
> 
> There is no slab cache for GFP_DMA32, so those calls are expected to
> fail, AFAICT there are no other (working) kmalloc(..., .. | GFP_DMA32)
> users in the kernel, so I don't think we want to add a cache. If this
> helps, some discussion here why the cache wasn't added:
> https://lore.kernel.org/patchwork/patch/1009563/#1198622

And drivers really have no business looking at the dma mask.  I have
a plan for dma_alloc_pages API that could replace that cruft, but
until then please use GFP_KERNEL and let the dma subsystem bounce
buffer if needed.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] media: uvcvideo: Use streaming DMA APIs to transfer buffers
  2019-09-30  8:23   ` Christoph Hellwig
@ 2019-10-01  6:37     ` Christoph Hellwig
  2020-02-27  6:28       ` Tomasz Figa
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2019-10-01  6:37 UTC (permalink / raw)
  To: Nicolas Boichat
  Cc: Shik Chen, Linux Media Mailing List, Tomasz Figa, notify,
	Keiichi Watanabe, Ricky Liang, Mauro Carvalho Chehab,
	Laurent Pinchart, lkml, kieran.bingham, Vlastimil Babka,
	Christoph Lameter, Christoph Hellwig

On Mon, Sep 30, 2019 at 01:23:10AM -0700, Christoph Hellwig wrote:
> And drivers really have no business looking at the dma mask.  I have
> a plan for dma_alloc_pages API that could replace that cruft, but
> until then please use GFP_KERNEL and let the dma subsystem bounce
> buffer if needed.

Can you try this series:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma_alloc_pages

and see if it does whay you need for usb?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] media: uvcvideo: Use streaming DMA APIs to transfer buffers
  2019-10-01  6:37     ` Christoph Hellwig
@ 2020-02-27  6:28       ` Tomasz Figa
  2020-04-21 11:21         ` Tomasz Figa
  0 siblings, 1 reply; 8+ messages in thread
From: Tomasz Figa @ 2020-02-27  6:28 UTC (permalink / raw)
  To: Christoph Hellwig, Sergey Senozhatsky
  Cc: Nicolas Boichat, Shik Chen, Linux Media Mailing List, notify,
	Keiichi Watanabe, Ricky Liang, Mauro Carvalho Chehab,
	Laurent Pinchart, lkml, Kieran Bingham, Vlastimil Babka,
	Christoph Lameter

+Sergey Senozhatsky who's going to be looking into this.

Hi Christoph,

On Tue, Oct 1, 2019 at 3:37 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Mon, Sep 30, 2019 at 01:23:10AM -0700, Christoph Hellwig wrote:
> > And drivers really have no business looking at the dma mask.  I have
> > a plan for dma_alloc_pages API that could replace that cruft, but
> > until then please use GFP_KERNEL and let the dma subsystem bounce
> > buffer if needed.
>
> Can you try this series:
>
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma_alloc_pages
>
> and see if it does whay you need for usb?

Reviving this thread. Sorry for no updates for a long time.

dma_alloc_pages() still wouldn't be an equivalent replacement of the
existing dma_alloc_coherent() (used behind the scenes by
usb_alloc_coherent()). That's because the latter can allocate
non-contiguous memory if the DMA device can handle it (i.e. is behind
an IOMMU), but the former can only allocate a contiguous range of
pages.

That said, I noticed that you also put a lot of effort into making the
NONCONSISTENT attribute more usable. Perhaps that's the way to go here
then? Of course we would need to make sure that the attribute is
handled properly on ARM and ARM64, which are the most affected
platforms. Right now neither handles them. The former doesn't use the
generic DMA mapping ops, while the latter does, but doesn't enable a
Kconfig option needed to allow generic inconsistent allocations.

Any hints would be appreciated.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] media: uvcvideo: Use streaming DMA APIs to transfer buffers
  2020-02-27  6:28       ` Tomasz Figa
@ 2020-04-21 11:21         ` Tomasz Figa
  2020-04-27 12:48           ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Tomasz Figa @ 2020-04-21 11:21 UTC (permalink / raw)
  To: Christoph Hellwig, Catalin Marinas, Will Deacon
  Cc: Nicolas Boichat, Shik Chen, Linux Media Mailing List, notify,
	Keiichi Watanabe, Ricky Liang, Mauro Carvalho Chehab,
	Laurent Pinchart, lkml, Kieran Bingham, Vlastimil Babka,
	Christoph Lameter, Sergey Senozhatsky

On Thu, Feb 27, 2020 at 7:28 AM Tomasz Figa <tfiga@chromium.org> wrote:
>
> +Sergey Senozhatsky who's going to be looking into this.
>
> Hi Christoph,
>
> On Tue, Oct 1, 2019 at 3:37 PM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Mon, Sep 30, 2019 at 01:23:10AM -0700, Christoph Hellwig wrote:
> > > And drivers really have no business looking at the dma mask.  I have
> > > a plan for dma_alloc_pages API that could replace that cruft, but
> > > until then please use GFP_KERNEL and let the dma subsystem bounce
> > > buffer if needed.
> >
> > Can you try this series:
> >
> > http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma_alloc_pages
> >
> > and see if it does whay you need for usb?
>
> Reviving this thread. Sorry for no updates for a long time.
>
> dma_alloc_pages() still wouldn't be an equivalent replacement of the
> existing dma_alloc_coherent() (used behind the scenes by
> usb_alloc_coherent()). That's because the latter can allocate
> non-contiguous memory if the DMA device can handle it (i.e. is behind
> an IOMMU), but the former can only allocate a contiguous range of
> pages.
>
> That said, I noticed that you also put a lot of effort into making the
> NONCONSISTENT attribute more usable. Perhaps that's the way to go here
> then? Of course we would need to make sure that the attribute is
> handled properly on ARM and ARM64, which are the most affected
> platforms. Right now neither handles them. The former doesn't use the
> generic DMA mapping ops, while the latter does, but doesn't enable a
> Kconfig option needed to allow generic inconsistent allocations.
>
> Any hints would be appreciated.

Hi Christoph, would you have some time to check the above?

Hi Catalin, Will, do you know why CONFIG_DMA_NONCOHERENT_CACHE_SYNC is
not enabled on arm64?

Thanks in advance. :)

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] media: uvcvideo: Use streaming DMA APIs to transfer buffers
  2020-04-21 11:21         ` Tomasz Figa
@ 2020-04-27 12:48           ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2020-04-27 12:48 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Christoph Hellwig, Catalin Marinas, Will Deacon, Nicolas Boichat,
	Shik Chen, Linux Media Mailing List, notify, Keiichi Watanabe,
	Ricky Liang, Mauro Carvalho Chehab, Laurent Pinchart, lkml,
	Kieran Bingham, Vlastimil Babka, Christoph Lameter,
	Sergey Senozhatsky

On Tue, Apr 21, 2020 at 01:21:15PM +0200, Tomasz Figa wrote:
> On Thu, Feb 27, 2020 at 7:28 AM Tomasz Figa <tfiga@chromium.org> wrote:
> >
> > +Sergey Senozhatsky who's going to be looking into this.
> >
> > Hi Christoph,
> >
> > That said, I noticed that you also put a lot of effort into making the
> > NONCONSISTENT attribute more usable. Perhaps that's the way to go here
> > then? Of course we would need to make sure that the attribute is
> > handled properly on ARM and ARM64, which are the most affected
> > platforms. Right now neither handles them. The former doesn't use the
> > generic DMA mapping ops, while the latter does, but doesn't enable a
> > Kconfig option needed to allow generic inconsistent allocations.
> >
> > Any hints would be appreciated.
> 
> Hi Christoph, would you have some time to check the above?
> 
> Hi Catalin, Will, do you know why CONFIG_DMA_NONCOHERENT_CACHE_SYNC is
> not enabled on arm64?

NONCONSISTENT is still a mess, mostly because dma_cache_sync is such
a horrible API.  I've been wanting to switch to the normal
sync_for_device / sync_for_cpu primitives instead.  Let me see if I can
expedite that.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-04-27 12:48 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-02 13:12 [PATCH] media: uvcvideo: Use streaming DMA APIs to transfer buffers Shik Chen
2019-08-27  2:03 ` Tomasz Figa
2019-09-28  3:33 ` Nicolas Boichat
2019-09-30  8:23   ` Christoph Hellwig
2019-10-01  6:37     ` Christoph Hellwig
2020-02-27  6:28       ` Tomasz Figa
2020-04-21 11:21         ` Tomasz Figa
2020-04-27 12:48           ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).