linux-renesas-soc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH/RFC 0/2] R-Car DU: Support importing non-contiguous dma-buf with VSP
@ 2017-11-13 10:32 Laurent Pinchart
  2017-11-13 10:32 ` [PATCH/RFC 1/2] drm: rcar-du: Set the DMA coherent mask for the DU device Laurent Pinchart
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Laurent Pinchart @ 2017-11-13 10:32 UTC (permalink / raw)
  To: dri-devel; +Cc: linux-renesas-soc, Kieran Bingham, Liviu Dudau, Daniel Vetter

Hello everybody,

This patch series fixes two issues related to dma-buf import for the Renesas
R-Car DU when the imported buffer is either not physically contiguous or
located in high memory.

Both cases require the use of an IOMMU to remap the buffers contiguously and
in 32-bit DMA space. On Gen2 platforms this is transparent and works already,
but on Gen3 platforms the situation is more complex.

The Gen3 DU doesn't perform DMA access directly but instead goes through a VSP
IP core that acts as an external composer. When importing a dma-buf buffer the
GEM PRIME helper drm_gem_prime_import() maps the buffer to the DU device, and
the DU driver later remaps it to the VSP device. The driver unfortunately
can't use the drm_gem_prime_import_dev() helper to map th buffer to the VSP
device directly, as at import time we don't know yet to which VSP device the
buffer will need to be mapped (there is one VSP composer per CRTC). Mapping
the buffer to the VSP can thus only be done when pinning the framebuffer, as
we know at that time which plane and CRTC it will be used with.

This works currently (with the caveat that an unneeded mapping to the DU is
created), but fails when the imported buffer is not physically contiguous or
is located in high memory. In the first case the GEM CMA helper
drm_gem_cma_prime_import_sg_table() will reject the buffer as it hasn't been
mapped contiguously in the DU DMA address space (because the DU has no IOMMU),
and in the second case the DMA mapping API will try to allocate a bounce
buffer and fail due to bounce buffer memory exhaustion.

The first patch in this series fixes the second issue by setting the DMA
coherent mask of the DU device to the full 40 bits address space. All memory
will thus be accepted without trying to allocate a bounce buffer.

The second patch in this series fixes the first issue by using an internal
copy of the drm_gem_cma_prime_import_sg_table() function that doesn't ensure
that the buffer is contiguous in memory. This is what caused me to post this
series as an RFC, as I'm not very happy with duplication of helper code in the
driver that could lead to breakages when the GEM CMA helpers are modified. If
this approach is accepted, I would prefer implementing the code in the GEM CMA
helpers.

Another point that bothers me is that buffers can now be imported
unconditionally on Gen3 platforms, but atomic commits will fail if the buffer
can't be remapped through the VSP (for instance if no IOMMU is available).
Given the hardware architecture and the DRM/KMS API I don't really see a way
around this.

Testing this series isn't easy as it requires getting hold of exported
dma-bufs located in high memory or not physically contiguous. I have used the
V4L2 vivid driver as an exporter for that purpose, with a few hacks that can
be found at

	git://linuxtv.org/pinchartl/media.git drm/du/devel/iommu

On the userspace side I have used the v4l2-drm-example test application
available at

	git://git.infradead.org/users/kmpark/public-apps

and run with

dmabuf-sharing -M rcar-du -i /dev/video0 -S 640,480 -f YUYV -F YUYV \
	-s 640,480@0,0 -t 640,480@0,0 -o 70:68:640x480 -b 4 -e v4l2

(the -o argument needs to be modified based on the connector and CRTC ID).

Up to patch "[DEBUG] v4l: videobuf2-dma-contig: Print allocated buffer
address" buffer sharing should work. Patch "[HACK] vivid: Allocate buffers
from high memory" then breaks buffer import, and the issue is fixed by patch
"drm: rcar-du: Set the DMA coherent mask for the DU device". Patch "[HACK]
vivid: Use vmalloc allocator" breaks buffer import again, and is fixed by
"drm: rcar-du: Allow importing non-contiguous dma-buf with VSP".

Kieran, I have tested this remotely on your Salvator-X H3 ES1.1 and haven't
noticed any problem, but couldn't check the output on the screen. Would you be
able to rerun the test locally for me ? Please note that the IPMMU is known to
have issues on H3 ES1.1, so you might need to test the code on H3 ES2.0
instead.

Laurent Pinchart (2):
  drm: rcar-du: Set the DMA coherent mask for the DU device
  drm: rcar-du: Allow importing non-contiguous dma-buf with VSP

 drivers/gpu/drm/rcar-du/rcar_du_drv.c | 13 +++++++++++-
 drivers/gpu/drm/rcar-du/rcar_du_kms.c | 39 +++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/rcar-du/rcar_du_kms.h |  7 +++++++
 3 files changed, 58 insertions(+), 1 deletion(-)

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH/RFC 1/2] drm: rcar-du: Set the DMA coherent mask for the DU device
  2017-11-13 10:32 [PATCH/RFC 0/2] R-Car DU: Support importing non-contiguous dma-buf with VSP Laurent Pinchart
@ 2017-11-13 10:32 ` Laurent Pinchart
  2017-11-13 11:44   ` Liviu Dudau
  2017-11-13 10:32 ` [PATCH/RFC 2/2] drm: rcar-du: Allow importing non-contiguous dma-buf with VSP Laurent Pinchart
  2017-11-13 13:48 ` [PATCH/RFC 0/2] R-Car DU: Support " Kieran Bingham
  2 siblings, 1 reply; 10+ messages in thread
From: Laurent Pinchart @ 2017-11-13 10:32 UTC (permalink / raw)
  To: dri-devel; +Cc: linux-renesas-soc, Kieran Bingham, Liviu Dudau, Daniel Vetter

The DU DMA address space is limited to 32 bits, so the DMA coherent mask
should be set accordingly. The DMA mapping implementation will
transparently map high memory buffers to 32-bit addresses through an
IOMMU when present (or through bounce buffers otherwise, which isn't a
supported use case as performances would be terrible).

However, when sourcing frames from a VSP, the situation is more
complicated. The DU delegates all memory accesses to the VSP and doesn't
perform any DMA access by itself. Due to how the GEM CMA helpers are
structured buffers are still mapped to the DU device. They are later
mapped to the VSP as well to perform DMA access, through the IOMMU
connected to the VSP.

Setting the DMA coherent mask to 32 bits for the DU when using a VSP can
cause issues when importing a dma_buf. If the buffer is located above
the 32-bit address space, the DMA mapping implementation will try to map
it to the DU's DMA address space. As the DU has no IOMMU a bounce buffer
will be allocated, which in the best case will waste memory and in the
worst case will just fail.

To work around this issue, set the DMA coherent mask to the full 40-bit
address space for the DU. All dma-buf instances will be imported without
any restriction, and will be mapped to the VSP when preparing the
associated framebuffer.

Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
---
 drivers/gpu/drm/rcar-du/rcar_du_drv.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/rcar-du/rcar_du_drv.c b/drivers/gpu/drm/rcar-du/rcar_du_drv.c
index 6e02c762a557..48c166f925a3 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_drv.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_drv.c
@@ -370,6 +370,7 @@ static int rcar_du_probe(struct platform_device *pdev)
 	struct rcar_du_device *rcdu;
 	struct drm_device *ddev;
 	struct resource *mem;
+	unsigned int mask;
 	int ret;
 
 	/* Allocate and initialize the R-Car device structure. */
@@ -388,6 +389,16 @@ static int rcar_du_probe(struct platform_device *pdev)
 	if (IS_ERR(rcdu->mmio))
 		return PTR_ERR(rcdu->mmio);
 
+	/*
+	 * Set the DMA coherent mask to reflect the DU 32-bit DMA address space
+	 * limitations. When sourcing frames from a VSP the DU doesn't perform
+	 * any memory access so set the mask to 40 bits to accept all buffers.
+	 */
+	mask = rcar_du_has(rcdu, RCAR_DU_FEATURE_VSP1_SOURCE) ? 40 : 32;
+	ret = dma_coerce_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(mask));
+	if (ret)
+		return ret;
+
 	/* DRM/KMS objects */
 	ddev = drm_dev_alloc(&rcar_du_driver, &pdev->dev);
 	if (IS_ERR(ddev))
-- 
Regards,

Laurent Pinchart

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH/RFC 2/2] drm: rcar-du: Allow importing non-contiguous dma-buf with VSP
  2017-11-13 10:32 [PATCH/RFC 0/2] R-Car DU: Support importing non-contiguous dma-buf with VSP Laurent Pinchart
  2017-11-13 10:32 ` [PATCH/RFC 1/2] drm: rcar-du: Set the DMA coherent mask for the DU device Laurent Pinchart
@ 2017-11-13 10:32 ` Laurent Pinchart
  2017-11-13 11:53   ` Liviu Dudau
  2017-12-13 22:46   ` [PATCH/RFC v1.1] " Laurent Pinchart
  2017-11-13 13:48 ` [PATCH/RFC 0/2] R-Car DU: Support " Kieran Bingham
  2 siblings, 2 replies; 10+ messages in thread
From: Laurent Pinchart @ 2017-11-13 10:32 UTC (permalink / raw)
  To: dri-devel; +Cc: linux-renesas-soc, Kieran Bingham, Liviu Dudau, Daniel Vetter

When the DU sources its frames from a VSP, it performs no memory access
and thus has no requirements on imported dma-buf memory types. In
particular the DU could import a physically non-contiguous buffer that
would later be mapped contiguously through the VSP IOMMU.

This use case isn't supported at the moment as the GEM CMA helpers will
reject any non-contiguous buffer, and the DU isn't connected to an IOMMU
that can make the buffer contiguous for DMA. Fix this by implementing a
custom .gem_prime_import_sg_table() operation that accepts all imported
dma-buf regardless of the number of scatterlist entries.

Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
---
 drivers/gpu/drm/rcar-du/rcar_du_drv.c |  2 +-
 drivers/gpu/drm/rcar-du/rcar_du_kms.c | 39 +++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/rcar-du/rcar_du_kms.h |  7 +++++++
 3 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/rcar-du/rcar_du_drv.c b/drivers/gpu/drm/rcar-du/rcar_du_drv.c
index 48c166f925a3..d999231f98c7 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_drv.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_drv.c
@@ -289,7 +289,7 @@ static struct drm_driver rcar_du_driver = {
 	.gem_prime_import	= drm_gem_prime_import,
 	.gem_prime_export	= drm_gem_prime_export,
 	.gem_prime_get_sg_table	= drm_gem_cma_prime_get_sg_table,
-	.gem_prime_import_sg_table = drm_gem_cma_prime_import_sg_table,
+	.gem_prime_import_sg_table = rcar_du_gem_prime_import_sg_table,
 	.gem_prime_vmap		= drm_gem_cma_prime_vmap,
 	.gem_prime_vunmap	= drm_gem_cma_prime_vunmap,
 	.gem_prime_mmap		= drm_gem_cma_prime_mmap,
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_kms.c b/drivers/gpu/drm/rcar-du/rcar_du_kms.c
index 566d1a948c8f..2dd0c2ba047d 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_kms.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_kms.c
@@ -20,6 +20,7 @@
 #include <drm/drm_gem_cma_helper.h>
 #include <drm/drm_gem_framebuffer_helper.h>
 
+#include <linux/dma-buf.h>
 #include <linux/of_graph.h>
 #include <linux/wait.h>
 
@@ -148,6 +149,44 @@ const struct rcar_du_format_info *rcar_du_format_info(u32 fourcc)
  * Frame buffer
  */
 
+struct drm_gem_object *rcar_du_gem_prime_import_sg_table(struct drm_device *dev,
+				struct dma_buf_attachment *attach,
+				struct sg_table *sgt)
+{
+	struct rcar_du_device *rcdu = dev->dev_private;
+	struct drm_gem_cma_object *cma_obj;
+	struct drm_gem_object *gem_obj;
+	int ret;
+
+	if (!rcar_du_has(rcdu, RCAR_DU_FEATURE_VSP1_SOURCE))
+		return drm_gem_cma_prime_import_sg_table(dev, attach, sgt);
+
+	/* Create a CMA GEM buffer. */
+	cma_obj = kzalloc(sizeof(*cma_obj), GFP_KERNEL);
+	if (!cma_obj)
+		return ERR_PTR(-ENOMEM);
+	gem_obj = &cma_obj->base;
+
+	ret = drm_gem_object_init(dev, gem_obj, attach->dmabuf->size);
+	if (ret)
+		goto error;
+
+	ret = drm_gem_create_mmap_offset(gem_obj);
+	if (ret) {
+		drm_gem_object_release(gem_obj);
+		goto error;
+	}
+
+	cma_obj->paddr = 0;
+	cma_obj->sgt = sgt;
+
+	return gem_obj;
+
+error:
+	kfree(cma_obj);
+	return ERR_PTR(ret);
+}
+
 int rcar_du_dumb_create(struct drm_file *file, struct drm_device *dev,
 			struct drm_mode_create_dumb *args)
 {
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_kms.h b/drivers/gpu/drm/rcar-du/rcar_du_kms.h
index 07951d5fe38b..10b2bb0f0df9 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_kms.h
+++ b/drivers/gpu/drm/rcar-du/rcar_du_kms.h
@@ -16,10 +16,13 @@
 
 #include <linux/types.h>
 
+struct dma_buf_attachment;
 struct drm_file;
 struct drm_device;
+struct drm_gem_object;
 struct drm_mode_create_dumb;
 struct rcar_du_device;
+struct sg_table;
 
 struct rcar_du_format_info {
 	u32 fourcc;
@@ -36,4 +39,8 @@ int rcar_du_modeset_init(struct rcar_du_device *rcdu);
 int rcar_du_dumb_create(struct drm_file *file, struct drm_device *dev,
 			struct drm_mode_create_dumb *args);
 
+struct drm_gem_object *rcar_du_gem_prime_import_sg_table(struct drm_device *dev,
+				struct dma_buf_attachment *attach,
+				struct sg_table *sgt);
+
 #endif /* __RCAR_DU_KMS_H__ */
-- 
Regards,

Laurent Pinchart

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC 1/2] drm: rcar-du: Set the DMA coherent mask for the DU device
  2017-11-13 10:32 ` [PATCH/RFC 1/2] drm: rcar-du: Set the DMA coherent mask for the DU device Laurent Pinchart
@ 2017-11-13 11:44   ` Liviu Dudau
  0 siblings, 0 replies; 10+ messages in thread
From: Liviu Dudau @ 2017-11-13 11:44 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: dri-devel, linux-renesas-soc, Kieran Bingham, Daniel Vetter

On Mon, Nov 13, 2017 at 12:32:27PM +0200, Laurent Pinchart wrote:
> The DU DMA address space is limited to 32 bits, so the DMA coherent mask
> should be set accordingly. The DMA mapping implementation will
> transparently map high memory buffers to 32-bit addresses through an
> IOMMU when present (or through bounce buffers otherwise, which isn't a
> supported use case as performances would be terrible).
> 
> However, when sourcing frames from a VSP, the situation is more
> complicated. The DU delegates all memory accesses to the VSP and doesn't
> perform any DMA access by itself. Due to how the GEM CMA helpers are
> structured buffers are still mapped to the DU device. They are later
> mapped to the VSP as well to perform DMA access, through the IOMMU
> connected to the VSP.
> 
> Setting the DMA coherent mask to 32 bits for the DU when using a VSP can
> cause issues when importing a dma_buf. If the buffer is located above
> the 32-bit address space, the DMA mapping implementation will try to map
> it to the DU's DMA address space. As the DU has no IOMMU a bounce buffer
> will be allocated, which in the best case will waste memory and in the
> worst case will just fail.
> 
> To work around this issue, set the DMA coherent mask to the full 40-bit
> address space for the DU. All dma-buf instances will be imported without
> any restriction, and will be mapped to the VSP when preparing the
> associated framebuffer.

This does indeed look like a work around, but I can't see anything wrong
with it.

Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>

> 
> Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
> ---
>  drivers/gpu/drm/rcar-du/rcar_du_drv.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/gpu/drm/rcar-du/rcar_du_drv.c b/drivers/gpu/drm/rcar-du/rcar_du_drv.c
> index 6e02c762a557..48c166f925a3 100644
> --- a/drivers/gpu/drm/rcar-du/rcar_du_drv.c
> +++ b/drivers/gpu/drm/rcar-du/rcar_du_drv.c
> @@ -370,6 +370,7 @@ static int rcar_du_probe(struct platform_device *pdev)
>  	struct rcar_du_device *rcdu;
>  	struct drm_device *ddev;
>  	struct resource *mem;
> +	unsigned int mask;
>  	int ret;
>  
>  	/* Allocate and initialize the R-Car device structure. */
> @@ -388,6 +389,16 @@ static int rcar_du_probe(struct platform_device *pdev)
>  	if (IS_ERR(rcdu->mmio))
>  		return PTR_ERR(rcdu->mmio);
>  
> +	/*
> +	 * Set the DMA coherent mask to reflect the DU 32-bit DMA address space
> +	 * limitations. When sourcing frames from a VSP the DU doesn't perform
> +	 * any memory access so set the mask to 40 bits to accept all buffers.
> +	 */
> +	mask = rcar_du_has(rcdu, RCAR_DU_FEATURE_VSP1_SOURCE) ? 40 : 32;
> +	ret = dma_coerce_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(mask));
> +	if (ret)
> +		return ret;
> +
>  	/* DRM/KMS objects */
>  	ddev = drm_dev_alloc(&rcar_du_driver, &pdev->dev);
>  	if (IS_ERR(ddev))
> -- 
> Regards,
> 
> Laurent Pinchart
> 

-- 
====================
| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC 2/2] drm: rcar-du: Allow importing non-contiguous dma-buf with VSP
  2017-11-13 10:32 ` [PATCH/RFC 2/2] drm: rcar-du: Allow importing non-contiguous dma-buf with VSP Laurent Pinchart
@ 2017-11-13 11:53   ` Liviu Dudau
  2017-11-14  3:34     ` Laurent Pinchart
  2017-12-13 22:46   ` [PATCH/RFC v1.1] " Laurent Pinchart
  1 sibling, 1 reply; 10+ messages in thread
From: Liviu Dudau @ 2017-11-13 11:53 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: dri-devel, linux-renesas-soc, Kieran Bingham, Daniel Vetter

On Mon, Nov 13, 2017 at 12:32:28PM +0200, Laurent Pinchart wrote:
> When the DU sources its frames from a VSP, it performs no memory access
> and thus has no requirements on imported dma-buf memory types. In
> particular the DU could import a physically non-contiguous buffer that
> would later be mapped contiguously through the VSP IOMMU.
> 
> This use case isn't supported at the moment as the GEM CMA helpers will
> reject any non-contiguous buffer, and the DU isn't connected to an IOMMU
> that can make the buffer contiguous for DMA. Fix this by implementing a
> custom .gem_prime_import_sg_table() operation that accepts all imported
> dma-buf regardless of the number of scatterlist entries.

This patch raises the question of why use CMA at all if you can accept
any kind of buffers.

> 
> Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
> ---
>  drivers/gpu/drm/rcar-du/rcar_du_drv.c |  2 +-
>  drivers/gpu/drm/rcar-du/rcar_du_kms.c | 39 +++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/rcar-du/rcar_du_kms.h |  7 +++++++
>  3 files changed, 47 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/rcar-du/rcar_du_drv.c b/drivers/gpu/drm/rcar-du/rcar_du_drv.c
> index 48c166f925a3..d999231f98c7 100644
> --- a/drivers/gpu/drm/rcar-du/rcar_du_drv.c
> +++ b/drivers/gpu/drm/rcar-du/rcar_du_drv.c
> @@ -289,7 +289,7 @@ static struct drm_driver rcar_du_driver = {
>  	.gem_prime_import	= drm_gem_prime_import,
>  	.gem_prime_export	= drm_gem_prime_export,
>  	.gem_prime_get_sg_table	= drm_gem_cma_prime_get_sg_table,
> -	.gem_prime_import_sg_table = drm_gem_cma_prime_import_sg_table,
> +	.gem_prime_import_sg_table = rcar_du_gem_prime_import_sg_table,
>  	.gem_prime_vmap		= drm_gem_cma_prime_vmap,
>  	.gem_prime_vunmap	= drm_gem_cma_prime_vunmap,
>  	.gem_prime_mmap		= drm_gem_cma_prime_mmap,
> diff --git a/drivers/gpu/drm/rcar-du/rcar_du_kms.c b/drivers/gpu/drm/rcar-du/rcar_du_kms.c
> index 566d1a948c8f..2dd0c2ba047d 100644
> --- a/drivers/gpu/drm/rcar-du/rcar_du_kms.c
> +++ b/drivers/gpu/drm/rcar-du/rcar_du_kms.c
> @@ -20,6 +20,7 @@
>  #include <drm/drm_gem_cma_helper.h>
>  #include <drm/drm_gem_framebuffer_helper.h>
>  
> +#include <linux/dma-buf.h>
>  #include <linux/of_graph.h>
>  #include <linux/wait.h>
>  
> @@ -148,6 +149,44 @@ const struct rcar_du_format_info *rcar_du_format_info(u32 fourcc)
>   * Frame buffer
>   */
>  
> +struct drm_gem_object *rcar_du_gem_prime_import_sg_table(struct drm_device *dev,
> +				struct dma_buf_attachment *attach,
> +				struct sg_table *sgt)
> +{
> +	struct rcar_du_device *rcdu = dev->dev_private;
> +	struct drm_gem_cma_object *cma_obj;
> +	struct drm_gem_object *gem_obj;
> +	int ret;
> +
> +	if (!rcar_du_has(rcdu, RCAR_DU_FEATURE_VSP1_SOURCE))
> +		return drm_gem_cma_prime_import_sg_table(dev, attach, sgt);
> +
> +	/* Create a CMA GEM buffer. */
> +	cma_obj = kzalloc(sizeof(*cma_obj), GFP_KERNEL);
> +	if (!cma_obj)
> +		return ERR_PTR(-ENOMEM);
> +	gem_obj = &cma_obj->base;
> +
> +	ret = drm_gem_object_init(dev, gem_obj, attach->dmabuf->size);
> +	if (ret)
> +		goto error;
> +
> +	ret = drm_gem_create_mmap_offset(gem_obj);
> +	if (ret) {
> +		drm_gem_object_release(gem_obj);
> +		goto error;
> +	}
> +
> +	cma_obj->paddr = 0;

This is going to break drm_gem_cma_describe() if you are using it plus
the rcar_du_plane_setup_scanout() unless I'm missing something besides
familiarity with the RCAR driver code :)

This function looks very similar to what I tried to do for mali-dp to
allow the import of contiguous DMA buffers that have more than 1 sgt
entries. In the end I gave up as I kept finding issues and went for the
drm_gem_cma_prime_import_sg_table() changes. Maybe you need to do a
similar change in the function to bypass some requirements if the driver
signals that it can accept relaxed requirements?

Best regards,
Liviu

> +	cma_obj->sgt = sgt;
> +
> +	return gem_obj;
> +
> +error:
> +	kfree(cma_obj);
> +	return ERR_PTR(ret);
> +}
> +
>  int rcar_du_dumb_create(struct drm_file *file, struct drm_device *dev,
>  			struct drm_mode_create_dumb *args)
>  {
> diff --git a/drivers/gpu/drm/rcar-du/rcar_du_kms.h b/drivers/gpu/drm/rcar-du/rcar_du_kms.h
> index 07951d5fe38b..10b2bb0f0df9 100644
> --- a/drivers/gpu/drm/rcar-du/rcar_du_kms.h
> +++ b/drivers/gpu/drm/rcar-du/rcar_du_kms.h
> @@ -16,10 +16,13 @@
>  
>  #include <linux/types.h>
>  
> +struct dma_buf_attachment;
>  struct drm_file;
>  struct drm_device;
> +struct drm_gem_object;
>  struct drm_mode_create_dumb;
>  struct rcar_du_device;
> +struct sg_table;
>  
>  struct rcar_du_format_info {
>  	u32 fourcc;
> @@ -36,4 +39,8 @@ int rcar_du_modeset_init(struct rcar_du_device *rcdu);
>  int rcar_du_dumb_create(struct drm_file *file, struct drm_device *dev,
>  			struct drm_mode_create_dumb *args);
>  
> +struct drm_gem_object *rcar_du_gem_prime_import_sg_table(struct drm_device *dev,
> +				struct dma_buf_attachment *attach,
> +				struct sg_table *sgt);
> +
>  #endif /* __RCAR_DU_KMS_H__ */
> -- 
> Regards,
> 
> Laurent Pinchart
> 

-- 
====================
| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC 0/2] R-Car DU: Support importing non-contiguous dma-buf with VSP
  2017-11-13 10:32 [PATCH/RFC 0/2] R-Car DU: Support importing non-contiguous dma-buf with VSP Laurent Pinchart
  2017-11-13 10:32 ` [PATCH/RFC 1/2] drm: rcar-du: Set the DMA coherent mask for the DU device Laurent Pinchart
  2017-11-13 10:32 ` [PATCH/RFC 2/2] drm: rcar-du: Allow importing non-contiguous dma-buf with VSP Laurent Pinchart
@ 2017-11-13 13:48 ` Kieran Bingham
  2018-12-18  7:40   ` Laurent Pinchart
  2 siblings, 1 reply; 10+ messages in thread
From: Kieran Bingham @ 2017-11-13 13:48 UTC (permalink / raw)
  To: Laurent Pinchart, dri-devel; +Cc: linux-renesas-soc, Liviu Dudau, Daniel Vetter

Hi Laurent,

On 13/11/17 10:32, Laurent Pinchart wrote:
> Hello everybody,
> 
> This patch series fixes two issues related to dma-buf import for the Renesas
> R-Car DU when the imported buffer is either not physically contiguous or
> located in high memory.
> 
> Both cases require the use of an IOMMU to remap the buffers contiguously and
> in 32-bit DMA space. On Gen2 platforms this is transparent and works already,
> but on Gen3 platforms the situation is more complex.
> 
> The Gen3 DU doesn't perform DMA access directly but instead goes through a VSP
> IP core that acts as an external composer. When importing a dma-buf buffer the
> GEM PRIME helper drm_gem_prime_import() maps the buffer to the DU device, and
> the DU driver later remaps it to the VSP device. The driver unfortunately
> can't use the drm_gem_prime_import_dev() helper to map th buffer to the VSP
> device directly, as at import time we don't know yet to which VSP device the
> buffer will need to be mapped (there is one VSP composer per CRTC). Mapping
> the buffer to the VSP can thus only be done when pinning the framebuffer, as
> we know at that time which plane and CRTC it will be used with.
> 
> This works currently (with the caveat that an unneeded mapping to the DU is
> created), but fails when the imported buffer is not physically contiguous or
> is located in high memory. In the first case the GEM CMA helper
> drm_gem_cma_prime_import_sg_table() will reject the buffer as it hasn't been
> mapped contiguously in the DU DMA address space (because the DU has no IOMMU),
> and in the second case the DMA mapping API will try to allocate a bounce
> buffer and fail due to bounce buffer memory exhaustion.
> 
> The first patch in this series fixes the second issue by setting the DMA
> coherent mask of the DU device to the full 40 bits address space. All memory
> will thus be accepted without trying to allocate a bounce buffer.
> 
> The second patch in this series fixes the first issue by using an internal
> copy of the drm_gem_cma_prime_import_sg_table() function that doesn't ensure
> that the buffer is contiguous in memory. This is what caused me to post this
> series as an RFC, as I'm not very happy with duplication of helper code in the
> driver that could lead to breakages when the GEM CMA helpers are modified. If
> this approach is accepted, I would prefer implementing the code in the GEM CMA
> helpers.
> 
> Another point that bothers me is that buffers can now be imported
> unconditionally on Gen3 platforms, but atomic commits will fail if the buffer
> can't be remapped through the VSP (for instance if no IOMMU is available).
> Given the hardware architecture and the DRM/KMS API I don't really see a way
> around this.
> 
> Testing this series isn't easy as it requires getting hold of exported
> dma-bufs located in high memory or not physically contiguous. I have used the
> V4L2 vivid driver as an exporter for that purpose, with a few hacks that can
> be found at
> 
> 	git://linuxtv.org/pinchartl/media.git drm/du/devel/iommu
> 

Ok - testing from this branch with "drm: rcar-du: Allow importing non-contiguous
dma-buf with VSP" applied on top (it appeared to be missing)

> On the userspace side I have used the v4l2-drm-example test application
> available at
> 
> 	git://git.infradead.org/users/kmpark/public-apps
> 
> and run with
> 
> dmabuf-sharing -M rcar-du -i /dev/video0 -S 640,480 -f YUYV -F YUYV \
> 	-s 640,480@0,0 -t 640,480@0,0 -o 70:68:640x480 -b 4 -e v4l2
> 
> (the -o argument needs to be modified based on the connector and CRTC ID).

I've build dmabuf-sharing from kmpark/public-apps/v4l2-drm-example, but it
doesn't appear to have a "-e v4l2" option (not listed in the -h, and complains
with "ERROR(dmabuf-sharing.c:560) : failed to parse arguments")

Have you also modified this application to run your tests?

If it's easier, I can look at the screen while you run the test under your
control as well.


> Up to patch "[DEBUG] v4l: videobuf2-dma-contig: Print allocated buffer
> address" buffer sharing should work. Patch "[HACK] vivid: Allocate buffers
> from high memory" then breaks buffer import, and the issue is fixed by patch
> "drm: rcar-du: Set the DMA coherent mask for the DU device". Patch "[HACK]
> vivid: Use vmalloc allocator" breaks buffer import again, and is fixed by
> "drm: rcar-du: Allow importing non-contiguous dma-buf with VSP".
> 
> Kieran, I have tested this remotely on your Salvator-X H3 ES1.1 and haven't
> noticed any problem, but couldn't check the output on the screen. Would you be
> able to rerun the test locally for me ? Please note that the IPMMU is known to
> have issues on H3 ES1.1, so you might need to test the code on H3 ES2.0
> instead.
> 
> Laurent Pinchart (2):
>   drm: rcar-du: Set the DMA coherent mask for the DU device
>   drm: rcar-du: Allow importing non-contiguous dma-buf with VSP
> 
>  drivers/gpu/drm/rcar-du/rcar_du_drv.c | 13 +++++++++++-
>  drivers/gpu/drm/rcar-du/rcar_du_kms.c | 39 +++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/rcar-du/rcar_du_kms.h |  7 +++++++
>  3 files changed, 58 insertions(+), 1 deletion(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC 2/2] drm: rcar-du: Allow importing non-contiguous dma-buf with VSP
  2017-11-13 11:53   ` Liviu Dudau
@ 2017-11-14  3:34     ` Laurent Pinchart
  2017-11-14  9:27       ` Liviu Dudau
  0 siblings, 1 reply; 10+ messages in thread
From: Laurent Pinchart @ 2017-11-14  3:34 UTC (permalink / raw)
  To: Liviu Dudau
  Cc: Laurent Pinchart, dri-devel, linux-renesas-soc, Kieran Bingham,
	Daniel Vetter

Hi Liviu,

On Monday, 13 November 2017 13:53:14 EET Liviu Dudau wrote:
> On Mon, Nov 13, 2017 at 12:32:28PM +0200, Laurent Pinchart wrote:
> > When the DU sources its frames from a VSP, it performs no memory access
> > and thus has no requirements on imported dma-buf memory types. In
> > particular the DU could import a physically non-contiguous buffer that
> > would later be mapped contiguously through the VSP IOMMU.
> > 
> > This use case isn't supported at the moment as the GEM CMA helpers will
> > reject any non-contiguous buffer, and the DU isn't connected to an IOMMU
> > that can make the buffer contiguous for DMA. Fix this by implementing a
> > custom .gem_prime_import_sg_table() operation that accepts all imported
> > dma-buf regardless of the number of scatterlist entries.
> 
> This patch raises the question of why use CMA at all if you can accept
> any kind of buffers.

Both the DU and VSP require DMA contiguous memory. On R-Car Gen2 the DU 
performs DMA, and thus uses the GEM CMA helpers. On Gen3 it delegates DMA to 
the external VSP composer, and still uses the GEM CMA helpers to ensure that 
memory will be DMA contiguous for the VSP. The problem arises when physically 
non-contiguous dma-buf buffers are imported, they can be mapped contiguously 
to the VSP (assuming an IOMMU is present) but not to the DU (as there's no 
IOMMU).

The situation is particularly messy due to the fact that I have one VSP 
instance per CRTC, each connected to its own IOMMU channel. The driver thus 
doesn't know when allocating GEM objects to which struct device they need to 
be mapped. The DRM helpers don't support delayed mapping of memory. I'd like 
to know whether that's something I should fix properly (which would likely 
involve major reworking of the helpers) or hack around it in the DU driver.

> > Signed-off-by: Laurent Pinchart
> > <laurent.pinchart+renesas@ideasonboard.com>
> > ---
> > 
> >  drivers/gpu/drm/rcar-du/rcar_du_drv.c |  2 +-
> >  drivers/gpu/drm/rcar-du/rcar_du_kms.c | 39 ++++++++++++++++++++++++++++++
> >  drivers/gpu/drm/rcar-du/rcar_du_kms.h |  7 +++++++
> >  3 files changed, 47 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/rcar-du/rcar_du_drv.c
> > b/drivers/gpu/drm/rcar-du/rcar_du_drv.c index 48c166f925a3..d999231f98c7
> > 100644
> > --- a/drivers/gpu/drm/rcar-du/rcar_du_drv.c
> > +++ b/drivers/gpu/drm/rcar-du/rcar_du_drv.c
> > @@ -289,7 +289,7 @@ static struct drm_driver rcar_du_driver = {
> >  	.gem_prime_import	= drm_gem_prime_import,
> >  	.gem_prime_export	= drm_gem_prime_export,
> >  	.gem_prime_get_sg_table	= drm_gem_cma_prime_get_sg_table,
> > -	.gem_prime_import_sg_table = drm_gem_cma_prime_import_sg_table,
> > +	.gem_prime_import_sg_table = rcar_du_gem_prime_import_sg_table,
> >  	.gem_prime_vmap		= drm_gem_cma_prime_vmap,
> >  	.gem_prime_vunmap	= drm_gem_cma_prime_vunmap,
> >  	.gem_prime_mmap		= drm_gem_cma_prime_mmap,
> > diff --git a/drivers/gpu/drm/rcar-du/rcar_du_kms.c
> > b/drivers/gpu/drm/rcar-du/rcar_du_kms.c index 566d1a948c8f..2dd0c2ba047d
> > 100644
> > --- a/drivers/gpu/drm/rcar-du/rcar_du_kms.c
> > +++ b/drivers/gpu/drm/rcar-du/rcar_du_kms.c

[snip]

> > +struct drm_gem_object *rcar_du_gem_prime_import_sg_table(struct
> > drm_device *dev,
> > +				struct dma_buf_attachment *attach,
> > +				struct sg_table *sgt)
> > +{
> > +	struct rcar_du_device *rcdu = dev->dev_private;
> > +	struct drm_gem_cma_object *cma_obj;
> > +	struct drm_gem_object *gem_obj;
> > +	int ret;
> > +
> > +	if (!rcar_du_has(rcdu, RCAR_DU_FEATURE_VSP1_SOURCE))
> > +		return drm_gem_cma_prime_import_sg_table(dev, attach, sgt);
> > +
> > +	/* Create a CMA GEM buffer. */
> > +	cma_obj = kzalloc(sizeof(*cma_obj), GFP_KERNEL);
> > +	if (!cma_obj)
> > +		return ERR_PTR(-ENOMEM);
> > +	gem_obj = &cma_obj->base;
> > +
> > +	ret = drm_gem_object_init(dev, gem_obj, attach->dmabuf->size);
> > +	if (ret)
> > +		goto error;
> > +
> > +	ret = drm_gem_create_mmap_offset(gem_obj);
> > +	if (ret) {
> > +		drm_gem_object_release(gem_obj);
> > +		goto error;
> > +	}
> > +
> > +	cma_obj->paddr = 0;
> 
> This is going to break drm_gem_cma_describe() if you are using it

As far as I can tell drm_gem_cma_describe() will just print paddr = 0, it 
won't crash.

> plus the rcar_du_plane_setup_scanout() unless I'm missing something besides
> familiarity with the RCAR driver code :)

rcar_du_plane_setup_scanout() is only called when !rcar_du_has(rcdu, 
RCAR_DU_FEATURE_VSP1_SOURCE) (that is on Gen3 hardware, or on Gen2 when I 
artificially disable all DMA from the DU and use external composers to emulate 
Gen3 behaviour for testing purpose), in which case this function calls 
drm_gem_cma_prime_import_sg_table().

> This function looks very similar to what I tried to do for mali-dp to
> allow the import of contiguous DMA buffers that have more than 1 sgt
> entries. In the end I gave up as I kept finding issues and went for the
> drm_gem_cma_prime_import_sg_table() changes. Maybe you need to do a
> similar change in the function to bypass some requirements if the driver
> signals that it can accept relaxed requirements?

As explained above I've considered reworking the helpers (to do it properly I 
think I'll likely need to rework drm_prime.c too), but I'd first like to know 
whether there's an interest for that.

> > +	cma_obj->sgt = sgt;
> > +
> > +	return gem_obj;
> > +
> > +error:
> > +	kfree(cma_obj);
> > +	return ERR_PTR(ret);
> > +}

[snip]

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC 2/2] drm: rcar-du: Allow importing non-contiguous dma-buf with VSP
  2017-11-14  3:34     ` Laurent Pinchart
@ 2017-11-14  9:27       ` Liviu Dudau
  0 siblings, 0 replies; 10+ messages in thread
From: Liviu Dudau @ 2017-11-14  9:27 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Laurent Pinchart, dri-devel, linux-renesas-soc, Kieran Bingham,
	Daniel Vetter

On Tue, Nov 14, 2017 at 05:34:14AM +0200, Laurent Pinchart wrote:
> Hi Liviu,

Hi Laurent,

> 
> On Monday, 13 November 2017 13:53:14 EET Liviu Dudau wrote:
> > On Mon, Nov 13, 2017 at 12:32:28PM +0200, Laurent Pinchart wrote:
> > > When the DU sources its frames from a VSP, it performs no memory access
> > > and thus has no requirements on imported dma-buf memory types. In
> > > particular the DU could import a physically non-contiguous buffer that
> > > would later be mapped contiguously through the VSP IOMMU.
> > > 
> > > This use case isn't supported at the moment as the GEM CMA helpers will
> > > reject any non-contiguous buffer, and the DU isn't connected to an IOMMU
> > > that can make the buffer contiguous for DMA. Fix this by implementing a
> > > custom .gem_prime_import_sg_table() operation that accepts all imported
> > > dma-buf regardless of the number of scatterlist entries.
> > 
> > This patch raises the question of why use CMA at all if you can accept
> > any kind of buffers.
> 
> Both the DU and VSP require DMA contiguous memory. On R-Car Gen2 the DU 
> performs DMA, and thus uses the GEM CMA helpers. On Gen3 it delegates DMA to 
> the external VSP composer, and still uses the GEM CMA helpers to ensure that 
> memory will be DMA contiguous for the VSP. The problem arises when physically 
> non-contiguous dma-buf buffers are imported, they can be mapped contiguously 
> to the VSP (assuming an IOMMU is present) but not to the DU (as there's no 
> IOMMU).
> 
> The situation is particularly messy due to the fact that I have one VSP 
> instance per CRTC, each connected to its own IOMMU channel. The driver thus 
> doesn't know when allocating GEM objects to which struct device they need to 
> be mapped. The DRM helpers don't support delayed mapping of memory. I'd like 
> to know whether that's something I should fix properly (which would likely 
> involve major reworking of the helpers) or hack around it in the DU driver.

I understand. Looks like you have your hands full with an "interesting"
architecture.

> 
> > > Signed-off-by: Laurent Pinchart
> > > <laurent.pinchart+renesas@ideasonboard.com>
> > > ---
> > > 
> > >  drivers/gpu/drm/rcar-du/rcar_du_drv.c |  2 +-
> > >  drivers/gpu/drm/rcar-du/rcar_du_kms.c | 39 ++++++++++++++++++++++++++++++
> > >  drivers/gpu/drm/rcar-du/rcar_du_kms.h |  7 +++++++
> > >  3 files changed, 47 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/rcar-du/rcar_du_drv.c
> > > b/drivers/gpu/drm/rcar-du/rcar_du_drv.c index 48c166f925a3..d999231f98c7
> > > 100644
> > > --- a/drivers/gpu/drm/rcar-du/rcar_du_drv.c
> > > +++ b/drivers/gpu/drm/rcar-du/rcar_du_drv.c
> > > @@ -289,7 +289,7 @@ static struct drm_driver rcar_du_driver = {
> > >  	.gem_prime_import	= drm_gem_prime_import,
> > >  	.gem_prime_export	= drm_gem_prime_export,
> > >  	.gem_prime_get_sg_table	= drm_gem_cma_prime_get_sg_table,
> > > -	.gem_prime_import_sg_table = drm_gem_cma_prime_import_sg_table,
> > > +	.gem_prime_import_sg_table = rcar_du_gem_prime_import_sg_table,
> > >  	.gem_prime_vmap		= drm_gem_cma_prime_vmap,
> > >  	.gem_prime_vunmap	= drm_gem_cma_prime_vunmap,
> > >  	.gem_prime_mmap		= drm_gem_cma_prime_mmap,
> > > diff --git a/drivers/gpu/drm/rcar-du/rcar_du_kms.c
> > > b/drivers/gpu/drm/rcar-du/rcar_du_kms.c index 566d1a948c8f..2dd0c2ba047d
> > > 100644
> > > --- a/drivers/gpu/drm/rcar-du/rcar_du_kms.c
> > > +++ b/drivers/gpu/drm/rcar-du/rcar_du_kms.c
> 
> [snip]
> 
> > > +struct drm_gem_object *rcar_du_gem_prime_import_sg_table(struct
> > > drm_device *dev,
> > > +				struct dma_buf_attachment *attach,
> > > +				struct sg_table *sgt)
> > > +{
> > > +	struct rcar_du_device *rcdu = dev->dev_private;
> > > +	struct drm_gem_cma_object *cma_obj;
> > > +	struct drm_gem_object *gem_obj;
> > > +	int ret;
> > > +
> > > +	if (!rcar_du_has(rcdu, RCAR_DU_FEATURE_VSP1_SOURCE))
> > > +		return drm_gem_cma_prime_import_sg_table(dev, attach, sgt);
> > > +
> > > +	/* Create a CMA GEM buffer. */
> > > +	cma_obj = kzalloc(sizeof(*cma_obj), GFP_KERNEL);
> > > +	if (!cma_obj)
> > > +		return ERR_PTR(-ENOMEM);
> > > +	gem_obj = &cma_obj->base;
> > > +
> > > +	ret = drm_gem_object_init(dev, gem_obj, attach->dmabuf->size);
> > > +	if (ret)
> > > +		goto error;
> > > +
> > > +	ret = drm_gem_create_mmap_offset(gem_obj);
> > > +	if (ret) {
> > > +		drm_gem_object_release(gem_obj);
> > > +		goto error;
> > > +	}
> > > +
> > > +	cma_obj->paddr = 0;
> > 
> > This is going to break drm_gem_cma_describe() if you are using it
> 
> As far as I can tell drm_gem_cma_describe() will just print paddr = 0, it 
> won't crash.

I'm sorry, didn't meant it will crash, just print (possibly useless)
zero value.

> 
> > plus the rcar_du_plane_setup_scanout() unless I'm missing something besides
> > familiarity with the RCAR driver code :)
> 
> rcar_du_plane_setup_scanout() is only called when !rcar_du_has(rcdu, 
> RCAR_DU_FEATURE_VSP1_SOURCE) (that is on Gen3 hardware, or on Gen2 when I 
> artificially disable all DMA from the DU and use external composers to emulate 
> Gen3 behaviour for testing purpose), in which case this function calls 
> drm_gem_cma_prime_import_sg_table().

Thanks for the explanation!

> 
> > This function looks very similar to what I tried to do for mali-dp to
> > allow the import of contiguous DMA buffers that have more than 1 sgt
> > entries. In the end I gave up as I kept finding issues and went for the
> > drm_gem_cma_prime_import_sg_table() changes. Maybe you need to do a
> > similar change in the function to bypass some requirements if the driver
> > signals that it can accept relaxed requirements?
> 
> As explained above I've considered reworking the helpers (to do it properly I 
> think I'll likely need to rework drm_prime.c too), but I'd first like to know 
> whether there's an interest for that.

Probably if you can express somehow the fact that the DU doesn't do any
memory access it might be interesting to other drivers. Otherwise you're
probably tied to having a driver specific version of the hook.

Best regards,
Liviu

> 
> > > +	cma_obj->sgt = sgt;
> > > +
> > > +	return gem_obj;
> > > +
> > > +error:
> > > +	kfree(cma_obj);
> > > +	return ERR_PTR(ret);
> > > +}
> 
> [snip]
> 
> -- 
> Regards,
> 
> Laurent Pinchart
> 

-- 
====================
| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH/RFC v1.1] drm: rcar-du: Allow importing non-contiguous dma-buf with VSP
  2017-11-13 10:32 ` [PATCH/RFC 2/2] drm: rcar-du: Allow importing non-contiguous dma-buf with VSP Laurent Pinchart
  2017-11-13 11:53   ` Liviu Dudau
@ 2017-12-13 22:46   ` Laurent Pinchart
  1 sibling, 0 replies; 10+ messages in thread
From: Laurent Pinchart @ 2017-12-13 22:46 UTC (permalink / raw)
  To: dri-devel; +Cc: linux-renesas-soc, Daniel Vetter, Liviu Dudau, Kieran Bingham

When the DU sources its frames from a VSP, it performs no memory access
and thus has no requirements on imported dma-buf memory types. In
particular the DU could import a physically non-contiguous buffer that
would later be mapped contiguously through the VSP IOMMU.

This use case isn't supported at the moment as the GEM CMA helpers will
reject any non-contiguous buffer, and the DU isn't connected to an IOMMU
that can make the buffer contiguous for DMA. Fix this by implementing a
custom .gem_prime_import_sg_table() operation that accepts all imported
dma-buf regardless of the number of scatterlist entries.

Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
---
Changes since v1:

- Duplicate the imported scatter gather table in
  rcar_du_vsp_plane_prepare_fb()

This patch fixes a bug of the previous version and is posted in case anyone
would like to test the implementation. I still plan to give Noralf's GEM
library a try as an alternative approach to this series.

 drivers/gpu/drm/rcar-du/rcar_du_drv.c |  2 +-
 drivers/gpu/drm/rcar-du/rcar_du_kms.c | 39 +++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/rcar-du/rcar_du_kms.h |  7 +++++++
 drivers/gpu/drm/rcar-du/rcar_du_vsp.c | 34 ++++++++++++++++++++++++++----
 4 files changed, 77 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/rcar-du/rcar_du_drv.c b/drivers/gpu/drm/rcar-du/rcar_du_drv.c
index 48c166f925a3..d999231f98c7 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_drv.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_drv.c
@@ -289,7 +289,7 @@ static struct drm_driver rcar_du_driver = {
 	.gem_prime_import	= drm_gem_prime_import,
 	.gem_prime_export	= drm_gem_prime_export,
 	.gem_prime_get_sg_table	= drm_gem_cma_prime_get_sg_table,
-	.gem_prime_import_sg_table = drm_gem_cma_prime_import_sg_table,
+	.gem_prime_import_sg_table = rcar_du_gem_prime_import_sg_table,
 	.gem_prime_vmap		= drm_gem_cma_prime_vmap,
 	.gem_prime_vunmap	= drm_gem_cma_prime_vunmap,
 	.gem_prime_mmap		= drm_gem_cma_prime_mmap,
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_kms.c b/drivers/gpu/drm/rcar-du/rcar_du_kms.c
index 566d1a948c8f..2dd0c2ba047d 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_kms.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_kms.c
@@ -20,6 +20,7 @@
 #include <drm/drm_gem_cma_helper.h>
 #include <drm/drm_gem_framebuffer_helper.h>
 
+#include <linux/dma-buf.h>
 #include <linux/of_graph.h>
 #include <linux/wait.h>
 
@@ -148,6 +149,44 @@ const struct rcar_du_format_info *rcar_du_format_info(u32 fourcc)
  * Frame buffer
  */
 
+struct drm_gem_object *rcar_du_gem_prime_import_sg_table(struct drm_device *dev,
+				struct dma_buf_attachment *attach,
+				struct sg_table *sgt)
+{
+	struct rcar_du_device *rcdu = dev->dev_private;
+	struct drm_gem_cma_object *cma_obj;
+	struct drm_gem_object *gem_obj;
+	int ret;
+
+	if (!rcar_du_has(rcdu, RCAR_DU_FEATURE_VSP1_SOURCE))
+		return drm_gem_cma_prime_import_sg_table(dev, attach, sgt);
+
+	/* Create a CMA GEM buffer. */
+	cma_obj = kzalloc(sizeof(*cma_obj), GFP_KERNEL);
+	if (!cma_obj)
+		return ERR_PTR(-ENOMEM);
+	gem_obj = &cma_obj->base;
+
+	ret = drm_gem_object_init(dev, gem_obj, attach->dmabuf->size);
+	if (ret)
+		goto error;
+
+	ret = drm_gem_create_mmap_offset(gem_obj);
+	if (ret) {
+		drm_gem_object_release(gem_obj);
+		goto error;
+	}
+
+	cma_obj->paddr = 0;
+	cma_obj->sgt = sgt;
+
+	return gem_obj;
+
+error:
+	kfree(cma_obj);
+	return ERR_PTR(ret);
+}
+
 int rcar_du_dumb_create(struct drm_file *file, struct drm_device *dev,
 			struct drm_mode_create_dumb *args)
 {
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_kms.h b/drivers/gpu/drm/rcar-du/rcar_du_kms.h
index 07951d5fe38b..10b2bb0f0df9 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_kms.h
+++ b/drivers/gpu/drm/rcar-du/rcar_du_kms.h
@@ -16,10 +16,13 @@
 
 #include <linux/types.h>
 
+struct dma_buf_attachment;
 struct drm_file;
 struct drm_device;
+struct drm_gem_object;
 struct drm_mode_create_dumb;
 struct rcar_du_device;
+struct sg_table;
 
 struct rcar_du_format_info {
 	u32 fourcc;
@@ -36,4 +39,8 @@ int rcar_du_modeset_init(struct rcar_du_device *rcdu);
 int rcar_du_dumb_create(struct drm_file *file, struct drm_device *dev,
 			struct drm_mode_create_dumb *args);
 
+struct drm_gem_object *rcar_du_gem_prime_import_sg_table(struct drm_device *dev,
+				struct dma_buf_attachment *attach,
+				struct sg_table *sgt);
+
 #endif /* __RCAR_DU_KMS_H__ */
diff --git a/drivers/gpu/drm/rcar-du/rcar_du_vsp.c b/drivers/gpu/drm/rcar-du/rcar_du_vsp.c
index 2c260c33840b..73fdc814aa39 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_vsp.c
+++ b/drivers/gpu/drm/rcar-du/rcar_du_vsp.c
@@ -224,10 +224,36 @@ static int rcar_du_vsp_plane_prepare_fb(struct drm_plane *plane,
 			drm_fb_cma_get_gem_obj(state->fb, i);
 		struct sg_table *sgt = &rstate->sg_tables[i];
 
-		ret = dma_get_sgtable(rcdu->dev, sgt, gem->vaddr, gem->paddr,
-				      gem->base.size);
-		if (ret)
-			goto fail;
+		if (gem->sgt) {
+			struct scatterlist *src;
+			struct scatterlist *dst;
+
+			/*
+			 * If the GEM buffer has a scatter gather table, it has
+			 * been imported from a dma-buf and has no physical
+			 * address as it might not be physically contiguous.
+			 * Copy the original scatter gather table to map it to
+			 * the VSP.
+			 */
+			ret = sg_alloc_table(sgt, gem->sgt->orig_nents,
+					     GFP_KERNEL);
+			if (ret)
+				goto fail;
+
+			src = gem->sgt->sgl;
+			dst = sgt->sgl;
+			for (i = 0; i < gem->sgt->orig_nents; ++i) {
+				sg_set_page(dst, sg_page(src), src->length,
+					    src->offset);
+				src = sg_next(src);
+				dst = sg_next(dst);
+			}
+		} else {
+			ret = dma_get_sgtable(rcdu->dev, sgt, gem->vaddr,
+					      gem->paddr, gem->base.size);
+			if (ret)
+				goto fail;
+		}
 
 		ret = vsp1_du_map_sg(vsp->vsp, sgt);
 		if (!ret) {
-- 
Regards,

Laurent Pinchart

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC 0/2] R-Car DU: Support importing non-contiguous dma-buf with VSP
  2017-11-13 13:48 ` [PATCH/RFC 0/2] R-Car DU: Support " Kieran Bingham
@ 2018-12-18  7:40   ` Laurent Pinchart
  0 siblings, 0 replies; 10+ messages in thread
From: Laurent Pinchart @ 2018-12-18  7:40 UTC (permalink / raw)
  To: dri-devel
  Cc: Kieran Bingham, Laurent Pinchart, linux-renesas-soc,
	Daniel Vetter, Liviu Dudau

Hi Kieran,

On Monday, 13 November 2017 15:48:19 EET Kieran Bingham wrote:
> On 13/11/17 10:32, Laurent Pinchart wrote:
> > Hello everybody,
> > 
> > This patch series fixes two issues related to dma-buf import for the
> > Renesas R-Car DU when the imported buffer is either not physically
> > contiguous or located in high memory.
> > 
> > Both cases require the use of an IOMMU to remap the buffers contiguously
> > and in 32-bit DMA space. On Gen2 platforms this is transparent and works
> > already, but on Gen3 platforms the situation is more complex.
> > 
> > The Gen3 DU doesn't perform DMA access directly but instead goes through a
> > VSP IP core that acts as an external composer. When importing a dma-buf
> > buffer the GEM PRIME helper drm_gem_prime_import() maps the buffer to the
> > DU device, and the DU driver later remaps it to the VSP device. The
> > driver unfortunately can't use the drm_gem_prime_import_dev() helper to
> > map th buffer to the VSP device directly, as at import time we don't know
> > yet to which VSP device the buffer will need to be mapped (there is one
> > VSP composer per CRTC). Mapping the buffer to the VSP can thus only be
> > done when pinning the framebuffer, as we know at that time which plane
> > and CRTC it will be used with.
> > 
> > This works currently (with the caveat that an unneeded mapping to the DU
> > is created), but fails when the imported buffer is not physically
> > contiguous or is located in high memory. In the first case the GEM CMA
> > helper drm_gem_cma_prime_import_sg_table() will reject the buffer as it
> > hasn't been mapped contiguously in the DU DMA address space (because the
> > DU has no IOMMU), and in the second case the DMA mapping API will try to
> > allocate a bounce buffer and fail due to bounce buffer memory exhaustion.
> > 
> > The first patch in this series fixes the second issue by setting the DMA
> > coherent mask of the DU device to the full 40 bits address space. All
> > memory will thus be accepted without trying to allocate a bounce buffer.
> > 
> > The second patch in this series fixes the first issue by using an internal
> > copy of the drm_gem_cma_prime_import_sg_table() function that doesn't
> > ensure that the buffer is contiguous in memory. This is what caused me to
> > post this series as an RFC, as I'm not very happy with duplication of
> > helper code in the driver that could lead to breakages when the GEM CMA
> > helpers are modified. If this approach is accepted, I would prefer
> > implementing the code in the GEM CMA helpers.
> > 
> > Another point that bothers me is that buffers can now be imported
> > unconditionally on Gen3 platforms, but atomic commits will fail if the
> > buffer can't be remapped through the VSP (for instance if no IOMMU is
> > available). Given the hardware architecture and the DRM/KMS API I don't
> > really see a way around this.
> > 
> > Testing this series isn't easy as it requires getting hold of exported
> > dma-bufs located in high memory or not physically contiguous. I have used
> > the V4L2 vivid driver as an exporter for that purpose, with a few hacks
> > that can be found at
> > 
> > 	git://linuxtv.org/pinchartl/media.git drm/du/devel/iommu
> 
> Ok - testing from this branch with "drm: rcar-du: Allow importing
> non-contiguous dma-buf with VSP" applied on top (it appeared to be missing)
> 
> > On the userspace side I have used the v4l2-drm-example test application
> > available at
> > 
> > 	git://git.infradead.org/users/kmpark/public-apps
> > 
> > and run with
> > 
> > dmabuf-sharing -M rcar-du -i /dev/video0 -S 640,480 -f YUYV -F YUYV \
> > 
> > 	-s 640,480@0,0 -t 640,480@0,0 -o 70:68:640x480 -b 4 -e v4l2
> > 
> > (the -o argument needs to be modified based on the connector and CRTC ID).
> 
> I've build dmabuf-sharing from kmpark/public-apps/v4l2-drm-example, but it
> doesn't appear to have a "-e v4l2" option (not listed in the -h, and
> complains with "ERROR(dmabuf-sharing.c:560) : failed to parse arguments")
> 
> Have you also modified this application to run your tests?

Oops, yes, I have. Sorry for not mentioning that. The modified version can be 
found at

	git://git.ideasonboard.org/samsung-utils.git

> If it's easier, I can look at the screen while you run the test under your
> control as well.
> 
> > Up to patch "[DEBUG] v4l: videobuf2-dma-contig: Print allocated buffer
> > address" buffer sharing should work. Patch "[HACK] vivid: Allocate buffers
> > from high memory" then breaks buffer import, and the issue is fixed by
> > patch "drm: rcar-du: Set the DMA coherent mask for the DU device". Patch
> > "[HACK] vivid: Use vmalloc allocator" breaks buffer import again, and is
> > fixed by "drm: rcar-du: Allow importing non-contiguous dma-buf with VSP".
> > 
> > Kieran, I have tested this remotely on your Salvator-X H3 ES1.1 and
> > haven't noticed any problem, but couldn't check the output on the screen.
> > Would you be able to rerun the test locally for me ? Please note that the
> > IPMMU is known to have issues on H3 ES1.1, so you might need to test the
> > code on H3 ES2.0 instead.
> > 
> > Laurent Pinchart (2):
> >   drm: rcar-du: Set the DMA coherent mask for the DU device
> >   drm: rcar-du: Allow importing non-contiguous dma-buf with VSP
> >  
> >  drivers/gpu/drm/rcar-du/rcar_du_drv.c | 13 +++++++++++-
> >  drivers/gpu/drm/rcar-du/rcar_du_kms.c | 39 ++++++++++++++++++++++++++++++
> >  drivers/gpu/drm/rcar-du/rcar_du_kms.h |  7 +++++++
> >  3 files changed, 58 insertions(+), 1 deletion(-)

-- 
Regards,

Laurent Pinchart




^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-12-18  7:39 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-13 10:32 [PATCH/RFC 0/2] R-Car DU: Support importing non-contiguous dma-buf with VSP Laurent Pinchart
2017-11-13 10:32 ` [PATCH/RFC 1/2] drm: rcar-du: Set the DMA coherent mask for the DU device Laurent Pinchart
2017-11-13 11:44   ` Liviu Dudau
2017-11-13 10:32 ` [PATCH/RFC 2/2] drm: rcar-du: Allow importing non-contiguous dma-buf with VSP Laurent Pinchart
2017-11-13 11:53   ` Liviu Dudau
2017-11-14  3:34     ` Laurent Pinchart
2017-11-14  9:27       ` Liviu Dudau
2017-12-13 22:46   ` [PATCH/RFC v1.1] " Laurent Pinchart
2017-11-13 13:48 ` [PATCH/RFC 0/2] R-Car DU: Support " Kieran Bingham
2018-12-18  7:40   ` Laurent Pinchart

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).