* [RFC PATCH 0/2] drm: etnaviv: Unmap gems on gem_close
       [not found] <cover.1603979517.git.agx@sigxcpu.org>
@ 2020-10-29 14:20 ` Guido Günther
  2020-10-29 14:20   ` [RFC PATCH 1/2] drm: etnaviv: Add lockdep annotations for context lock Guido Günther
  2020-10-29 14:20   ` [RFC PATCH 2/2] drm: etnaviv: Unmap gems on gem_close Guido Günther
  0 siblings, 2 replies; 10+ messages in thread
From: Guido Günther @ 2020-10-29 14:20 UTC
  To: Lucas Stach, Russell King, Christian Gmeiner, David Airlie,
	Daniel Vetter, etnaviv, dri-devel

This is meant as an RFC since I'm not sure whether this is the right
way to fix the problem:

So far the unmap from GPU address space only happened when dropping the
last ref in gem_free_object_unlocked; however, that is skipped if there
are still multiple handles to the same GEM object.

Since userspace (here mesa) in the softpin case hands the memory region
back to the pool of available GPU virtual memory when closing the handle
via DRM_IOCTL_GEM_CLOSE, this can lead to etnaviv_iommu_insert_exact
failing later: userspace thinks the vaddr is available while the kernel
thinks it isn't, making the submit fail like

     [E] submit failed: -14 (No space left on device) (etna_cmd_stream_flush:244)

Fix this by unmapping the memory via the .gem_close_object callback.
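
For illustration, a rough sketch of the triggering sequence; va_alloc(),
va_free() and submit_softpin() are hypothetical stand-ins for mesa's
softpin VA allocator and submit path, not real API:

  #include <stdint.h>
  #include <xf86drm.h>

  /* hypothetical stand-ins for mesa's softpin VA allocator and submit path */
  extern uint64_t va_alloc(uint64_t size);
  extern void va_free(uint64_t va, uint64_t size);
  extern void submit_softpin(int fd, uint32_t handle, uint64_t va);

  static void trigger(int fd, uint32_t handle_a, uint32_t handle_b,
                      uint64_t size)
  {
      /* handle_a and handle_b refer to the same GEM object, e.g. after
       * a dmabuf export and re-import */
      uint64_t va = va_alloc(size);
      struct drm_gem_close req = { .handle = handle_a };

      submit_softpin(fd, handle_a, va);  /* kernel maps the BO at va */

      drmIoctl(fd, DRM_IOCTL_GEM_CLOSE, &req);
      va_free(va, size);                 /* userspace: va is free again */

      /* gem_free_object never ran (handle_b still holds a ref), so the
       * kernel still has va mapped; if the allocator hands out va again,
       * etnaviv_iommu_insert_exact fails and the submit errors out as
       * shown above */
      submit_softpin(fd, handle_b, va_alloc(size));
  }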

The patch is against 5.9 and will need to be redone for drm-misc-next due to
the conversion to GEM object functions, but I'm happy to do that if this
looks like the right approach.

I can trigger the problem when plugging/unplugging a DP screen driven by DCSS
while DSI is driven by mxsfb. It happens most readily at 4k since that
allocates bigger chunks.

I also folded in a commit adding context->lock checks to
etnaviv_iommu_insert_exact and etnaviv_iommu_remove_mapping to make them
match etnaviv_iommu_find_iova.

To: Lucas Stach <l.stach@pengutronix.de>, Russell King <linux+etnaviv@armlinux.org.uk>, Christian Gmeiner <christian.gmeiner@gmail.com>, David Airlie <airlied@linux.ie>, Daniel Vetter <daniel@ffwll.ch>, etnaviv@lists.freedesktop.org, dri-devel@lists.freedesktop.org

Guido Günther (2):
  drm: etnaviv: Add lockdep annotations for context lock
  drm: etnaviv: Unmap gems on gem_close

 drivers/gpu/drm/etnaviv/etnaviv_drv.c |  1 +
 drivers/gpu/drm/etnaviv/etnaviv_drv.h |  1 +
 drivers/gpu/drm/etnaviv/etnaviv_gem.c | 32 +++++++++++++++++++++++++++
 drivers/gpu/drm/etnaviv/etnaviv_mmu.c |  4 ++++
 4 files changed, 38 insertions(+)

-- 
2.28.0


* [RFC PATCH 1/2] drm: etnaviv: Add lockdep annotations for context lock
  2020-10-29 14:20 ` [RFC PATCH 0/2] drm: etnaviv: Unmap gems on gem_close Guido Günther
@ 2020-10-29 14:20   ` Guido Günther
  2020-10-30 14:35     ` Lucas Stach
  2020-10-29 14:20   ` [RFC PATCH 2/2] drm: etnaviv: Unmap gems on gem_close Guido Günther
  1 sibling, 1 reply; 10+ messages in thread
From: Guido Günther @ 2020-10-29 14:20 UTC
  To: Lucas Stach, Russell King, Christian Gmeiner, David Airlie,
	Daniel Vetter, etnaviv, dri-devel

etnaviv_iommu_find_iova has a lockdep annotation for the context lock,
so etnaviv_iommu_insert_exact and etnaviv_iommu_remove_mapping should
have one as well.

Signed-off-by: Guido Günther <agx@sigxcpu.org>
---
 drivers/gpu/drm/etnaviv/etnaviv_mmu.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
index 3607d348c298..cd599ac04663 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
@@ -131,6 +131,8 @@ static void etnaviv_iommu_remove_mapping(struct etnaviv_iommu_context *context,
 {
 	struct etnaviv_gem_object *etnaviv_obj = mapping->object;
 
+	lockdep_assert_held(&context->lock);
+
 	etnaviv_iommu_unmap(context, mapping->vram_node.start,
 			    etnaviv_obj->sgt, etnaviv_obj->base.size);
 	drm_mm_remove_node(&mapping->vram_node);
@@ -223,6 +225,8 @@ static int etnaviv_iommu_find_iova(struct etnaviv_iommu_context *context,
 static int etnaviv_iommu_insert_exact(struct etnaviv_iommu_context *context,
 		   struct drm_mm_node *node, size_t size, u64 va)
 {
+	lockdep_assert_held(&context->lock);
+
 	return drm_mm_insert_node_in_range(&context->mm, node, size, 0, 0, va,
 					   va + size, DRM_MM_INSERT_LOWEST);
 }
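
For illustration, a minimal sketch of the pattern the assertions enforce;
the wrapper below is made up, the real callers live in etnaviv_mmu.c:

  /* Illustrative only: real callers such as etnaviv_iommu_map_gem() in
   * etnaviv_mmu.c take context->lock before calling the helpers. */
  static int insert_exact_locked(struct etnaviv_iommu_context *context,
                                 struct drm_mm_node *node, size_t size,
                                 u64 va)
  {
      int ret;

      mutex_lock(&context->lock);
      /* lockdep_assert_held(&context->lock) is satisfied here; calling
       * the helper without the mutex now triggers a lockdep splat */
      ret = etnaviv_iommu_insert_exact(context, node, size, va);
      mutex_unlock(&context->lock);

      return ret;
  }
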
-- 
2.28.0


* [RFC PATCH 2/2] drm: etnaviv: Unmap gems on gem_close
  2020-10-29 14:20 ` [RFC PATCH 0/2] drm: etnaviv: Unmap gems on gem_close Guido Günther
  2020-10-29 14:20   ` [RFC PATCH 1/2] drm: etnaviv: Add lockdep annotations for context lock Guido Günther
@ 2020-10-29 14:20   ` Guido Günther
  2020-10-29 14:38     ` Lucas Stach
  1 sibling, 1 reply; 10+ messages in thread
From: Guido Günther @ 2020-10-29 14:20 UTC
  To: Lucas Stach, Russell King, Christian Gmeiner, David Airlie,
	Daniel Vetter, etnaviv, dri-devel

So far the unmap from GPU address space only happened when dropping the
last ref in gem_free_object_unlocked; however, that is skipped if there
are still multiple handles to the same GEM object.

Since userspace (here mesa) in the softpin case hands the memory region
back to the pool of available GPU virtual memory when closing the handle
via DRM_IOCTL_GEM_CLOSE, this can lead to etnaviv_iommu_insert_exact
failing later: userspace thinks the vaddr is available while the kernel
thinks it isn't, making the submit fail like

  [E] submit failed: -14 (No space left on device) (etna_cmd_stream_flush:244)

Fix this by unmapping the memory via the .gem_close_object callback.

Signed-off-by: Guido Günther <agx@sigxcpu.org>
---
 drivers/gpu/drm/etnaviv/etnaviv_drv.c |  1 +
 drivers/gpu/drm/etnaviv/etnaviv_drv.h |  1 +
 drivers/gpu/drm/etnaviv/etnaviv_gem.c | 32 +++++++++++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
index a9a3afaef9a1..fdcc6405068c 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
@@ -491,6 +491,7 @@ static struct drm_driver etnaviv_drm_driver = {
 	.open               = etnaviv_open,
 	.postclose           = etnaviv_postclose,
 	.gem_free_object_unlocked = etnaviv_gem_free_object,
+	.gem_close_object   = etnaviv_gem_close_object,
 	.gem_vm_ops         = &vm_ops,
 	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
 	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.h b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
index 4d8dc9236e5f..2226a9af0d63 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_drv.h
+++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
@@ -65,6 +65,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
 		struct drm_etnaviv_timespec *timeout);
 int etnaviv_gem_cpu_fini(struct drm_gem_object *obj);
 void etnaviv_gem_free_object(struct drm_gem_object *obj);
+void etnaviv_gem_close_object(struct drm_gem_object *obj, struct drm_file *file);
 int etnaviv_gem_new_handle(struct drm_device *dev, struct drm_file *file,
 		u32 size, u32 flags, u32 *handle);
 int etnaviv_gem_new_userptr(struct drm_device *dev, struct drm_file *file,
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index f06e19e7be04..5aec4a23c77e 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -515,6 +515,38 @@ static const struct etnaviv_gem_ops etnaviv_gem_shmem_ops = {
 	.mmap = etnaviv_gem_mmap_obj,
 };
 
+void etnaviv_gem_close_object(struct drm_gem_object *obj, struct drm_file *unused)
+{
+	struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
+	struct etnaviv_vram_mapping *mapping, *tmp;
+
+	/* Handle this via etnaviv_gem_free_object */
+	if (obj->handle_count == 1)
+		return;
+
+	WARN_ON(is_active(etnaviv_obj));
+
+	/*
+	 * userspace wants to release the handle so we need to remove
+	 * the mapping from the gpu's virtual address space to stay
+	 * in sync.
+	 */
+	list_for_each_entry_safe(mapping, tmp, &etnaviv_obj->vram_list,
+				 obj_node) {
+		struct etnaviv_iommu_context *context = mapping->context;
+
+		WARN_ON(mapping->use);
+
+		if (context) {
+			etnaviv_iommu_unmap_gem(context, mapping);
+			etnaviv_iommu_context_put(context);
+		}
+
+		list_del(&mapping->obj_node);
+		kfree(mapping);
+	}
+}
+
 void etnaviv_gem_free_object(struct drm_gem_object *obj)
 {
 	struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
-- 
2.28.0


* Re: [RFC PATCH 2/2] drm: etnaviv: Unmap gems on gem_close
  2020-10-29 14:20   ` [RFC PATCH 2/2] drm: etnaviv: Unmap gems on gem_close Guido Günther
@ 2020-10-29 14:38     ` Lucas Stach
  2020-10-29 18:20       ` Daniel Vetter
  0 siblings, 1 reply; 10+ messages in thread
From: Lucas Stach @ 2020-10-29 14:38 UTC
  To: Guido Günther, Russell King, Christian Gmeiner,
	David Airlie, Daniel Vetter, etnaviv, dri-devel

Hi Guido,

Am Donnerstag, den 29.10.2020, 15:20 +0100 schrieb Guido Günther:
> [...]
> +void etnaviv_gem_close_object(struct drm_gem_object *obj, struct drm_file *unused)
> +{
> +	struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
> +	struct etnaviv_vram_mapping *mapping, *tmp;
> +
> +	/* Handle this via etnaviv_gem_free_object */
> +	if (obj->handle_count == 1)
> +		return;
> +
> +	WARN_ON(is_active(etnaviv_obj));
> +
> +	/*
> +	 * userspace wants to release the handle so we need to remove
> +	 * the mapping from the gpu's virtual address space to stay
> +	 * in sync.
> +	 */
> +	list_for_each_entry_safe(mapping, tmp, &etnaviv_obj->vram_list,
> +				 obj_node) {
> +		struct etnaviv_iommu_context *context = mapping->context;
> +
> +		WARN_ON(mapping->use);
> +
> +		if (context) {
> +			etnaviv_iommu_unmap_gem(context, mapping);
> +			etnaviv_iommu_context_put(context);

I see the issue you are trying to fix here, but this is not a viable
fix. While userspace may close the handle, the GPU may still be
processing jobs referencing the BO, so we can't just unmap it from the
address space.

I think what we need to do here is to wait for the current jobs/fences
of the BO when we hit the case where userspace tries to assign a new
address to the BO. Basically wait for current jobs -> unmap from the
address space -> map at the new userspace-assigned address.
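
Roughly like this minimal sketch, assuming the 5.9 dma_resv API and the
static helpers from etnaviv_mmu.c (the helper itself and the prot flags
are illustrative, not a tested patch):

  /* Hypothetical helper inside etnaviv_mmu.c: move an existing mapping
   * to a new userspace-assigned address, waiting for the GPU first. */
  static int etnaviv_iommu_reassign_va(struct etnaviv_iommu_context *context,
                                       struct etnaviv_vram_mapping *mapping,
                                       u64 new_va)
  {
      struct etnaviv_gem_object *etnaviv_obj = mapping->object;
      long ret;

      /* wait for current jobs: all fences, interruptible */
      ret = dma_resv_wait_timeout_rcu(etnaviv_obj->base.resv, true,
                                      true, msecs_to_jiffies(10000));
      if (ret == 0)
          return -ETIMEDOUT;
      if (ret < 0)
          return ret;

      mutex_lock(&context->lock);
      /* unmap from the address space ... */
      etnaviv_iommu_remove_mapping(context, mapping);
      /* ... and map at the new userspace-assigned address */
      ret = etnaviv_iommu_insert_exact(context, &mapping->vram_node,
                                       etnaviv_obj->base.size, new_va);
      if (!ret)
          ret = etnaviv_iommu_map(context, mapping->vram_node.start,
                                  etnaviv_obj->sgt, etnaviv_obj->base.size,
                                  ETNAVIV_PROT_READ | ETNAVIV_PROT_WRITE);
      mutex_unlock(&context->lock);

      return ret;
  }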

Regards,
Lucas


* Re: [RFC PATCH 2/2] drm: etnaviv: Unmap gems on gem_close
  2020-10-29 14:38     ` Lucas Stach
@ 2020-10-29 18:20       ` Daniel Vetter
  2020-10-30  9:19         ` Lucas Stach
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Vetter @ 2020-10-29 18:20 UTC
  To: Lucas Stach
  Cc: David Airlie, Guido Günther, etnaviv, dri-devel, Russell King

On Thu, Oct 29, 2020 at 03:38:21PM +0100, Lucas Stach wrote:
> Hi Guido,
> 
> Am Donnerstag, den 29.10.2020, 15:20 +0100 schrieb Guido Günther:
> > [...]
> 
> I see the issue you are trying to fix here, but this is not a viable
> fix. While userspace may close the handle, the GPU may still be
> processing jobs referencing the BO, so we can't just unmap it from the
> address space.
> 
> I think what we need to do here is to wait for the current jobs/fences
> of the BO when we hit the case where userspace tries to assign a new
> address to the BO. Basically wait for current jobs -> unmap from the
> address space -> map at the new userspace-assigned address.

Yeah, was about to say the same. There are two solutions to this:
- let the kernel manage the VA space. This is what amdgpu does in some
  cases (but still no relocations)
- pipeline the VA/PTE updates in your driver, because userspace has a
  somewhat hard time figuring out when a buffer is done. Doing that would
  either add complexity or stalls when the kernel is doing all the tracking
  already anyway. Minimal fix is to do what Lucas explained above, but
  importantly with the kernel solution we have the option to fully
  pipeline everything and avoid stalls. I think this is what everyone else
  who lets userspace manage VA does on their kernel side.
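
A hedged sketch of the pipelined option, using the generic dma_fence
machinery (the structs and the deferred unmap helper are illustrative,
not existing etnaviv code):

  /* Illustrative only: defer the unmap instead of stalling. */
  struct deferred_unmap {
      struct dma_fence_cb cb;
      struct work_struct work;
      struct etnaviv_iommu_context *context;
      struct etnaviv_vram_mapping *mapping;
  };

  static void deferred_unmap_work(struct work_struct *work)
  {
      struct deferred_unmap *d =
          container_of(work, struct deferred_unmap, work);

      mutex_lock(&d->context->lock);
      etnaviv_iommu_remove_mapping(d->context, d->mapping);
      mutex_unlock(&d->context->lock);
      kfree(d);
  }

  static void deferred_unmap_fence_cb(struct dma_fence *fence,
                                      struct dma_fence_cb *cb)
  {
      struct deferred_unmap *d =
          container_of(cb, struct deferred_unmap, cb);

      /* fence callbacks run in atomic context, punt to a worker */
      schedule_work(&d->work);
  }

  /* queue with: INIT_WORK(&d->work, deferred_unmap_work);
   * if (dma_fence_add_callback(last_fence, &d->cb, deferred_unmap_fence_cb))
   *     deferred_unmap_work(&d->work);  // fence already signaled
   */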

-Daniel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [RFC PATCH 2/2] drm: etnaviv: Unmap gems on gem_close
  2020-10-29 18:20       ` Daniel Vetter
@ 2020-10-30  9:19         ` Lucas Stach
  2020-10-30  9:33           ` Daniel Vetter
  2020-10-30 14:17           ` Guido Günther
  0 siblings, 2 replies; 10+ messages in thread
From: Lucas Stach @ 2020-10-30  9:19 UTC
  To: Daniel Vetter
  Cc: David Airlie, Guido Günther, etnaviv, dri-devel, Russell King

Am Donnerstag, den 29.10.2020, 19:20 +0100 schrieb Daniel Vetter:
> On Thu, Oct 29, 2020 at 03:38:21PM +0100, Lucas Stach wrote:
> > Hi Guido,
> > 
> > Am Donnerstag, den 29.10.2020, 15:20 +0100 schrieb Guido Günther:
> > > [...]
> > 
> > I see the issue you are trying to fix here, but this is not a viable
> > fix. While userspace may close the handle, the GPU may still be
> > processing jobs referencing the BO, so we can't just unmap it from the
> > address space.
> > 
> > I think what we need to do here is to wait for the current jobs/fences
> > of the BO when we hit the case where userspace tries to assign a new
> > address to the BO. Basically wait for current jobs -> unmap from the
> > address space -> map at the new userspace-assigned address.
> 
> Yeah, was about to say the same. There are two solutions to this:
> - let the kernel manage the VA space. This is what amdgpu does in some
>   cases (but still no relocations)
> - pipeline the VA/PTE updates in your driver, because userspace has a
>   somewhat hard time figuring out when a buffer is done. Doing that would
>   either add complexity or stalls when the kernel is doing all the tracking
>   already anyway. Minimal fix is to do what Lucas explained above, but
>   importantly with the kernel solution we have the option to fully
>   pipeline everything and avoid stalls. I think this is what everyone else
>   who lets userspace manage VA does on their kernel side.

I thought a bit more about this and the issue is only about userspace
reusing regions of its _private_ address space before they are GPU
inactive. So the problem of tracking whether a BO is active, in order
to figure out when it is safe to close the handle (and thus forget
about the last placement of the BO in the AS), goes from "is this BO
globally active anywhere?" to "is this BO still active in one of the
jobs I submitted?", which is not hard at all for userspace to figure
out.

With this in mind I think we can reasonably declare the kernel behavior
of rejecting the submit as okay and just add the tiny bit of userspace
fencing needed for the userspace driver to not close the handle until
the last submit referencing this BO has finished.
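
A minimal sketch of that userspace bookkeeping, assuming the existing
DRM_IOCTL_ETNAVIV_WAIT_FENCE ioctl; the bo struct and how last_fence
gets recorded are hypothetical:

  #include <stdint.h>
  #include <time.h>
  #include <xf86drm.h>
  #include <etnaviv_drm.h>    /* libdrm uapi header */

  struct bo {
      uint32_t handle;
      uint32_t last_fence;    /* fence of the last submit using this BO */
  };

  static void bo_release(int fd, struct bo *bo)
  {
      struct drm_etnaviv_wait_fence wait = {
          .pipe = 0,           /* the 3D pipe */
          .fence = bo->last_fence,
      };
      struct drm_gem_close req = { .handle = bo->handle };
      struct timespec now;

      /* etnaviv wait timeouts are absolute CLOCK_MONOTONIC values */
      clock_gettime(CLOCK_MONOTONIC, &now);
      wait.timeout.tv_sec = now.tv_sec + 5;
      wait.timeout.tv_nsec = now.tv_nsec;

      /* don't recycle the softpin VA or drop the handle until the last
       * job referencing this BO has finished */
      drmIoctl(fd, DRM_IOCTL_ETNAVIV_WAIT_FENCE, &wait);
      /* ... return the BO's VA range to the allocator here ... */
      drmIoctl(fd, DRM_IOCTL_GEM_CLOSE, &req);
  }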

Regards,
Lucas


* Re: [RFC PATCH 2/2] drm: etnaviv: Unmap gems on gem_close
  2020-10-30  9:19         ` Lucas Stach
@ 2020-10-30  9:33           ` Daniel Vetter
  2020-10-30  9:38             ` Lucas Stach
  2020-10-30 14:17           ` Guido Günther
  1 sibling, 1 reply; 10+ messages in thread
From: Daniel Vetter @ 2020-10-30  9:33 UTC
  To: Lucas Stach
  Cc: David Airlie, Guido Günther, etnaviv, dri-devel, Russell King

On Fri, Oct 30, 2020 at 10:19:54AM +0100, Lucas Stach wrote:
> Am Donnerstag, den 29.10.2020, 19:20 +0100 schrieb Daniel Vetter:
> > On Thu, Oct 29, 2020 at 03:38:21PM +0100, Lucas Stach wrote:
> > > Hi Guido,
> > > 
> > > Am Donnerstag, den 29.10.2020, 15:20 +0100 schrieb Guido Günther:
> > > > [...]
> > > 
> > > I see the issue you are trying to fix here, but this is not a viable
> > > fix. While userspace may close the handle, the GPU may still be
> > > processing jobs referencing the BO, so we can't just unmap it from the
> > > address space.
> > > 
> > > I think what we need to do here is to wait for the current jobs/fences
> > > of the BO when we hit the case where userspace tries to assign a new
> > > address to the BO. Basically wait for current jobs -> unmap from the
> > > address space -> map at the new userspace-assigned address.
> > 
> > Yeah, was about to say the same. There are two solutions to this:
> > - let the kernel manage the VA space. This is what amdgpu does in some
> >   cases (but still no relocations)
> > - pipeline the VA/PTE updates in your driver, because userspace has a
> >   somewhat hard time figuring out when a buffer is done. Doing that would
> >   either add complexity or stalls when the kernel is doing all the tracking
> >   already anyway. Minimal fix is to do what Lucas explained above, but
> >   importantly with the kernel solution we have the option to fully
> >   pipeline everything and avoid stalls. I think this is what everyone else
> >   who lets userspace manage VA does on their kernel side.
> 
> I thought a bit more about this and the issue is only about userspace
> reusing regions of its _private_ address space before they are GPU
> inactive. So the problem of tracking whether a BO is active, in order
> to figure out when it is safe to close the handle (and thus forget
> about the last placement of the BO in the AS), goes from "is this BO
> globally active anywhere?" to "is this BO still active in one of the
> jobs I submitted?", which is not hard at all for userspace to figure
> out.
> 
> With this in mind I think we can reasonably declare the kernel behavior
> of rejecting the submit as okay and just add the tiny bit of userspace
> fencing needed for the userspace driver to not close the handle until
> the last submit referencing this BO has finished.

Hm why don't you have the same issue for shared buffers? Are these handled
with relocs?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [RFC PATCH 2/2] drm: etnaviv: Unmap gems on gem_close
  2020-10-30  9:33           ` Daniel Vetter
@ 2020-10-30  9:38             ` Lucas Stach
  0 siblings, 0 replies; 10+ messages in thread
From: Lucas Stach @ 2020-10-30  9:38 UTC
  To: Daniel Vetter
  Cc: David Airlie, Guido Günther, etnaviv, dri-devel, Russell King

Am Freitag, den 30.10.2020, 10:33 +0100 schrieb Daniel Vetter:
> On Fri, Oct 30, 2020 at 10:19:54AM +0100, Lucas Stach wrote:
> > Am Donnerstag, den 29.10.2020, 19:20 +0100 schrieb Daniel Vetter:
> > > On Thu, Oct 29, 2020 at 03:38:21PM +0100, Lucas Stach wrote:
> > > > Hi Guido,
> > > > 
> > > > Am Donnerstag, den 29.10.2020, 15:20 +0100 schrieb Guido Günther:
> > > > > [...]
> > > > 
> > > > I see the issue you are trying to fix here, but this is not a viable
> > > > fix. While userspace may close the handle, the GPU may still be
> > > > processing jobs referencing the BO, so we can't just unmap it from the
> > > > address space.
> > > > 
> > > > I think what we need to do here is to wait for the current jobs/fences
> > > > of the BO when we hit the case where userspace tries to assign a new
> > > > address to the BO. Basically wait for current jobs -> unmap from the
> > > > address space -> map at the new userspace-assigned address.
> > > 
> > > Yeah, was about to say the same. There are two solutions to this:
> > > - let the kernel manage the VA space. This is what amdgpu does in some
> > >   cases (but still no relocations)
> > > - pipeline the VA/PTE updates in your driver, because userspace has a
> > >   somewhat hard time figuring out when a buffer is done. Doing that would
> > >   either add complexity or stalls when the kernel is doing all the tracking
> > >   already anyway. Minimal fix is to do what Lucas explained above, but
> > >   importantly with the kernel solution we have the option to fully
> > >   pipeline everything and avoid stalls. I think this is what everyone else
> > >   who lets userspace manage VA does on their kernel side.
> > 
> > I thought a bit more about this and the issue is only about userspace
> > reusing regions of its _private_ address space before they are GPU
> > inactive. So the problem of tracking whether a BO is active, in order
> > to figure out when it is safe to close the handle (and thus forget
> > about the last placement of the BO in the AS), goes from "is this BO
> > globally active anywhere?" to "is this BO still active in one of the
> > jobs I submitted?", which is not hard at all for userspace to figure
> > out.
> > 
> > With this in mind I think we can reasonably declare the kernel behavior
> > of rejecting the submit as okay and just add the tiny bit of userspace
> > fencing needed for the userspace driver to not close the handle until
> > the last submit referencing this BO has finished.
> 
> Hm why don't you have the same issue for shared buffers? Are these handled
> with relocs?

What's clashing here is the mapping in the private address space.
Shared buffers are mapped separately in the address space of each
client. So for the issue of unmapping and reusing the spot in the
address space we only care about the buffer being active for that
single client; we don't need to care about the buffer potentially
still being active with another client, as this won't hurt us here.

Regards,
Lucas


* Re: [RFC PATCH 2/2] drm: etnaviv: Unmap gems on gem_close
  2020-10-30  9:19         ` Lucas Stach
  2020-10-30  9:33           ` Daniel Vetter
@ 2020-10-30 14:17           ` Guido Günther
  1 sibling, 0 replies; 10+ messages in thread
From: Guido Günther @ 2020-10-30 14:17 UTC
  To: Lucas Stach; +Cc: David Airlie, etnaviv, dri-devel, Russell King

Hi,
On Fri, Oct 30, 2020 at 10:19:54AM +0100, Lucas Stach wrote:
> Am Donnerstag, den 29.10.2020, 19:20 +0100 schrieb Daniel Vetter:
> > On Thu, Oct 29, 2020 at 03:38:21PM +0100, Lucas Stach wrote:
> > > Hi Guido,
> > > 
> > > Am Donnerstag, den 29.10.2020, 15:20 +0100 schrieb Guido Günther:
> > > > [...]
> > > 
> > > I see the issue you are trying to fix here, but this is not a viable
> > > fix. While userspace may close the handle, the GPU may still be
> > > processing jobs referencing the BO, so we can't just unmap it from the
> > > address space.
> > > 
> > > I think what we need to do here is to wait for the current jobs/fences
> > > of the BO when we hit the case where userspace tries to assign a new
> > > address to the BO. Basically wait for current jobs -> unmap from the
> > > address space -> map at the new userspace-assigned address.
> > 
> > Yeah, was about to say the same. There are two solutions to this:
> > - let the kernel manage the VA space. This is what amdgpu does in some
> >   cases (but still no relocations)
> > - pipeline the VA/PTE updates in your driver, because userspace has a
> >   somewhat hard time figuring out when a buffer is done. Doing that would
> >   either add complexity or stalls when the kernel is doing all the tracking
> >   already anyway. Minimal fix is to do what Lucas explained above, but
> >   importantly with the kernel solution we have the option to fully
> >   pipeline everything and avoid stalls. I think this is what everyone else
> >   who lets userspace manage VA does on their kernel side.
> 
> I thought a bit more about this and the issue is only about userspace
> reusing regions of its _private_ address space before they are GPU
> inactive. So the problem of tracking whether a BO is active, in order
> to figure out when it is safe to close the handle (and thus forget
> about the last placement of the BO in the AS), goes from "is this BO
> globally active anywhere?" to "is this BO still active in one of the
> jobs I submitted?", which is not hard at all for userspace to figure
> out.

It's about private address space, yes.
  
> With this in mind I think we can reasonably declare the kernel behavior
> of rejecting the submit as okay and just add the tiny bit of userspace
> fencing needed for the userspace driver to not close the handle until
> the last submit referencing this BO has finished.

Sounds reasonable, I'll have a look.
Cheers,
 -- Guido


* Re: [RFC PATCH 1/2] drm: etnaviv: Add lockdep annotations for context lock
  2020-10-29 14:20   ` [RFC PATCH 1/2] drm: etnaviv: Add lockdep annotations for context lock Guido Günther
@ 2020-10-30 14:35     ` Lucas Stach
  0 siblings, 0 replies; 10+ messages in thread
From: Lucas Stach @ 2020-10-30 14:35 UTC
  To: Guido Günther, Russell King, Christian Gmeiner,
	David Airlie, Daniel Vetter, etnaviv, dri-devel

Am Donnerstag, den 29.10.2020, 15:20 +0100 schrieb Guido Günther:
> etnaviv_iommu_find_iova has a lockdep annotation for the context lock,
> so etnaviv_iommu_insert_exact and etnaviv_iommu_remove_mapping should
> have one as well.

This sounds reasonable to me. I've added this patch to my etnaviv/next
branch.

Regards,
Lucas

> [...]

