* [PATCH v2 00/22] drm/msm: de-struct_mutex-ification
@ 2020-10-12  2:09 Rob Clark
  2020-10-12  2:09 ` [PATCH v2 01/22] drm/msm/gem: Add obj->lock wrappers Rob Clark
                   ` (21 more replies)
  0 siblings, 22 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, AngeloGioacchino Del Regno,
	Brian Masney, Christophe JAILLET, Daniel Vetter, Emil Velikov,
	Eric Anholt, open list:DRM DRIVER FOR MSM ADRENO GPU,
	Harigovindan P, Jonathan Marek, Jordan Crouse,
	moderated list:DMA BUFFER SHARING FRAMEWORK,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list,
	open list:DMA BUFFER SHARING FRAMEWORK, Liviu Dudau,
	Matthias Kaehlcke, Rajendra Nayak, Sam Ravnborg, Sharat Masetty,
	Thomas Zimmermann

From: Rob Clark <robdclark@chromium.org>

This doesn't remove *all* the struct_mutex usage, but it covers the
worst of it, ie. the shrinker/madvise/free/retire paths.  The submit
path still uses struct_mutex, but it still needs *something* to
serialize a portion of the submit path, and lock_stat mostly just
shows that the contention there is with other submits.  There are
also a few other bits of struct_mutex usage in less critical paths
(debugfs, etc).  But this seems like a reasonable step in the right
direction.

v2: teach lockdep about the shrinker locking patterns (danvet) and
    convert to obj->resv locking (danvet)
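
In outline, the series replaces the device-global lock with the
per-object reservation lock.  A minimal sketch (msm_gem_lock() /
msm_gem_unlock() are the wrappers introduced in patch 01; the rest is
illustrative):

  /* before: one device-wide lock serializes unrelated objects */
  mutex_lock(&dev->struct_mutex);
  /* ... touch per-bo state ... */
  mutex_unlock(&dev->struct_mutex);

  /* after: each bo is protected by its own obj->resv lock */
  msm_gem_lock(obj);            /* dma_resv_lock(obj->resv, NULL) */
  /* ... touch per-bo state ... */
  msm_gem_unlock(obj);          /* dma_resv_unlock(obj->resv) */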

Rob Clark (22):
  drm/msm/gem: Add obj->lock wrappers
  drm/msm/gem: Rename internal get_iova_locked helper
  drm/msm/gem: Move prototypes to msm_gem.h
  drm/msm/gem: Add some _locked() helpers
  drm/msm/gem: Move locking in shrinker path
  drm/msm/submit: Move copy_from_user ahead of locking bos
  drm/msm: Do rpm get sooner in the submit path
  drm/msm/gem: Switch over to obj->resv for locking
  drm/msm: Use correct drm_gem_object_put() in fail case
  drm/msm: Drop chatty trace
  drm/msm: Move update_fences()
  drm/msm: Add priv->mm_lock to protect active/inactive lists
  drm/msm: Document and rename preempt_lock
  drm/msm: Protect ring->submits with it's own lock
  drm/msm: Refcount submits
  drm/msm: Remove obj->gpu
  drm/msm: Drop struct_mutex from the retire path
  drm/msm: Drop struct_mutex in free_object() path
  drm/msm: remove msm_gem_free_work
  drm/msm: drop struct_mutex in madvise path
  drm/msm: Drop struct_mutex in shrinker path
  drm/msm: Don't implicit-sync if only a single ring

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c     |   4 +-
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c |  12 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c     |   4 +-
 drivers/gpu/drm/msm/disp/mdp4/mdp4_crtc.c |   1 +
 drivers/gpu/drm/msm/disp/mdp5/mdp5_crtc.c |   1 +
 drivers/gpu/drm/msm/dsi/dsi_host.c        |   1 +
 drivers/gpu/drm/msm/msm_debugfs.c         |   7 +
 drivers/gpu/drm/msm/msm_drv.c             |  21 +-
 drivers/gpu/drm/msm/msm_drv.h             |  73 ++-----
 drivers/gpu/drm/msm/msm_fbdev.c           |   1 +
 drivers/gpu/drm/msm/msm_gem.c             | 245 ++++++++++------------
 drivers/gpu/drm/msm/msm_gem.h             | 131 ++++++++++--
 drivers/gpu/drm/msm/msm_gem_shrinker.c    |  81 +++----
 drivers/gpu/drm/msm/msm_gem_submit.c      | 154 +++++++++-----
 drivers/gpu/drm/msm/msm_gpu.c             |  98 +++++----
 drivers/gpu/drm/msm/msm_gpu.h             |   5 +-
 drivers/gpu/drm/msm/msm_ringbuffer.c      |   3 +-
 drivers/gpu/drm/msm/msm_ringbuffer.h      |  13 +-
 18 files changed, 459 insertions(+), 396 deletions(-)

-- 
2.26.2



* [PATCH v2 01/22] drm/msm/gem: Add obj->lock wrappers
  2020-10-12  2:09 [PATCH v2 00/22] drm/msm: de-struct_mutex-ification Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 02/22] drm/msm/gem: Rename internal get_iova_locked helper Rob Clark
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

This will make it easier to transition everything that currently uses
per-bo locking over to obj->resv locking.
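
The wrappers keep call sites stable while the lock implementation
changes underneath.  Sketch (the first body is this patch; the second
is what patch 08 of this series switches to):

  /* this patch: wrap the existing per-object mutex */
  static inline void
  msm_gem_lock(struct drm_gem_object *obj)
  {
          mutex_lock(&to_msm_bo(obj)->lock);
  }

  /* later, only the wrapper body changes, not the callers: */
  static inline void
  msm_gem_lock(struct drm_gem_object *obj)
  {
          dma_resv_lock(obj->resv, NULL);
  }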

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/msm_gem.c | 99 ++++++++++++++++-------------------
 drivers/gpu/drm/msm/msm_gem.h | 28 ++++++++++
 2 files changed, 74 insertions(+), 53 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 14e14caf90f9..afef9c6b1a1c 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -178,15 +178,15 @@ struct page **msm_gem_get_pages(struct drm_gem_object *obj)
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 	struct page **p;
 
-	mutex_lock(&msm_obj->lock);
+	msm_gem_lock(obj);
 
 	if (WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED)) {
-		mutex_unlock(&msm_obj->lock);
+		msm_gem_unlock(obj);
 		return ERR_PTR(-EBUSY);
 	}
 
 	p = get_pages(obj);
-	mutex_unlock(&msm_obj->lock);
+	msm_gem_unlock(obj);
 	return p;
 }
 
@@ -252,14 +252,14 @@ vm_fault_t msm_gem_fault(struct vm_fault *vmf)
 	 * vm_ops.open/drm_gem_mmap_obj and close get and put
 	 * a reference on obj. So, we dont need to hold one here.
 	 */
-	err = mutex_lock_interruptible(&msm_obj->lock);
+	err = msm_gem_lock_interruptible(obj);
 	if (err) {
 		ret = VM_FAULT_NOPAGE;
 		goto out;
 	}
 
 	if (WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED)) {
-		mutex_unlock(&msm_obj->lock);
+		msm_gem_unlock(obj);
 		return VM_FAULT_SIGBUS;
 	}
 
@@ -280,7 +280,7 @@ vm_fault_t msm_gem_fault(struct vm_fault *vmf)
 
 	ret = vmf_insert_mixed(vma, vmf->address, __pfn_to_pfn_t(pfn, PFN_DEV));
 out_unlock:
-	mutex_unlock(&msm_obj->lock);
+	msm_gem_unlock(obj);
 out:
 	return ret;
 }
@@ -289,10 +289,9 @@ vm_fault_t msm_gem_fault(struct vm_fault *vmf)
 static uint64_t mmap_offset(struct drm_gem_object *obj)
 {
 	struct drm_device *dev = obj->dev;
-	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 	int ret;
 
-	WARN_ON(!mutex_is_locked(&msm_obj->lock));
+	WARN_ON(!msm_gem_is_locked(obj));
 
 	/* Make it mmapable */
 	ret = drm_gem_create_mmap_offset(obj);
@@ -308,11 +307,10 @@ static uint64_t mmap_offset(struct drm_gem_object *obj)
 uint64_t msm_gem_mmap_offset(struct drm_gem_object *obj)
 {
 	uint64_t offset;
-	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 
-	mutex_lock(&msm_obj->lock);
+	msm_gem_lock(obj);
 	offset = mmap_offset(obj);
-	mutex_unlock(&msm_obj->lock);
+	msm_gem_unlock(obj);
 	return offset;
 }
 
@@ -322,7 +320,7 @@ static struct msm_gem_vma *add_vma(struct drm_gem_object *obj,
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 	struct msm_gem_vma *vma;
 
-	WARN_ON(!mutex_is_locked(&msm_obj->lock));
+	WARN_ON(!msm_gem_is_locked(obj));
 
 	vma = kzalloc(sizeof(*vma), GFP_KERNEL);
 	if (!vma)
@@ -341,7 +339,7 @@ static struct msm_gem_vma *lookup_vma(struct drm_gem_object *obj,
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 	struct msm_gem_vma *vma;
 
-	WARN_ON(!mutex_is_locked(&msm_obj->lock));
+	WARN_ON(!msm_gem_is_locked(obj));
 
 	list_for_each_entry(vma, &msm_obj->vmas, list) {
 		if (vma->aspace == aspace)
@@ -360,14 +358,14 @@ static void del_vma(struct msm_gem_vma *vma)
 	kfree(vma);
 }
 
-/* Called with msm_obj->lock locked */
+/* Called with msm_obj locked */
 static void
 put_iova(struct drm_gem_object *obj)
 {
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 	struct msm_gem_vma *vma, *tmp;
 
-	WARN_ON(!mutex_is_locked(&msm_obj->lock));
+	WARN_ON(!msm_gem_is_locked(obj));
 
 	list_for_each_entry_safe(vma, tmp, &msm_obj->vmas, list) {
 		if (vma->aspace) {
@@ -382,11 +380,10 @@ static int msm_gem_get_iova_locked(struct drm_gem_object *obj,
 		struct msm_gem_address_space *aspace, uint64_t *iova,
 		u64 range_start, u64 range_end)
 {
-	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 	struct msm_gem_vma *vma;
 	int ret = 0;
 
-	WARN_ON(!mutex_is_locked(&msm_obj->lock));
+	WARN_ON(!msm_gem_is_locked(obj));
 
 	vma = lookup_vma(obj, aspace);
 
@@ -421,7 +418,7 @@ static int msm_gem_pin_iova(struct drm_gem_object *obj,
 	if (msm_obj->flags & MSM_BO_MAP_PRIV)
 		prot |= IOMMU_PRIV;
 
-	WARN_ON(!mutex_is_locked(&msm_obj->lock));
+	WARN_ON(!msm_gem_is_locked(obj));
 
 	if (WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED))
 		return -EBUSY;
@@ -446,11 +443,10 @@ int msm_gem_get_and_pin_iova_range(struct drm_gem_object *obj,
 		struct msm_gem_address_space *aspace, uint64_t *iova,
 		u64 range_start, u64 range_end)
 {
-	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 	u64 local;
 	int ret;
 
-	mutex_lock(&msm_obj->lock);
+	msm_gem_lock(obj);
 
 	ret = msm_gem_get_iova_locked(obj, aspace, &local,
 		range_start, range_end);
@@ -461,7 +457,7 @@ int msm_gem_get_and_pin_iova_range(struct drm_gem_object *obj,
 	if (!ret)
 		*iova = local;
 
-	mutex_unlock(&msm_obj->lock);
+	msm_gem_unlock(obj);
 	return ret;
 }
 
@@ -479,12 +475,11 @@ int msm_gem_get_and_pin_iova(struct drm_gem_object *obj,
 int msm_gem_get_iova(struct drm_gem_object *obj,
 		struct msm_gem_address_space *aspace, uint64_t *iova)
 {
-	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 	int ret;
 
-	mutex_lock(&msm_obj->lock);
+	msm_gem_lock(obj);
 	ret = msm_gem_get_iova_locked(obj, aspace, iova, 0, U64_MAX);
-	mutex_unlock(&msm_obj->lock);
+	msm_gem_unlock(obj);
 
 	return ret;
 }
@@ -495,12 +490,11 @@ int msm_gem_get_iova(struct drm_gem_object *obj,
 uint64_t msm_gem_iova(struct drm_gem_object *obj,
 		struct msm_gem_address_space *aspace)
 {
-	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 	struct msm_gem_vma *vma;
 
-	mutex_lock(&msm_obj->lock);
+	msm_gem_lock(obj);
 	vma = lookup_vma(obj, aspace);
-	mutex_unlock(&msm_obj->lock);
+	msm_gem_unlock(obj);
 	WARN_ON(!vma);
 
 	return vma ? vma->iova : 0;
@@ -514,16 +508,15 @@ uint64_t msm_gem_iova(struct drm_gem_object *obj,
 void msm_gem_unpin_iova(struct drm_gem_object *obj,
 		struct msm_gem_address_space *aspace)
 {
-	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 	struct msm_gem_vma *vma;
 
-	mutex_lock(&msm_obj->lock);
+	msm_gem_lock(obj);
 	vma = lookup_vma(obj, aspace);
 
 	if (!WARN_ON(!vma))
 		msm_gem_unmap_vma(aspace, vma);
 
-	mutex_unlock(&msm_obj->lock);
+	msm_gem_unlock(obj);
 }
 
 int msm_gem_dumb_create(struct drm_file *file, struct drm_device *dev,
@@ -564,20 +557,20 @@ static void *get_vaddr(struct drm_gem_object *obj, unsigned madv)
 	if (obj->import_attach)
 		return ERR_PTR(-ENODEV);
 
-	mutex_lock(&msm_obj->lock);
+	msm_gem_lock(obj);
 
 	if (WARN_ON(msm_obj->madv > madv)) {
 		DRM_DEV_ERROR(obj->dev->dev, "Invalid madv state: %u vs %u\n",
 			msm_obj->madv, madv);
-		mutex_unlock(&msm_obj->lock);
+		msm_gem_unlock(obj);
 		return ERR_PTR(-EBUSY);
 	}
 
 	/* increment vmap_count *before* vmap() call, so shrinker can
-	 * check vmap_count (is_vunmapable()) outside of msm_obj->lock.
+	 * check vmap_count (is_vunmapable()) outside of msm_obj lock.
 	 * This guarantees that we won't try to msm_gem_vunmap() this
 	 * same object from within the vmap() call (while we already
-	 * hold msm_obj->lock)
+	 * hold msm_obj lock)
 	 */
 	msm_obj->vmap_count++;
 
@@ -595,12 +588,12 @@ static void *get_vaddr(struct drm_gem_object *obj, unsigned madv)
 		}
 	}
 
-	mutex_unlock(&msm_obj->lock);
+	msm_gem_unlock(obj);
 	return msm_obj->vaddr;
 
 fail:
 	msm_obj->vmap_count--;
-	mutex_unlock(&msm_obj->lock);
+	msm_gem_unlock(obj);
 	return ERR_PTR(ret);
 }
 
@@ -624,10 +617,10 @@ void msm_gem_put_vaddr(struct drm_gem_object *obj)
 {
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 
-	mutex_lock(&msm_obj->lock);
+	msm_gem_lock(obj);
 	WARN_ON(msm_obj->vmap_count < 1);
 	msm_obj->vmap_count--;
-	mutex_unlock(&msm_obj->lock);
+	msm_gem_unlock(obj);
 }
 
 /* Update madvise status, returns true if not purged, else
@@ -637,7 +630,7 @@ int msm_gem_madvise(struct drm_gem_object *obj, unsigned madv)
 {
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 
-	mutex_lock(&msm_obj->lock);
+	msm_gem_lock(obj);
 
 	WARN_ON(!mutex_is_locked(&obj->dev->struct_mutex));
 
@@ -646,7 +639,7 @@ int msm_gem_madvise(struct drm_gem_object *obj, unsigned madv)
 
 	madv = msm_obj->madv;
 
-	mutex_unlock(&msm_obj->lock);
+	msm_gem_unlock(obj);
 
 	return (madv != __MSM_MADV_PURGED);
 }
@@ -683,14 +676,14 @@ void msm_gem_purge(struct drm_gem_object *obj, enum msm_gem_lock subclass)
 	invalidate_mapping_pages(file_inode(obj->filp)->i_mapping,
 			0, (loff_t)-1);
 
-	mutex_unlock(&msm_obj->lock);
+	msm_gem_unlock(obj);
 }
 
 static void msm_gem_vunmap_locked(struct drm_gem_object *obj)
 {
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 
-	WARN_ON(!mutex_is_locked(&msm_obj->lock));
+	WARN_ON(!msm_gem_is_locked(obj));
 
 	if (!msm_obj->vaddr || WARN_ON(!is_vunmapable(msm_obj)))
 		return;
@@ -705,7 +698,7 @@ void msm_gem_vunmap(struct drm_gem_object *obj, enum msm_gem_lock subclass)
 
 	mutex_lock_nested(&msm_obj->lock, subclass);
 	msm_gem_vunmap_locked(obj);
-	mutex_unlock(&msm_obj->lock);
+	msm_gem_unlock(obj);
 }
 
 /* must be called before _move_to_active().. */
@@ -816,7 +809,7 @@ void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m)
 	uint64_t off = drm_vma_node_start(&obj->vma_node);
 	const char *madv;
 
-	mutex_lock(&msm_obj->lock);
+	msm_gem_lock(obj);
 
 	switch (msm_obj->madv) {
 	case __MSM_MADV_PURGED:
@@ -884,7 +877,7 @@ void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m)
 		describe_fence(fence, "Exclusive", m);
 	rcu_read_unlock();
 
-	mutex_unlock(&msm_obj->lock);
+	msm_gem_unlock(obj);
 }
 
 void msm_gem_describe_objects(struct list_head *list, struct seq_file *m)
@@ -929,7 +922,7 @@ static void free_object(struct msm_gem_object *msm_obj)
 
 	list_del(&msm_obj->mm_list);
 
-	mutex_lock(&msm_obj->lock);
+	msm_gem_lock(obj);
 
 	put_iova(obj);
 
@@ -950,7 +943,7 @@ static void free_object(struct msm_gem_object *msm_obj)
 
 	drm_gem_object_release(obj);
 
-	mutex_unlock(&msm_obj->lock);
+	msm_gem_unlock(obj);
 	kfree(msm_obj);
 }
 
@@ -1070,10 +1063,10 @@ static struct drm_gem_object *_msm_gem_new(struct drm_device *dev,
 		struct msm_gem_vma *vma;
 		struct page **pages;
 
-		mutex_lock(&msm_obj->lock);
+		msm_gem_lock(obj);
 
 		vma = add_vma(obj, NULL);
-		mutex_unlock(&msm_obj->lock);
+		msm_gem_unlock(obj);
 		if (IS_ERR(vma)) {
 			ret = PTR_ERR(vma);
 			goto fail;
@@ -1157,22 +1150,22 @@ struct drm_gem_object *msm_gem_import(struct drm_device *dev,
 	npages = size / PAGE_SIZE;
 
 	msm_obj = to_msm_bo(obj);
-	mutex_lock(&msm_obj->lock);
+	msm_gem_lock(obj);
 	msm_obj->sgt = sgt;
 	msm_obj->pages = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
 	if (!msm_obj->pages) {
-		mutex_unlock(&msm_obj->lock);
+		msm_gem_unlock(obj);
 		ret = -ENOMEM;
 		goto fail;
 	}
 
 	ret = drm_prime_sg_to_page_addr_arrays(sgt, msm_obj->pages, NULL, npages);
 	if (ret) {
-		mutex_unlock(&msm_obj->lock);
+		msm_gem_unlock(obj);
 		goto fail;
 	}
 
-	mutex_unlock(&msm_obj->lock);
+	msm_gem_unlock(obj);
 
 	mutex_lock(&dev->struct_mutex);
 	list_add_tail(&msm_obj->mm_list, &priv->inactive_list);
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index a1bf741b9b89..f6482154e8bb 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -93,6 +93,34 @@ struct msm_gem_object {
 };
 #define to_msm_bo(x) container_of(x, struct msm_gem_object, base)
 
+static inline void
+msm_gem_lock(struct drm_gem_object *obj)
+{
+	struct msm_gem_object *msm_obj = to_msm_bo(obj);
+	mutex_lock(&msm_obj->lock);
+}
+
+static inline int
+msm_gem_lock_interruptible(struct drm_gem_object *obj)
+{
+	struct msm_gem_object *msm_obj = to_msm_bo(obj);
+	return mutex_lock_interruptible(&msm_obj->lock);
+}
+
+static inline void
+msm_gem_unlock(struct drm_gem_object *obj)
+{
+	struct msm_gem_object *msm_obj = to_msm_bo(obj);
+	mutex_unlock(&msm_obj->lock);
+}
+
+static inline bool
+msm_gem_is_locked(struct drm_gem_object *obj)
+{
+	struct msm_gem_object *msm_obj = to_msm_bo(obj);
+	return mutex_is_locked(&msm_obj->lock);
+}
+
 static inline bool is_active(struct msm_gem_object *msm_obj)
 {
 	return atomic_read(&msm_obj->active_count);
-- 
2.26.2



* [PATCH v2 02/22] drm/msm/gem: Rename internal get_iova_locked helper
  2020-10-12  2:09 [PATCH v2 00/22] drm/msm: de-struct_mutex-ification Rob Clark
  2020-10-12  2:09 ` [PATCH v2 01/22] drm/msm/gem: Add obj->lock wrappers Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 03/22] drm/msm/gem: Move prototypes to msm_gem.h Rob Clark
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

We'll need to introduce a _locked() version of msm_gem_get_iova(), so we
need to make that name available.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/msm_gem.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index afef9c6b1a1c..dec89fe79025 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -376,7 +376,7 @@ put_iova(struct drm_gem_object *obj)
 	}
 }
 
-static int msm_gem_get_iova_locked(struct drm_gem_object *obj,
+static int get_iova_locked(struct drm_gem_object *obj,
 		struct msm_gem_address_space *aspace, uint64_t *iova,
 		u64 range_start, u64 range_end)
 {
@@ -448,7 +448,7 @@ int msm_gem_get_and_pin_iova_range(struct drm_gem_object *obj,
 
 	msm_gem_lock(obj);
 
-	ret = msm_gem_get_iova_locked(obj, aspace, &local,
+	ret = get_iova_locked(obj, aspace, &local,
 		range_start, range_end);
 
 	if (!ret)
@@ -478,7 +478,7 @@ int msm_gem_get_iova(struct drm_gem_object *obj,
 	int ret;
 
 	msm_gem_lock(obj);
-	ret = msm_gem_get_iova_locked(obj, aspace, iova, 0, U64_MAX);
+	ret = get_iova_locked(obj, aspace, iova, 0, U64_MAX);
 	msm_gem_unlock(obj);
 
 	return ret;
-- 
2.26.2



* [PATCH v2 03/22] drm/msm/gem: Move prototypes to msm_gem.h
  2020-10-12  2:09 [PATCH v2 00/22] drm/msm: de-struct_mutex-ification Rob Clark
  2020-10-12  2:09 ` [PATCH v2 01/22] drm/msm/gem: Add obj->lock wrappers Rob Clark
  2020-10-12  2:09 ` [PATCH v2 02/22] drm/msm/gem: Rename internal get_iova_locked helper Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 04/22] drm/msm/gem: Add some _locked() helpers Rob Clark
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Rob Clark, Sean Paul, David Airlie,
	Sumit Semwal, Christian König, Thomas Zimmermann,
	Sam Ravnborg, Emil Velikov, Christophe JAILLET, Brian Masney,
	Harigovindan P, Jeffrey Hugo, Rajendra Nayak,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list,
	open list:DMA BUFFER SHARING FRAMEWORK,
	moderated list:DMA BUFFER SHARING FRAMEWORK

From: Rob Clark <robdclark@chromium.org>

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/disp/mdp4/mdp4_crtc.c |  1 +
 drivers/gpu/drm/msm/disp/mdp5/mdp5_crtc.c |  1 +
 drivers/gpu/drm/msm/dsi/dsi_host.c        |  1 +
 drivers/gpu/drm/msm/msm_drv.h             | 54 ----------------------
 drivers/gpu/drm/msm/msm_fbdev.c           |  1 +
 drivers/gpu/drm/msm/msm_gem.h             | 56 +++++++++++++++++++++++
 6 files changed, 60 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/mdp4/mdp4_crtc.c b/drivers/gpu/drm/msm/disp/mdp4/mdp4_crtc.c
index a0253297bc76..b65b2329cc8d 100644
--- a/drivers/gpu/drm/msm/disp/mdp4/mdp4_crtc.c
+++ b/drivers/gpu/drm/msm/disp/mdp4/mdp4_crtc.c
@@ -11,6 +11,7 @@
 #include <drm/drm_vblank.h>
 
 #include "mdp4_kms.h"
+#include "msm_gem.h"
 
 struct mdp4_crtc {
 	struct drm_crtc base;
diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_crtc.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_crtc.c
index c39dad151bb6..81fbd52ad7e7 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_crtc.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_crtc.c
@@ -15,6 +15,7 @@
 #include <drm/drm_vblank.h>
 
 #include "mdp5_kms.h"
+#include "msm_gem.h"
 
 #define CURSOR_WIDTH	64
 #define CURSOR_HEIGHT	64
diff --git a/drivers/gpu/drm/msm/dsi/dsi_host.c b/drivers/gpu/drm/msm/dsi/dsi_host.c
index b17ac6c27554..5e7cdc11c764 100644
--- a/drivers/gpu/drm/msm/dsi/dsi_host.c
+++ b/drivers/gpu/drm/msm/dsi/dsi_host.c
@@ -26,6 +26,7 @@
 #include "sfpb.xml.h"
 #include "dsi_cfg.h"
 #include "msm_kms.h"
+#include "msm_gem.h"
 
 #define DSI_RESET_TOGGLE_DELAY_MS 20
 
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index b9dd8f8f4887..79ee7d05b363 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -273,28 +273,6 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 void msm_gem_shrinker_init(struct drm_device *dev);
 void msm_gem_shrinker_cleanup(struct drm_device *dev);
 
-int msm_gem_mmap_obj(struct drm_gem_object *obj,
-			struct vm_area_struct *vma);
-int msm_gem_mmap(struct file *filp, struct vm_area_struct *vma);
-vm_fault_t msm_gem_fault(struct vm_fault *vmf);
-uint64_t msm_gem_mmap_offset(struct drm_gem_object *obj);
-int msm_gem_get_iova(struct drm_gem_object *obj,
-		struct msm_gem_address_space *aspace, uint64_t *iova);
-int msm_gem_get_and_pin_iova_range(struct drm_gem_object *obj,
-		struct msm_gem_address_space *aspace, uint64_t *iova,
-		u64 range_start, u64 range_end);
-int msm_gem_get_and_pin_iova(struct drm_gem_object *obj,
-		struct msm_gem_address_space *aspace, uint64_t *iova);
-uint64_t msm_gem_iova(struct drm_gem_object *obj,
-		struct msm_gem_address_space *aspace);
-void msm_gem_unpin_iova(struct drm_gem_object *obj,
-		struct msm_gem_address_space *aspace);
-struct page **msm_gem_get_pages(struct drm_gem_object *obj);
-void msm_gem_put_pages(struct drm_gem_object *obj);
-int msm_gem_dumb_create(struct drm_file *file, struct drm_device *dev,
-		struct drm_mode_create_dumb *args);
-int msm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
-		uint32_t handle, uint64_t *offset);
 struct sg_table *msm_gem_prime_get_sg_table(struct drm_gem_object *obj);
 void *msm_gem_prime_vmap(struct drm_gem_object *obj);
 void msm_gem_prime_vunmap(struct drm_gem_object *obj, void *vaddr);
@@ -303,38 +281,8 @@ struct drm_gem_object *msm_gem_prime_import_sg_table(struct drm_device *dev,
 		struct dma_buf_attachment *attach, struct sg_table *sg);
 int msm_gem_prime_pin(struct drm_gem_object *obj);
 void msm_gem_prime_unpin(struct drm_gem_object *obj);
-void *msm_gem_get_vaddr(struct drm_gem_object *obj);
-void *msm_gem_get_vaddr_active(struct drm_gem_object *obj);
-void msm_gem_put_vaddr(struct drm_gem_object *obj);
-int msm_gem_madvise(struct drm_gem_object *obj, unsigned madv);
-int msm_gem_sync_object(struct drm_gem_object *obj,
-		struct msm_fence_context *fctx, bool exclusive);
-void msm_gem_active_get(struct drm_gem_object *obj, struct msm_gpu *gpu);
-void msm_gem_active_put(struct drm_gem_object *obj);
-int msm_gem_cpu_prep(struct drm_gem_object *obj, uint32_t op, ktime_t *timeout);
-int msm_gem_cpu_fini(struct drm_gem_object *obj);
-void msm_gem_free_object(struct drm_gem_object *obj);
-int msm_gem_new_handle(struct drm_device *dev, struct drm_file *file,
-		uint32_t size, uint32_t flags, uint32_t *handle, char *name);
-struct drm_gem_object *msm_gem_new(struct drm_device *dev,
-		uint32_t size, uint32_t flags);
-struct drm_gem_object *msm_gem_new_locked(struct drm_device *dev,
-		uint32_t size, uint32_t flags);
-void *msm_gem_kernel_new(struct drm_device *dev, uint32_t size,
-		uint32_t flags, struct msm_gem_address_space *aspace,
-		struct drm_gem_object **bo, uint64_t *iova);
-void *msm_gem_kernel_new_locked(struct drm_device *dev, uint32_t size,
-		uint32_t flags, struct msm_gem_address_space *aspace,
-		struct drm_gem_object **bo, uint64_t *iova);
-void msm_gem_kernel_put(struct drm_gem_object *bo,
-		struct msm_gem_address_space *aspace, bool locked);
-struct drm_gem_object *msm_gem_import(struct drm_device *dev,
-		struct dma_buf *dmabuf, struct sg_table *sgt);
 void msm_gem_free_work(struct work_struct *work);
 
-__printf(2, 3)
-void msm_gem_object_set_name(struct drm_gem_object *bo, const char *fmt, ...);
-
 int msm_framebuffer_prepare(struct drm_framebuffer *fb,
 		struct msm_gem_address_space *aspace);
 void msm_framebuffer_cleanup(struct drm_framebuffer *fb,
@@ -447,8 +395,6 @@ void __init msm_dpu_register(void);
 void __exit msm_dpu_unregister(void);
 
 #ifdef CONFIG_DEBUG_FS
-void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m);
-void msm_gem_describe_objects(struct list_head *list, struct seq_file *m);
 void msm_framebuffer_describe(struct drm_framebuffer *fb, struct seq_file *m);
 int msm_debugfs_late_init(struct drm_device *dev);
 int msm_rd_debugfs_init(struct drm_minor *minor);
diff --git a/drivers/gpu/drm/msm/msm_fbdev.c b/drivers/gpu/drm/msm/msm_fbdev.c
index 47235f8c5922..678dba1725a6 100644
--- a/drivers/gpu/drm/msm/msm_fbdev.c
+++ b/drivers/gpu/drm/msm/msm_fbdev.c
@@ -9,6 +9,7 @@
 #include <drm/drm_fourcc.h>
 
 #include "msm_drv.h"
+#include "msm_gem.h"
 #include "msm_kms.h"
 
 extern int msm_gem_mmap_obj(struct drm_gem_object *obj,
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index f6482154e8bb..fbad08badf43 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -93,6 +93,62 @@ struct msm_gem_object {
 };
 #define to_msm_bo(x) container_of(x, struct msm_gem_object, base)
 
+int msm_gem_mmap_obj(struct drm_gem_object *obj,
+			struct vm_area_struct *vma);
+int msm_gem_mmap(struct file *filp, struct vm_area_struct *vma);
+vm_fault_t msm_gem_fault(struct vm_fault *vmf);
+uint64_t msm_gem_mmap_offset(struct drm_gem_object *obj);
+int msm_gem_get_iova(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace, uint64_t *iova);
+int msm_gem_get_and_pin_iova_range(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace, uint64_t *iova,
+		u64 range_start, u64 range_end);
+int msm_gem_get_and_pin_iova(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace, uint64_t *iova);
+uint64_t msm_gem_iova(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace);
+void msm_gem_unpin_iova(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace);
+struct page **msm_gem_get_pages(struct drm_gem_object *obj);
+void msm_gem_put_pages(struct drm_gem_object *obj);
+int msm_gem_dumb_create(struct drm_file *file, struct drm_device *dev,
+		struct drm_mode_create_dumb *args);
+int msm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
+		uint32_t handle, uint64_t *offset);
+void *msm_gem_get_vaddr(struct drm_gem_object *obj);
+void *msm_gem_get_vaddr_active(struct drm_gem_object *obj);
+void msm_gem_put_vaddr(struct drm_gem_object *obj);
+int msm_gem_madvise(struct drm_gem_object *obj, unsigned madv);
+int msm_gem_sync_object(struct drm_gem_object *obj,
+		struct msm_fence_context *fctx, bool exclusive);
+void msm_gem_active_get(struct drm_gem_object *obj, struct msm_gpu *gpu);
+void msm_gem_active_put(struct drm_gem_object *obj);
+int msm_gem_cpu_prep(struct drm_gem_object *obj, uint32_t op, ktime_t *timeout);
+int msm_gem_cpu_fini(struct drm_gem_object *obj);
+void msm_gem_free_object(struct drm_gem_object *obj);
+int msm_gem_new_handle(struct drm_device *dev, struct drm_file *file,
+		uint32_t size, uint32_t flags, uint32_t *handle, char *name);
+struct drm_gem_object *msm_gem_new(struct drm_device *dev,
+		uint32_t size, uint32_t flags);
+struct drm_gem_object *msm_gem_new_locked(struct drm_device *dev,
+		uint32_t size, uint32_t flags);
+void *msm_gem_kernel_new(struct drm_device *dev, uint32_t size,
+		uint32_t flags, struct msm_gem_address_space *aspace,
+		struct drm_gem_object **bo, uint64_t *iova);
+void *msm_gem_kernel_new_locked(struct drm_device *dev, uint32_t size,
+		uint32_t flags, struct msm_gem_address_space *aspace,
+		struct drm_gem_object **bo, uint64_t *iova);
+void msm_gem_kernel_put(struct drm_gem_object *bo,
+		struct msm_gem_address_space *aspace, bool locked);
+struct drm_gem_object *msm_gem_import(struct drm_device *dev,
+		struct dma_buf *dmabuf, struct sg_table *sgt);
+__printf(2, 3)
+void msm_gem_object_set_name(struct drm_gem_object *bo, const char *fmt, ...);
+#ifdef CONFIG_DEBUG_FS
+void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m);
+void msm_gem_describe_objects(struct list_head *list, struct seq_file *m);
+#endif
+
 static inline void
 msm_gem_lock(struct drm_gem_object *obj)
 {
-- 
2.26.2



* [PATCH v2 04/22] drm/msm/gem: Add some _locked() helpers
  2020-10-12  2:09 [PATCH v2 00/22] drm/msm: de-struct_mutex-ification Rob Clark
                   ` (2 preceding siblings ...)
  2020-10-12  2:09 ` [PATCH v2 03/22] drm/msm/gem: Move prototypes to msm_gem.h Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 05/22] drm/msm/gem: Move locking in shrinker path Rob Clark
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

When we cut over to using dma_resv_lock()/etc instead of msm_obj->lock,
we'll need these for the submit path (where the resv lock is already
held).
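
Sketch of the intended usage: in the submit path the bo's reservation
is already held (taken via the ww acquire dance in
submit_lock_objects()), so the plain variants, which take the lock
themselves, cannot be used there:

  /* bo already locked, ww ticket held: */
  ret = msm_gem_get_and_pin_iova_locked(&msm_obj->base,
                  submit->aspace, &iova);
  ...
  /* and likewise when unpinning, with the lock still held: */
  msm_gem_unpin_iova_locked(&msm_obj->base, submit->aspace);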

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/msm_gem.c | 50 +++++++++++++++++++++++++++--------
 drivers/gpu/drm/msm/msm_gem.h |  4 +++
 2 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index dec89fe79025..7bca2e815933 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -435,18 +435,14 @@ static int msm_gem_pin_iova(struct drm_gem_object *obj,
 			msm_obj->sgt, obj->size >> PAGE_SHIFT);
 }
 
-/*
- * get iova and pin it. Should have a matching put
- * limits iova to specified range (in pages)
- */
-int msm_gem_get_and_pin_iova_range(struct drm_gem_object *obj,
+static int get_and_pin_iova_range_locked(struct drm_gem_object *obj,
 		struct msm_gem_address_space *aspace, uint64_t *iova,
 		u64 range_start, u64 range_end)
 {
 	u64 local;
 	int ret;
 
-	msm_gem_lock(obj);
+	WARN_ON(!msm_gem_is_locked(obj));
 
 	ret = get_iova_locked(obj, aspace, &local,
 		range_start, range_end);
@@ -457,10 +453,32 @@ int msm_gem_get_and_pin_iova_range(struct drm_gem_object *obj,
 	if (!ret)
 		*iova = local;
 
+	return ret;
+}
+
+/*
+ * get iova and pin it. Should have a matching put
+ * limits iova to specified range (in pages)
+ */
+int msm_gem_get_and_pin_iova_range(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace, uint64_t *iova,
+		u64 range_start, u64 range_end)
+{
+	int ret;
+
+	msm_gem_lock(obj);
+	ret = get_and_pin_iova_range_locked(obj, aspace, iova, range_start, range_end);
 	msm_gem_unlock(obj);
+
 	return ret;
 }
 
+int msm_gem_get_and_pin_iova_locked(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace, uint64_t *iova)
+{
+	return get_and_pin_iova_range_locked(obj, aspace, iova, 0, U64_MAX);
+}
+
 /* get iova and pin it. Should have a matching put */
 int msm_gem_get_and_pin_iova(struct drm_gem_object *obj,
 		struct msm_gem_address_space *aspace, uint64_t *iova)
@@ -501,21 +519,31 @@ uint64_t msm_gem_iova(struct drm_gem_object *obj,
 }
 
 /*
- * Unpin a iova by updating the reference counts. The memory isn't actually
- * purged until something else (shrinker, mm_notifier, destroy, etc) decides
- * to get rid of it
+ * Locked variant of msm_gem_unpin_iova()
  */
-void msm_gem_unpin_iova(struct drm_gem_object *obj,
+void msm_gem_unpin_iova_locked(struct drm_gem_object *obj,
 		struct msm_gem_address_space *aspace)
 {
 	struct msm_gem_vma *vma;
 
-	msm_gem_lock(obj);
+	WARN_ON(!msm_gem_is_locked(obj));
+
 	vma = lookup_vma(obj, aspace);
 
 	if (!WARN_ON(!vma))
 		msm_gem_unmap_vma(aspace, vma);
+}
 
+/*
+ * Unpin a iova by updating the reference counts. The memory isn't actually
+ * purged until something else (shrinker, mm_notifier, destroy, etc) decides
+ * to get rid of it
+ */
+void msm_gem_unpin_iova(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace)
+{
+	msm_gem_lock(obj);
+	msm_gem_unpin_iova_locked(obj, aspace);
 	msm_gem_unlock(obj);
 }
 
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index fbad08badf43..016f616dd118 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -103,10 +103,14 @@ int msm_gem_get_iova(struct drm_gem_object *obj,
 int msm_gem_get_and_pin_iova_range(struct drm_gem_object *obj,
 		struct msm_gem_address_space *aspace, uint64_t *iova,
 		u64 range_start, u64 range_end);
+int msm_gem_get_and_pin_iova_locked(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace, uint64_t *iova);
 int msm_gem_get_and_pin_iova(struct drm_gem_object *obj,
 		struct msm_gem_address_space *aspace, uint64_t *iova);
 uint64_t msm_gem_iova(struct drm_gem_object *obj,
 		struct msm_gem_address_space *aspace);
+void msm_gem_unpin_iova_locked(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace);
 void msm_gem_unpin_iova(struct drm_gem_object *obj,
 		struct msm_gem_address_space *aspace);
 struct page **msm_gem_get_pages(struct drm_gem_object *obj);
-- 
2.26.2



* [PATCH v2 05/22] drm/msm/gem: Move locking in shrinker path
  2020-10-12  2:09 [PATCH v2 00/22] drm/msm: de-struct_mutex-ification Rob Clark
                   ` (3 preceding siblings ...)
  2020-10-12  2:09 ` [PATCH v2 04/22] drm/msm/gem: Add some _locked() helpers Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 06/22] drm/msm/submit: Move copy_from_user ahead of locking bos Rob Clark
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

Move grabbing the bo lock into the shrinker, with a msm_gem_trylock()
to skip over bo's that are already locked.  This gets rid of the
nested lock classes.
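
The resulting scan loop, roughly (sketch of the pattern this patch
ends up with):

  list_for_each_entry(msm_obj, &priv->inactive_list, mm_list) {
          if (!msm_gem_trylock(&msm_obj->base))
                  continue;       /* contended bo, just skip it */
          if (is_purgeable(msm_obj))
                  msm_gem_purge(&msm_obj->base);
          msm_gem_unlock(&msm_obj->base);
  }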

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/msm_gem.c          | 24 +++++----------------
 drivers/gpu/drm/msm/msm_gem.h          | 29 ++++++++++----------------
 drivers/gpu/drm/msm/msm_gem_shrinker.c | 27 +++++++++++++++++-------
 3 files changed, 35 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 7bca2e815933..ff8ca257bdc6 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -17,8 +17,6 @@
 #include "msm_gpu.h"
 #include "msm_mmu.h"
 
-static void msm_gem_vunmap_locked(struct drm_gem_object *obj);
-
 
 static dma_addr_t physaddr(struct drm_gem_object *obj)
 {
@@ -672,20 +670,19 @@ int msm_gem_madvise(struct drm_gem_object *obj, unsigned madv)
 	return (madv != __MSM_MADV_PURGED);
 }
 
-void msm_gem_purge(struct drm_gem_object *obj, enum msm_gem_lock subclass)
+void msm_gem_purge(struct drm_gem_object *obj)
 {
 	struct drm_device *dev = obj->dev;
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 
 	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
+	WARN_ON(!msm_gem_is_locked(obj));
 	WARN_ON(!is_purgeable(msm_obj));
 	WARN_ON(obj->import_attach);
 
-	mutex_lock_nested(&msm_obj->lock, subclass);
-
 	put_iova(obj);
 
-	msm_gem_vunmap_locked(obj);
+	msm_gem_vunmap(obj);
 
 	put_pages(obj);
 
@@ -703,11 +700,9 @@ void msm_gem_purge(struct drm_gem_object *obj, enum msm_gem_lock subclass)
 
 	invalidate_mapping_pages(file_inode(obj->filp)->i_mapping,
 			0, (loff_t)-1);
-
-	msm_gem_unlock(obj);
 }
 
-static void msm_gem_vunmap_locked(struct drm_gem_object *obj)
+void msm_gem_vunmap(struct drm_gem_object *obj)
 {
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 
@@ -720,15 +715,6 @@ static void msm_gem_vunmap_locked(struct drm_gem_object *obj)
 	msm_obj->vaddr = NULL;
 }
 
-void msm_gem_vunmap(struct drm_gem_object *obj, enum msm_gem_lock subclass)
-{
-	struct msm_gem_object *msm_obj = to_msm_bo(obj);
-
-	mutex_lock_nested(&msm_obj->lock, subclass);
-	msm_gem_vunmap_locked(obj);
-	msm_gem_unlock(obj);
-}
-
 /* must be called before _move_to_active().. */
 int msm_gem_sync_object(struct drm_gem_object *obj,
 		struct msm_fence_context *fctx, bool exclusive)
@@ -965,7 +951,7 @@ static void free_object(struct msm_gem_object *msm_obj)
 
 		drm_prime_gem_destroy(obj, msm_obj->sgt);
 	} else {
-		msm_gem_vunmap_locked(obj);
+		msm_gem_vunmap(obj);
 		put_pages(obj);
 	}
 
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 016f616dd118..947eeaca661d 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -160,6 +160,13 @@ msm_gem_lock(struct drm_gem_object *obj)
 	mutex_lock(&msm_obj->lock);
 }
 
+static inline bool __must_check
+msm_gem_trylock(struct drm_gem_object *obj)
+{
+	struct msm_gem_object *msm_obj = to_msm_bo(obj);
+	return mutex_trylock_recursive(&msm_obj->lock) == MUTEX_TRYLOCK_SUCCESS;
+}
+
 static inline int
 msm_gem_lock_interruptible(struct drm_gem_object *obj)
 {
@@ -188,6 +195,7 @@ static inline bool is_active(struct msm_gem_object *msm_obj)
 
 static inline bool is_purgeable(struct msm_gem_object *msm_obj)
 {
+	WARN_ON(!msm_gem_is_locked(&msm_obj->base));
 	WARN_ON(!mutex_is_locked(&msm_obj->base.dev->struct_mutex));
 	return (msm_obj->madv == MSM_MADV_DONTNEED) && msm_obj->sgt &&
 			!msm_obj->base.dma_buf && !msm_obj->base.import_attach;
@@ -195,27 +203,12 @@ static inline bool is_purgeable(struct msm_gem_object *msm_obj)
 
 static inline bool is_vunmapable(struct msm_gem_object *msm_obj)
 {
+	WARN_ON(!msm_gem_is_locked(&msm_obj->base));
 	return (msm_obj->vmap_count == 0) && msm_obj->vaddr;
 }
 
-/* The shrinker can be triggered while we hold objA->lock, and need
- * to grab objB->lock to purge it.  Lockdep just sees these as a single
- * class of lock, so we use subclasses to teach it the difference.
- *
- * OBJ_LOCK_NORMAL is implicit (ie. normal mutex_lock() call), and
- * OBJ_LOCK_SHRINKER is used by shrinker.
- *
- * It is *essential* that we never go down paths that could trigger the
- * shrinker for a purgable object.  This is ensured by checking that
- * msm_obj->madv == MSM_MADV_WILLNEED.
- */
-enum msm_gem_lock {
-	OBJ_LOCK_NORMAL,
-	OBJ_LOCK_SHRINKER,
-};
-
-void msm_gem_purge(struct drm_gem_object *obj, enum msm_gem_lock subclass);
-void msm_gem_vunmap(struct drm_gem_object *obj, enum msm_gem_lock subclass);
+void msm_gem_purge(struct drm_gem_object *obj);
+void msm_gem_vunmap(struct drm_gem_object *obj);
 void msm_gem_free_work(struct work_struct *work);
 
 /* Created per submit-ioctl, to track bo's and cmdstream bufs, etc,
diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c b/drivers/gpu/drm/msm/msm_gem_shrinker.c
index 482576d7a39a..2dc0ffa925b4 100644
--- a/drivers/gpu/drm/msm/msm_gem_shrinker.c
+++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c
@@ -52,8 +52,11 @@ msm_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
 		return 0;
 
 	list_for_each_entry(msm_obj, &priv->inactive_list, mm_list) {
+		if (!msm_gem_trylock(&msm_obj->base))
+			continue;
 		if (is_purgeable(msm_obj))
 			count += msm_obj->base.size >> PAGE_SHIFT;
+		msm_gem_unlock(&msm_obj->base);
 	}
 
 	if (unlock)
@@ -78,10 +81,13 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 	list_for_each_entry(msm_obj, &priv->inactive_list, mm_list) {
 		if (freed >= sc->nr_to_scan)
 			break;
+		if (!msm_gem_trylock(&msm_obj->base))
+			continue;
 		if (is_purgeable(msm_obj)) {
-			msm_gem_purge(&msm_obj->base, OBJ_LOCK_SHRINKER);
+			msm_gem_purge(&msm_obj->base);
 			freed += msm_obj->base.size >> PAGE_SHIFT;
 		}
+		msm_gem_unlock(&msm_obj->base);
 	}
 
 	if (unlock)
@@ -107,15 +113,20 @@ msm_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr)
 		return NOTIFY_DONE;
 
 	list_for_each_entry(msm_obj, &priv->inactive_list, mm_list) {
+		if (!msm_gem_trylock(&msm_obj->base))
+			continue;
 		if (is_vunmapable(msm_obj)) {
-			msm_gem_vunmap(&msm_obj->base, OBJ_LOCK_SHRINKER);
-			/* since we don't know any better, lets bail after a few
-			 * and if necessary the shrinker will be invoked again.
-			 * Seems better than unmapping *everything*
-			 */
-			if (++unmapped >= 15)
-				break;
+			msm_gem_vunmap(&msm_obj->base);
+			unmapped++;
 		}
+		msm_gem_unlock(&msm_obj->base);
+
+		/* since we don't know any better, lets bail after a few
+		 * and if necessary the shrinker will be invoked again.
+		 * Seems better than unmapping *everything*
+		 */
+		if (++unmapped >= 15)
+			break;
 	}
 
 	if (unlock)
-- 
2.26.2



* [PATCH v2 06/22] drm/msm/submit: Move copy_from_user ahead of locking bos
  2020-10-12  2:09 [PATCH v2 00/22] drm/msm: de-struct_mutex-ification Rob Clark
                   ` (4 preceding siblings ...)
  2020-10-12  2:09 ` [PATCH v2 05/22] drm/msm/gem: Move locking in shrinker path Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path Rob Clark
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

We cannot switch to using obj->resv for locking without first moving
all the copy_from_user() calls ahead of submit_lock_objects().
Otherwise, in the mm fault path we acquire mm->mmap_sem before the
obj lock, but in the submit path the order is reversed.
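
The two chains, roughly (copy_from_user() can fault and take
mm->mmap_sem):

  fault path:       mmap_sem -> obj lock
  submit (before):  obj lock -> copy_from_user() -> mmap_sem
  submit (after):   copy_from_user() first, then obj lock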

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/msm_gem.h        |   3 +
 drivers/gpu/drm/msm/msm_gem_submit.c | 121 ++++++++++++++++-----------
 2 files changed, 76 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 947eeaca661d..744889436a98 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -238,7 +238,10 @@ struct msm_gem_submit {
 		uint32_t type;
 		uint32_t size;  /* in dwords */
 		uint64_t iova;
+		uint32_t offset;/* in dwords */
 		uint32_t idx;   /* cmdstream buffer idx in bos[] */
+		uint32_t nr_relocs;
+		struct drm_msm_gem_submit_reloc *relocs;
 	} *cmd;  /* array of size nr_cmds */
 	struct {
 		uint32_t flags;
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index aa5c60a7132d..002130d826aa 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -62,11 +62,16 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
 
 void msm_gem_submit_free(struct msm_gem_submit *submit)
 {
+	unsigned i;
+
 	dma_fence_put(submit->fence);
 	list_del(&submit->node);
 	put_pid(submit->pid);
 	msm_submitqueue_put(submit->queue);
 
+	for (i = 0; i < submit->nr_cmds; i++)
+		kfree(submit->cmd[i].relocs);
+
 	kfree(submit);
 }
 
@@ -150,6 +155,60 @@ static int submit_lookup_objects(struct msm_gem_submit *submit,
 	return ret;
 }
 
+static int submit_lookup_cmds(struct msm_gem_submit *submit,
+		struct drm_msm_gem_submit *args, struct drm_file *file)
+{
+	unsigned i, sz;
+	int ret = 0;
+
+	for (i = 0; i < args->nr_cmds; i++) {
+		struct drm_msm_gem_submit_cmd submit_cmd;
+		void __user *userptr =
+			u64_to_user_ptr(args->cmds + (i * sizeof(submit_cmd)));
+
+		ret = copy_from_user(&submit_cmd, userptr, sizeof(submit_cmd));
+		if (ret) {
+			ret = -EFAULT;
+			goto out;
+		}
+
+		/* validate input from userspace: */
+		switch (submit_cmd.type) {
+		case MSM_SUBMIT_CMD_BUF:
+		case MSM_SUBMIT_CMD_IB_TARGET_BUF:
+		case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
+			break;
+		default:
+			DRM_ERROR("invalid type: %08x\n", submit_cmd.type);
+			return -EINVAL;
+		}
+
+		if (submit_cmd.size % 4) {
+			DRM_ERROR("non-aligned cmdstream buffer size: %u\n",
+					submit_cmd.size);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		submit->cmd[i].type = submit_cmd.type;
+		submit->cmd[i].size = submit_cmd.size / 4;
+		submit->cmd[i].offset = submit_cmd.submit_offset / 4;
+		submit->cmd[i].idx  = submit_cmd.submit_idx;
+		submit->cmd[i].nr_relocs = submit_cmd.nr_relocs;
+
+		sz = sizeof(struct drm_msm_gem_submit_reloc) * submit_cmd.nr_relocs;
+		submit->cmd[i].relocs = kmalloc(sz, GFP_KERNEL);
+		ret = copy_from_user(submit->cmd[i].relocs, userptr, sz);
+		if (ret) {
+			ret = -EFAULT;
+			goto out;
+		}
+	}
+
+out:
+	return ret;
+}
+
 static void submit_unlock_unpin_bo(struct msm_gem_submit *submit,
 		int i, bool backoff)
 {
@@ -301,7 +360,7 @@ static int submit_bo(struct msm_gem_submit *submit, uint32_t idx,
 
 /* process the reloc's and patch up the cmdstream as needed: */
 static int submit_reloc(struct msm_gem_submit *submit, struct msm_gem_object *obj,
-		uint32_t offset, uint32_t nr_relocs, uint64_t relocs)
+		uint32_t offset, uint32_t nr_relocs, struct drm_msm_gem_submit_reloc *relocs)
 {
 	uint32_t i, last_offset = 0;
 	uint32_t *ptr;
@@ -327,18 +386,11 @@ static int submit_reloc(struct msm_gem_submit *submit, struct msm_gem_object *ob
 	}
 
 	for (i = 0; i < nr_relocs; i++) {
-		struct drm_msm_gem_submit_reloc submit_reloc;
-		void __user *userptr =
-			u64_to_user_ptr(relocs + (i * sizeof(submit_reloc)));
+		struct drm_msm_gem_submit_reloc submit_reloc = relocs[i];
 		uint32_t off;
 		uint64_t iova;
 		bool valid;
 
-		if (copy_from_user(&submit_reloc, userptr, sizeof(submit_reloc))) {
-			ret = -EFAULT;
-			goto out;
-		}
-
 		if (submit_reloc.submit_offset % 4) {
 			DRM_ERROR("non-aligned reloc offset: %u\n",
 					submit_reloc.submit_offset);
@@ -694,6 +746,10 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 	if (ret)
 		goto out;
 
+	ret = submit_lookup_cmds(submit, args, file);
+	if (ret)
+		goto out;
+
 	/* copy_*_user while holding a ww ticket upsets lockdep */
 	ww_acquire_init(&submit->ticket, &reservation_ww_class);
 	has_ww_ticket = true;
@@ -710,60 +766,29 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 		goto out;
 
 	for (i = 0; i < args->nr_cmds; i++) {
-		struct drm_msm_gem_submit_cmd submit_cmd;
-		void __user *userptr =
-			u64_to_user_ptr(args->cmds + (i * sizeof(submit_cmd)));
 		struct msm_gem_object *msm_obj;
 		uint64_t iova;
 
-		ret = copy_from_user(&submit_cmd, userptr, sizeof(submit_cmd));
-		if (ret) {
-			ret = -EFAULT;
-			goto out;
-		}
-
-		/* validate input from userspace: */
-		switch (submit_cmd.type) {
-		case MSM_SUBMIT_CMD_BUF:
-		case MSM_SUBMIT_CMD_IB_TARGET_BUF:
-		case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
-			break;
-		default:
-			DRM_ERROR("invalid type: %08x\n", submit_cmd.type);
-			ret = -EINVAL;
-			goto out;
-		}
-
-		ret = submit_bo(submit, submit_cmd.submit_idx,
+		ret = submit_bo(submit, submit->cmd[i].idx,
 				&msm_obj, &iova, NULL);
 		if (ret)
 			goto out;
 
-		if (submit_cmd.size % 4) {
-			DRM_ERROR("non-aligned cmdstream buffer size: %u\n",
-					submit_cmd.size);
+		if (!submit->cmd[i].size ||
+			((submit->cmd[i].size + submit->cmd[i].offset) >
+				msm_obj->base.size / 4)) {
+			DRM_ERROR("invalid cmdstream size: %u\n", submit->cmd[i].size * 4);
 			ret = -EINVAL;
 			goto out;
 		}
 
-		if (!submit_cmd.size ||
-			((submit_cmd.size + submit_cmd.submit_offset) >
-				msm_obj->base.size)) {
-			DRM_ERROR("invalid cmdstream size: %u\n", submit_cmd.size);
-			ret = -EINVAL;
-			goto out;
-		}
-
-		submit->cmd[i].type = submit_cmd.type;
-		submit->cmd[i].size = submit_cmd.size / 4;
-		submit->cmd[i].iova = iova + submit_cmd.submit_offset;
-		submit->cmd[i].idx  = submit_cmd.submit_idx;
+		submit->cmd[i].iova = iova + (submit->cmd[i].offset * 4);
 
 		if (submit->valid)
 			continue;
 
-		ret = submit_reloc(submit, msm_obj, submit_cmd.submit_offset,
-				submit_cmd.nr_relocs, submit_cmd.relocs);
+		ret = submit_reloc(submit, msm_obj, submit->cmd[i].offset * 4,
+				submit->cmd[i].nr_relocs, submit->cmd[i].relocs);
 		if (ret)
 			goto out;
 	}
-- 
2.26.2



* [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-10-12  2:09 [PATCH v2 00/22] drm/msm: de-struct_mutex-ification Rob Clark
                   ` (5 preceding siblings ...)
  2020-10-12  2:09 ` [PATCH v2 06/22] drm/msm/submit: Move copy_from_user ahead of locking bos Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12 14:35   ` Daniel Vetter
  2020-10-12  2:09 ` [PATCH v2 08/22] drm/msm/gem: Switch over to obj->resv for locking Rob Clark
                   ` (14 subsequent siblings)
  21 siblings, 1 reply; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

Unfortunately, due to a dev_pm_opp locking interaction with
mm->mmap_sem, we need to do the pm get before acquiring obj locks,
otherwise we can anger lockdep with the chain:

  opp_table_lock --> &mm->mmap_sem --> reservation_ww_class_mutex

For an explicit fencing userspace, the impact should be minimal
as we do all the fence waits before this point.  It could result
in some needless resumes in error cases, etc.
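
The resulting ordering in msm_ioctl_gem_submit(), in sketch form
(function names as used elsewhere in this series):

  pm_runtime_get_sync(&gpu->pdev->dev);  /* may take opp_table_lock */

  ww_acquire_init(&submit->ticket, &reservation_ww_class);
  ret = submit_lock_objects(submit);     /* reservation_ww_class_mutex */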

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/msm_gem_submit.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 002130d826aa..a9422d043bfe 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -744,11 +744,20 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 
 	ret = submit_lookup_objects(submit, args, file);
 	if (ret)
-		goto out;
+		goto out_pre_pm;
 
 	ret = submit_lookup_cmds(submit, args, file);
 	if (ret)
-		goto out;
+		goto out_pre_pm;
+
+	/*
+	 * Thanks to dev_pm_opp opp_table_lock interactions with mm->mmap_sem
+	 * in the resume path, we need to do the rpm get before we lock objs.
+	 * Which unfortunately might involve powering up the GPU sooner than
+	 * is necessary.  But at least in the explicit fencing case, we will
+	 * have already done all the fence waiting.
+	 */
+	pm_runtime_get_sync(&gpu->pdev->dev);
 
 	/* copy_*_user while holding a ww ticket upsets lockdep */
 	ww_acquire_init(&submit->ticket, &reservation_ww_class);
@@ -825,6 +834,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 
 
 out:
+	pm_runtime_put(&gpu->pdev->dev);
+out_pre_pm:
 	submit_cleanup(submit);
 	if (has_ww_ticket)
 		ww_acquire_fini(&submit->ticket);
-- 
2.26.2



* [PATCH v2 08/22] drm/msm/gem: Switch over to obj->resv for locking
  2020-10-12  2:09 [PATCH v2 00/22] drm/msm: de-struct_mutex-ification Rob Clark
                   ` (6 preceding siblings ...)
  2020-10-12  2:09 ` [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 09/22] drm/msm: Use correct drm_gem_object_put() in fail case Rob Clark
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/msm_gem.c        |  4 +---
 drivers/gpu/drm/msm/msm_gem.h        | 16 +++++-----------
 drivers/gpu/drm/msm/msm_gem_submit.c |  4 ++--
 drivers/gpu/drm/msm/msm_gpu.c        |  2 +-
 4 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index ff8ca257bdc6..210bf5c9c2dd 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -955,9 +955,9 @@ static void free_object(struct msm_gem_object *msm_obj)
 		put_pages(obj);
 	}
 
+	msm_gem_unlock(obj);
 	drm_gem_object_release(obj);
 
-	msm_gem_unlock(obj);
 	kfree(msm_obj);
 }
 
@@ -1029,8 +1029,6 @@ static int msm_gem_new_impl(struct drm_device *dev,
 	if (!msm_obj)
 		return -ENOMEM;
 
-	mutex_init(&msm_obj->lock);
-
 	msm_obj->flags = flags;
 	msm_obj->madv = MSM_MADV_WILLNEED;
 
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 744889436a98..ec01f35ce57b 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -85,7 +85,6 @@ struct msm_gem_object {
 	 * an IOMMU.  Also used for stolen/splashscreen buffer.
 	 */
 	struct drm_mm_node *vram_node;
-	struct mutex lock; /* Protects resources associated with bo */
 
 	char name[32]; /* Identifier to print for the debugfs files */
 
@@ -156,36 +155,31 @@ void msm_gem_describe_objects(struct list_head *list, struct seq_file *m);
 static inline void
 msm_gem_lock(struct drm_gem_object *obj)
 {
-	struct msm_gem_object *msm_obj = to_msm_bo(obj);
-	mutex_lock(&msm_obj->lock);
+	dma_resv_lock(obj->resv, NULL);
 }
 
 static inline bool __must_check
 msm_gem_trylock(struct drm_gem_object *obj)
 {
-	struct msm_gem_object *msm_obj = to_msm_bo(obj);
-	return mutex_trylock_recursive(&msm_obj->lock) == MUTEX_TRYLOCK_SUCCESS;
+	return dma_resv_trylock(obj->resv);
 }
 
 static inline int
 msm_gem_lock_interruptible(struct drm_gem_object *obj)
 {
-	struct msm_gem_object *msm_obj = to_msm_bo(obj);
-	return mutex_lock_interruptible(&msm_obj->lock);
+	return dma_resv_lock_interruptible(obj->resv, NULL);
 }
 
 static inline void
 msm_gem_unlock(struct drm_gem_object *obj)
 {
-	struct msm_gem_object *msm_obj = to_msm_bo(obj);
-	mutex_unlock(&msm_obj->lock);
+	dma_resv_unlock(obj->resv);
 }
 
 static inline bool
 msm_gem_is_locked(struct drm_gem_object *obj)
 {
-	struct msm_gem_object *msm_obj = to_msm_bo(obj);
-	return mutex_is_locked(&msm_obj->lock);
+	return dma_resv_is_locked(obj->resv);
 }
 
 static inline bool is_active(struct msm_gem_object *msm_obj)
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index a9422d043bfe..35b7d9d06850 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -215,7 +215,7 @@ static void submit_unlock_unpin_bo(struct msm_gem_submit *submit,
 	struct msm_gem_object *msm_obj = submit->bos[i].obj;
 
 	if (submit->bos[i].flags & BO_PINNED)
-		msm_gem_unpin_iova(&msm_obj->base, submit->aspace);
+		msm_gem_unpin_iova_locked(&msm_obj->base, submit->aspace);
 
 	if (submit->bos[i].flags & BO_LOCKED)
 		dma_resv_unlock(msm_obj->base.resv);
@@ -318,7 +318,7 @@ static int submit_pin_objects(struct msm_gem_submit *submit)
 		uint64_t iova;
 
 		/* if locking succeeded, pin bo: */
-		ret = msm_gem_get_and_pin_iova(&msm_obj->base,
+		ret = msm_gem_get_and_pin_iova_locked(&msm_obj->base,
 				submit->aspace, &iova);
 
 		if (ret)
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 55d16489d0f3..dbd9020713e5 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -784,7 +784,7 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 
 		/* submit takes a reference to the bo and iova until retired: */
 		drm_gem_object_get(&msm_obj->base);
-		msm_gem_get_and_pin_iova(&msm_obj->base, submit->aspace, &iova);
+		msm_gem_get_and_pin_iova_locked(&msm_obj->base, submit->aspace, &iova);
 
 		if (submit->bos[i].flags & MSM_SUBMIT_BO_WRITE)
 			dma_resv_add_excl_fence(drm_obj->resv, submit->fence);
-- 
2.26.2



* [PATCH v2 09/22] drm/msm: Use correct drm_gem_object_put() in fail case
  2020-10-12  2:09 [PATCH 00/14] drm/msm: de-struct_mutex-ification Rob Clark
                   ` (7 preceding siblings ...)
  2020-10-12  2:09 ` [PATCH v2 08/22] drm/msm/gem: Switch over to obj->resv for locking Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 10/22] drm/msm: Drop chatty trace Rob Clark
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

We only want to use drm_gem_object_put(), the unlocked variant, in the
unlocked case; when struct_mutex is held we need drm_gem_object_put_locked().

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/msm_gem.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 210bf5c9c2dd..833e3d3c6e8c 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -1120,7 +1120,11 @@ static struct drm_gem_object *_msm_gem_new(struct drm_device *dev,
 	return obj;
 
 fail:
-	drm_gem_object_put(obj);
+	if (struct_mutex_locked) {
+		drm_gem_object_put_locked(obj);
+	} else {
+		drm_gem_object_put(obj);
+	}
 	return ERR_PTR(ret);
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 10/22] drm/msm: Drop chatty trace
  2020-10-12  2:09 [PATCH 00/14] drm/msm: de-struct_mutex-ification Rob Clark
                   ` (8 preceding siblings ...)
  2020-10-12  2:09 ` [PATCH v2 09/22] drm/msm: Use correct drm_gem_object_put() in fail case Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 11/22] drm/msm: Move update_fences() Rob Clark
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Jordan Crouse, Rob Clark, Sean Paul,
	David Airlie, open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

It is somewhat redundant with the gpu tracepoints, and anyway not
useful enough to justify spamming the log when debug traces are enabled.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_gpu.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index dbd9020713e5..677b11c5a151 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -535,7 +535,6 @@ static void recover_worker(struct work_struct *work)
 
 static void hangcheck_timer_reset(struct msm_gpu *gpu)
 {
-	DBG("%s", gpu->name);
 	mod_timer(&gpu->hangcheck_timer,
 			round_jiffies_up(jiffies + DRM_MSM_HANGCHECK_JIFFIES));
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 11/22] drm/msm: Move update_fences()
  2020-10-12  2:09 [PATCH 00/14] drm/msm: de-struct_mutex-ification Rob Clark
                   ` (9 preceding siblings ...)
  2020-10-12  2:09 ` [PATCH v2 10/22] drm/msm: Drop chatty trace Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 12/22] drm/msm: Add priv->mm_lock to protect active/inactive lists Rob Clark
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Jordan Crouse, Rob Clark, Sean Paul,
	David Airlie, open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

Small cleanup: update_fences() is used in the hangcheck path, but also
in the normal retire path.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_gpu.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 677b11c5a151..e5b7c8a77c99 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -265,6 +265,20 @@ int msm_gpu_hw_init(struct msm_gpu *gpu)
 	return ret;
 }
 
+static void update_fences(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
+		uint32_t fence)
+{
+	struct msm_gem_submit *submit;
+
+	list_for_each_entry(submit, &ring->submits, node) {
+		if (submit->seqno > fence)
+			break;
+
+		msm_update_fence(submit->ring->fctx,
+			submit->fence->seqno);
+	}
+}
+
 #ifdef CONFIG_DEV_COREDUMP
 static ssize_t msm_gpu_devcoredump_read(char *buffer, loff_t offset,
 		size_t count, void *data, size_t datalen)
@@ -411,20 +425,6 @@ static void msm_gpu_crashstate_capture(struct msm_gpu *gpu,
  * Hangcheck detection for locked gpu:
  */
 
-static void update_fences(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
-		uint32_t fence)
-{
-	struct msm_gem_submit *submit;
-
-	list_for_each_entry(submit, &ring->submits, node) {
-		if (submit->seqno > fence)
-			break;
-
-		msm_update_fence(submit->ring->fctx,
-			submit->fence->seqno);
-	}
-}
-
 static struct msm_gem_submit *
 find_submit(struct msm_ringbuffer *ring, uint32_t fence)
 {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 12/22] drm/msm: Add priv->mm_lock to protect active/inactive lists
  2020-10-12  2:09 [PATCH 00/14] drm/msm: de-struct_mutex-ification Rob Clark
                   ` (10 preceding siblings ...)
  2020-10-12  2:09 ` [PATCH v2 11/22] drm/msm: Move update_fences() Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 13/22] drm/msm: Document and rename preempt_lock Rob Clark
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Jordan Crouse, Rob Clark, Sean Paul,
	David Airlie, open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

Rather than relying on the big dev->struct_mutex hammer, introduce a
more specific lock for protecting the bo lists.
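
The ordering rule this introduces is subtle enough to deserve a small
sketch (illustrative only, using the names from the patch below):

    /* Good: mm_lock is held only around list manipulation: */
    mutex_lock(&priv->mm_lock);
    list_del_init(&msm_obj->mm_list);
    list_add_tail(&msm_obj->mm_list, &gpu->active_list);
    mutex_unlock(&priv->mm_lock);

    /*
     * Bad: allocating memory while holding mm_lock can enter direct
     * reclaim and re-take mm_lock via the shrinker.  The
     * fs_reclaim_acquire()/might_lock() priming at init time below
     * turns that mistake into an immediate lockdep splat instead of
     * a rare runtime deadlock.
     */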

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_debugfs.c      |  7 +++++++
 drivers/gpu/drm/msm/msm_drv.c          |  7 +++++++
 drivers/gpu/drm/msm/msm_drv.h          | 13 +++++++++++-
 drivers/gpu/drm/msm/msm_gem.c          | 28 +++++++++++++++-----------
 drivers/gpu/drm/msm/msm_gem_shrinker.c | 12 +++++++++++
 drivers/gpu/drm/msm/msm_gpu.h          |  5 ++++-
 6 files changed, 58 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_debugfs.c b/drivers/gpu/drm/msm/msm_debugfs.c
index ee2e270f464c..64afbed89821 100644
--- a/drivers/gpu/drm/msm/msm_debugfs.c
+++ b/drivers/gpu/drm/msm/msm_debugfs.c
@@ -112,6 +112,11 @@ static int msm_gem_show(struct drm_device *dev, struct seq_file *m)
 {
 	struct msm_drm_private *priv = dev->dev_private;
 	struct msm_gpu *gpu = priv->gpu;
+	int ret;
+
+	ret = mutex_lock_interruptible(&priv->mm_lock);
+	if (ret)
+		return ret;
 
 	if (gpu) {
 		seq_printf(m, "Active Objects (%s):\n", gpu->name);
@@ -121,6 +126,8 @@ static int msm_gem_show(struct drm_device *dev, struct seq_file *m)
 	seq_printf(m, "Inactive Objects:\n");
 	msm_gem_describe_objects(&priv->inactive_list, m);
 
+	mutex_unlock(&priv->mm_lock);
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 49685571dc0e..81cb2cecc829 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -7,6 +7,7 @@
 
 #include <linux/dma-mapping.h>
 #include <linux/kthread.h>
+#include <linux/sched/mm.h>
 #include <linux/uaccess.h>
 #include <uapi/linux/sched/types.h>
 
@@ -441,6 +442,12 @@ static int msm_drm_init(struct device *dev, struct drm_driver *drv)
 	init_llist_head(&priv->free_list);
 
 	INIT_LIST_HEAD(&priv->inactive_list);
+	mutex_init(&priv->mm_lock);
+
+	/* Teach lockdep about lock ordering wrt. shrinker: */
+	fs_reclaim_acquire(GFP_KERNEL);
+	might_lock(&priv->mm_lock);
+	fs_reclaim_release(GFP_KERNEL);
 
 	drm_mode_config_init(ddev);
 
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 79ee7d05b363..a17dadd38685 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -174,8 +174,19 @@ struct msm_drm_private {
 	struct msm_rd_state *hangrd;   /* debugfs to dump hanging submits */
 	struct msm_perf_state *perf;
 
-	/* list of GEM objects: */
+	/*
+	 * List of inactive GEM objects.  Every bo is either in the inactive_list
+	 * or gpu->active_list (for the gpu it is active on[1])
+	 *
+	 * These lists are protected by mm_lock.  If struct_mutex is involved, it
+	 * should be acquired prior to mm_lock.  One should *not* hold mm_lock in
+	 * get_pages()/vmap()/etc paths, as they can trigger the shrinker.
+	 *
+	 * [1] if someone ever added support for the old 2d cores, there could be
+	 *     more than one gpu object
+	 */
 	struct list_head inactive_list;
+	struct mutex mm_lock;
 
 	/* worker for delayed free of objects: */
 	struct work_struct free_work;
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 833e3d3c6e8c..15f81ed2e154 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -753,13 +753,17 @@ int msm_gem_sync_object(struct drm_gem_object *obj,
 void msm_gem_active_get(struct drm_gem_object *obj, struct msm_gpu *gpu)
 {
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
-	WARN_ON(!mutex_is_locked(&obj->dev->struct_mutex));
+	struct msm_drm_private *priv = obj->dev->dev_private;
+
+	might_sleep();
 	WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED);
 
 	if (!atomic_fetch_inc(&msm_obj->active_count)) {
+		mutex_lock(&priv->mm_lock);
 		msm_obj->gpu = gpu;
 		list_del_init(&msm_obj->mm_list);
 		list_add_tail(&msm_obj->mm_list, &gpu->active_list);
+		mutex_unlock(&priv->mm_lock);
 	}
 }
 
@@ -768,12 +772,14 @@ void msm_gem_active_put(struct drm_gem_object *obj)
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 	struct msm_drm_private *priv = obj->dev->dev_private;
 
-	WARN_ON(!mutex_is_locked(&obj->dev->struct_mutex));
+	might_sleep();
 
 	if (!atomic_dec_return(&msm_obj->active_count)) {
+		mutex_lock(&priv->mm_lock);
 		msm_obj->gpu = NULL;
 		list_del_init(&msm_obj->mm_list);
 		list_add_tail(&msm_obj->mm_list, &priv->inactive_list);
+		mutex_unlock(&priv->mm_lock);
 	}
 }
 
@@ -928,13 +934,16 @@ static void free_object(struct msm_gem_object *msm_obj)
 {
 	struct drm_gem_object *obj = &msm_obj->base;
 	struct drm_device *dev = obj->dev;
+	struct msm_drm_private *priv = dev->dev_private;
 
 	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
 
 	/* object should not be on active list: */
 	WARN_ON(is_active(msm_obj));
 
+	mutex_lock(&priv->mm_lock);
 	list_del(&msm_obj->mm_list);
+	mutex_unlock(&priv->mm_lock);
 
 	msm_gem_lock(obj);
 
@@ -1108,14 +1117,9 @@ static struct drm_gem_object *_msm_gem_new(struct drm_device *dev,
 		mapping_set_gfp_mask(obj->filp->f_mapping, GFP_HIGHUSER);
 	}
 
-	if (struct_mutex_locked) {
-		WARN_ON(!mutex_is_locked(&dev->struct_mutex));
-		list_add_tail(&msm_obj->mm_list, &priv->inactive_list);
-	} else {
-		mutex_lock(&dev->struct_mutex);
-		list_add_tail(&msm_obj->mm_list, &priv->inactive_list);
-		mutex_unlock(&dev->struct_mutex);
-	}
+	mutex_lock(&priv->mm_lock);
+	list_add_tail(&msm_obj->mm_list, &priv->inactive_list);
+	mutex_unlock(&priv->mm_lock);
 
 	return obj;
 
@@ -1183,9 +1187,9 @@ struct drm_gem_object *msm_gem_import(struct drm_device *dev,
 
 	msm_gem_unlock(obj);
 
-	mutex_lock(&dev->struct_mutex);
+	mutex_lock(&priv->mm_lock);
 	list_add_tail(&msm_obj->mm_list, &priv->inactive_list);
-	mutex_unlock(&dev->struct_mutex);
+	mutex_unlock(&priv->mm_lock);
 
 	return obj;
 
diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c b/drivers/gpu/drm/msm/msm_gem_shrinker.c
index 2dc0ffa925b4..6be073b8ca08 100644
--- a/drivers/gpu/drm/msm/msm_gem_shrinker.c
+++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c
@@ -51,6 +51,8 @@ msm_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
 	if (!msm_gem_shrinker_lock(dev, &unlock))
 		return 0;
 
+	mutex_lock(&priv->mm_lock);
+
 	list_for_each_entry(msm_obj, &priv->inactive_list, mm_list) {
 		if (!msm_gem_trylock(&msm_obj->base))
 			continue;
@@ -59,6 +61,8 @@ msm_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
 		msm_gem_unlock(&msm_obj->base);
 	}
 
+	mutex_unlock(&priv->mm_lock);
+
 	if (unlock)
 		mutex_unlock(&dev->struct_mutex);
 
@@ -78,6 +82,8 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 	if (!msm_gem_shrinker_lock(dev, &unlock))
 		return SHRINK_STOP;
 
+	mutex_lock(&priv->mm_lock);
+
 	list_for_each_entry(msm_obj, &priv->inactive_list, mm_list) {
 		if (freed >= sc->nr_to_scan)
 			break;
@@ -90,6 +96,8 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 		msm_gem_unlock(&msm_obj->base);
 	}
 
+	mutex_unlock(&priv->mm_lock);
+
 	if (unlock)
 		mutex_unlock(&dev->struct_mutex);
 
@@ -112,6 +120,8 @@ msm_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr)
 	if (!msm_gem_shrinker_lock(dev, &unlock))
 		return NOTIFY_DONE;
 
+	mutex_lock(&priv->mm_lock);
+
 	list_for_each_entry(msm_obj, &priv->inactive_list, mm_list) {
 		if (!msm_gem_trylock(&msm_obj->base))
 			continue;
@@ -129,6 +139,8 @@ msm_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr)
 			break;
 	}
 
+	mutex_unlock(&priv->mm_lock);
+
 	if (unlock)
 		mutex_unlock(&dev->struct_mutex);
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 6c9e1fdc1a76..1806e87600c0 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -94,7 +94,10 @@ struct msm_gpu {
 	struct msm_ringbuffer *rb[MSM_GPU_MAX_RINGS];
 	int nr_rings;
 
-	/* list of GEM active objects: */
+	/*
+	 * List of GEM active objects on this gpu.  Protected by
+	 * msm_drm_private::mm_lock
+	 */
 	struct list_head active_list;
 
 	/* does gpu need hw_init? */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 13/22] drm/msm: Document and rename preempt_lock
  2020-10-12  2:09 [PATCH 00/14] drm/msm: de-struct_mutex-ification Rob Clark
                   ` (11 preceding siblings ...)
  2020-10-12  2:09 ` [PATCH v2 12/22] drm/msm: Add priv->mm_lock to protect active/inactive lists Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 14/22] drm/msm: Protect ring->submits with its own lock Rob Clark
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Jordan Crouse, Rob Clark, Sean Paul,
	David Airlie, Eric Anholt, AngeloGioacchino Del Regno,
	Emil Velikov, Jonathan Marek, Sharat Masetty,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

Before adding another lock, give ring->lock a more descriptive name.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c     |  4 ++--
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 12 ++++++------
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c     |  4 ++--
 drivers/gpu/drm/msm/msm_ringbuffer.c      |  2 +-
 drivers/gpu/drm/msm/msm_ringbuffer.h      |  7 ++++++-
 5 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index c941c8138f25..543437a2186e 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -36,7 +36,7 @@ void a5xx_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
 		OUT_RING(ring, upper_32_bits(shadowptr(a5xx_gpu, ring)));
 	}
 
-	spin_lock_irqsave(&ring->lock, flags);
+	spin_lock_irqsave(&ring->preempt_lock, flags);
 
 	/* Copy the shadow to the actual register */
 	ring->cur = ring->next;
@@ -44,7 +44,7 @@ void a5xx_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
 	/* Make sure to wrap wptr if we need to */
 	wptr = get_wptr(ring);
 
-	spin_unlock_irqrestore(&ring->lock, flags);
+	spin_unlock_irqrestore(&ring->preempt_lock, flags);
 
 	/* Make sure everything is posted before making a decision */
 	mb();
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
index 7e04509c4e1f..183de1139eeb 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
@@ -45,9 +45,9 @@ static inline void update_wptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 	if (!ring)
 		return;
 
-	spin_lock_irqsave(&ring->lock, flags);
+	spin_lock_irqsave(&ring->preempt_lock, flags);
 	wptr = get_wptr(ring);
-	spin_unlock_irqrestore(&ring->lock, flags);
+	spin_unlock_irqrestore(&ring->preempt_lock, flags);
 
 	gpu_write(gpu, REG_A5XX_CP_RB_WPTR, wptr);
 }
@@ -62,9 +62,9 @@ static struct msm_ringbuffer *get_next_ring(struct msm_gpu *gpu)
 		bool empty;
 		struct msm_ringbuffer *ring = gpu->rb[i];
 
-		spin_lock_irqsave(&ring->lock, flags);
+		spin_lock_irqsave(&ring->preempt_lock, flags);
 		empty = (get_wptr(ring) == ring->memptrs->rptr);
-		spin_unlock_irqrestore(&ring->lock, flags);
+		spin_unlock_irqrestore(&ring->preempt_lock, flags);
 
 		if (!empty)
 			return ring;
@@ -132,9 +132,9 @@ void a5xx_preempt_trigger(struct msm_gpu *gpu)
 	}
 
 	/* Make sure the wptr doesn't update while we're in motion */
-	spin_lock_irqsave(&ring->lock, flags);
+	spin_lock_irqsave(&ring->preempt_lock, flags);
 	a5xx_gpu->preempt[ring->id]->wptr = get_wptr(ring);
-	spin_unlock_irqrestore(&ring->lock, flags);
+	spin_unlock_irqrestore(&ring->preempt_lock, flags);
 
 	/* Set the address of the incoming preemption record */
 	gpu_write64(gpu, REG_A5XX_CP_CONTEXT_SWITCH_RESTORE_ADDR_LO,
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 8915882e4444..fc85f008d69d 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -65,7 +65,7 @@ static void a6xx_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 		OUT_RING(ring, upper_32_bits(shadowptr(a6xx_gpu, ring)));
 	}
 
-	spin_lock_irqsave(&ring->lock, flags);
+	spin_lock_irqsave(&ring->preempt_lock, flags);
 
 	/* Copy the shadow to the actual register */
 	ring->cur = ring->next;
@@ -73,7 +73,7 @@ static void a6xx_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 	/* Make sure to wrap wptr if we need to */
 	wptr = get_wptr(ring);
 
-	spin_unlock_irqrestore(&ring->lock, flags);
+	spin_unlock_irqrestore(&ring->preempt_lock, flags);
 
 	/* Make sure everything is posted before making a decision */
 	mb();
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 935bf9b1d941..1b6958e908dc 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -46,7 +46,7 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
 	ring->memptrs_iova = memptrs_iova;
 
 	INIT_LIST_HEAD(&ring->submits);
-	spin_lock_init(&ring->lock);
+	spin_lock_init(&ring->preempt_lock);
 
 	snprintf(name, sizeof(name), "gpu-ring-%d", ring->id);
 
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h b/drivers/gpu/drm/msm/msm_ringbuffer.h
index 0987d6bf848c..4956d1bc5d0e 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.h
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
@@ -46,7 +46,12 @@ struct msm_ringbuffer {
 	struct msm_rbmemptrs *memptrs;
 	uint64_t memptrs_iova;
 	struct msm_fence_context *fctx;
-	spinlock_t lock;
+
+	/*
+	 * preempt_lock protects preemption and serializes wptr updates against
+	 * preemption.  Can be acquired from irq context.
+	 */
+	spinlock_t preempt_lock;
 };
 
 struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 14/22] drm/msm: Protect ring->submits with its own lock
  2020-10-12  2:09 [PATCH 00/14] drm/msm: de-struct_mutex-ification Rob Clark
                   ` (12 preceding siblings ...)
  2020-10-12  2:09 ` [PATCH v2 13/22] drm/msm: Document and rename preempt_lock Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 15/22] drm/msm: Refcount submits Rob Clark
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Jordan Crouse, Rob Clark, Sean Paul,
	David Airlie, open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

One less place to rely on dev->struct_mutex.
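
The non-obvious part is retire_submits(): once submits can be removed
from the list while it is unlocked, list_for_each_entry_safe() is no
longer safe, so the loop is restructured to re-fetch the list head
under the lock on each iteration.  In outline (a sketch of the pattern
used below):

    while (true) {
        struct msm_gem_submit *submit;

        spin_lock(&ring->submit_lock);
        submit = list_first_entry_or_null(&ring->submits,
                struct msm_gem_submit, node);
        spin_unlock(&ring->submit_lock);

        /* no submit, or oldest submit not yet signalled: done */
        if (!submit || !dma_fence_is_signaled(submit->fence))
            break;

        retire_submit(gpu, ring, submit);
    }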

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_gem_submit.c |  2 ++
 drivers/gpu/drm/msm/msm_gpu.c        | 37 ++++++++++++++++++++++------
 drivers/gpu/drm/msm/msm_ringbuffer.c |  1 +
 drivers/gpu/drm/msm/msm_ringbuffer.h |  6 +++++
 4 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 35b7d9d06850..a91c1b99db97 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -65,7 +65,9 @@ void msm_gem_submit_free(struct msm_gem_submit *submit)
 	unsigned i;
 
 	dma_fence_put(submit->fence);
+	spin_lock(&submit->ring->submit_lock);
 	list_del(&submit->node);
+	spin_unlock(&submit->ring->submit_lock);
 	put_pid(submit->pid);
 	msm_submitqueue_put(submit->queue);
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index e5b7c8a77c99..bb904e467b24 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -270,6 +270,7 @@ static void update_fences(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
 {
 	struct msm_gem_submit *submit;
 
+	spin_lock(&ring->submit_lock);
 	list_for_each_entry(submit, &ring->submits, node) {
 		if (submit->seqno > fence)
 			break;
@@ -277,6 +278,7 @@ static void update_fences(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
 		msm_update_fence(submit->ring->fctx,
 			submit->fence->seqno);
 	}
+	spin_unlock(&ring->submit_lock);
 }
 
 #ifdef CONFIG_DEV_COREDUMP
@@ -430,11 +432,14 @@ find_submit(struct msm_ringbuffer *ring, uint32_t fence)
 {
 	struct msm_gem_submit *submit;
 
-	WARN_ON(!mutex_is_locked(&ring->gpu->dev->struct_mutex));
-
-	list_for_each_entry(submit, &ring->submits, node)
-		if (submit->seqno == fence)
+	spin_lock(&ring->submit_lock);
+	list_for_each_entry(submit, &ring->submits, node) {
+		if (submit->seqno == fence) {
+			spin_unlock(&ring->submit_lock);
 			return submit;
+		}
+	}
+	spin_unlock(&ring->submit_lock);
 
 	return NULL;
 }
@@ -523,8 +528,10 @@ static void recover_worker(struct work_struct *work)
 		for (i = 0; i < gpu->nr_rings; i++) {
 			struct msm_ringbuffer *ring = gpu->rb[i];
 
+			spin_lock(&ring->submit_lock);
 			list_for_each_entry(submit, &ring->submits, node)
 				gpu->funcs->submit(gpu, submit);
+			spin_unlock(&ring->submit_lock);
 		}
 	}
 
@@ -711,7 +718,6 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
 static void retire_submits(struct msm_gpu *gpu)
 {
 	struct drm_device *dev = gpu->dev;
-	struct msm_gem_submit *submit, *tmp;
 	int i;
 
 	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
@@ -720,9 +726,24 @@ static void retire_submits(struct msm_gpu *gpu)
 	for (i = 0; i < gpu->nr_rings; i++) {
 		struct msm_ringbuffer *ring = gpu->rb[i];
 
-		list_for_each_entry_safe(submit, tmp, &ring->submits, node) {
-			if (dma_fence_is_signaled(submit->fence))
+		while (true) {
+			struct msm_gem_submit *submit = NULL;
+
+			spin_lock(&ring->submit_lock);
+			submit = list_first_entry_or_null(&ring->submits,
+					struct msm_gem_submit, node);
+			spin_unlock(&ring->submit_lock);
+
+			/*
+			 * If no submit, we are done.  If submit->fence hasn't
+			 * been signalled, then later submits are not signalled
+			 * either, so we are also done.
+			 */
+			if (submit && dma_fence_is_signaled(submit->fence)) {
 				retire_submit(gpu, ring, submit);
+			} else {
+				break;
+			}
 		}
 	}
 }
@@ -765,7 +786,9 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 
 	submit->seqno = ++ring->seqno;
 
+	spin_lock(&ring->submit_lock);
 	list_add_tail(&submit->node, &ring->submits);
+	spin_unlock(&ring->submit_lock);
 
 	msm_rd_dump_submit(priv->rd, submit, NULL);
 
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 1b6958e908dc..4d2a2a4abef8 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -46,6 +46,7 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
 	ring->memptrs_iova = memptrs_iova;
 
 	INIT_LIST_HEAD(&ring->submits);
+	spin_lock_init(&ring->submit_lock);
 	spin_lock_init(&ring->preempt_lock);
 
 	snprintf(name, sizeof(name), "gpu-ring-%d", ring->id);
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h b/drivers/gpu/drm/msm/msm_ringbuffer.h
index 4956d1bc5d0e..fe55d4a1aa16 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.h
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
@@ -39,7 +39,13 @@ struct msm_ringbuffer {
 	int id;
 	struct drm_gem_object *bo;
 	uint32_t *start, *end, *cur, *next;
+
+	/*
+	 * List of in-flight submits on this ring.  Protected by submit_lock.
+	 */
 	struct list_head submits;
+	spinlock_t submit_lock;
+
 	uint64_t iova;
 	uint32_t seqno;
 	uint32_t hangcheck_fence;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 15/22] drm/msm: Refcount submits
  2020-10-12  2:09 [PATCH 00/14] drm/msm: de-struct_mutex-ification Rob Clark
                   ` (13 preceding siblings ...)
  2020-10-12  2:09 ` [PATCH v2 14/22] drm/msm: Protect ring->submits with its own lock Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 16/22] drm/msm: Remove obj->gpu Rob Clark
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Jordan Crouse, Rob Clark, Sean Paul,
	David Airlie, open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

Before we remove dev->struct_mutex from the retire path, we have to deal
with the situation of a submit retiring before the submit ioctl returns.

To deal with this, ring->submits will hold a reference to the submit,
which is dropped when the submit is retired.  And the submit ioctl path
holds its own ref, which it drops when it is done with the submit.

Also, add to submit list *after* getting/pinning bo's, to prevent badness
in case the completed fence is corrupted and retire_worker mistakenly
believes the submit is done too early.
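
The resulting reference lifecycle, roughly (a sketch; the helpers are
the ones added by this patch):

    submit = submit_create(...);   /* kref_init(): ref=1, owned by ioctl */

    /* msm_gpu_submit() takes a ref on behalf of ring->submits: */
    msm_gem_submit_get(submit);

    /* retire path removes the submit from ring->submits and drops
     * the list's ref: */
    msm_gem_submit_put(submit);

    /* the submit ioctl drops its own ref on the way out; whichever
     * put is last ends up in __msm_gem_submit_destroy(): */
    msm_gem_submit_put(submit);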

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_drv.h        |  1 -
 drivers/gpu/drm/msm/msm_gem.h        | 13 +++++++++++++
 drivers/gpu/drm/msm/msm_gem_submit.c | 11 +++++------
 drivers/gpu/drm/msm/msm_gpu.c        | 21 ++++++++++++++++-----
 4 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index a17dadd38685..2ef5cff19883 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -277,7 +277,6 @@ void msm_unregister_mmu(struct drm_device *dev, struct msm_mmu *mmu);
 
 bool msm_use_mmu(struct drm_device *dev);
 
-void msm_gem_submit_free(struct msm_gem_submit *submit);
 int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 		struct drm_file *file);
 
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index ec01f35ce57b..93ee73c620ed 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -211,6 +211,7 @@ void msm_gem_free_work(struct work_struct *work);
  * lasts for the duration of the submit-ioctl.
  */
 struct msm_gem_submit {
+	struct kref ref;
 	struct drm_device *dev;
 	struct msm_gpu *gpu;
 	struct msm_gem_address_space *aspace;
@@ -247,6 +248,18 @@ struct msm_gem_submit {
 	} bos[];
 };
 
+void __msm_gem_submit_destroy(struct kref *kref);
+
+static inline void msm_gem_submit_get(struct msm_gem_submit *submit)
+{
+	kref_get(&submit->ref);
+}
+
+static inline void msm_gem_submit_put(struct msm_gem_submit *submit)
+{
+	kref_put(&submit->ref, __msm_gem_submit_destroy);
+}
+
 /* helper to determine of a buffer in submit should be dumped, used for both
  * devcoredump and debugfs cmdstream dumping:
  */
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index a91c1b99db97..3151a0ca8904 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -42,6 +42,7 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
 	if (!submit)
 		return NULL;
 
+	kref_init(&submit->ref);
 	submit->dev = dev;
 	submit->aspace = queue->ctx->aspace;
 	submit->gpu = gpu;
@@ -60,14 +61,13 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
 	return submit;
 }
 
-void msm_gem_submit_free(struct msm_gem_submit *submit)
+void __msm_gem_submit_destroy(struct kref *kref)
 {
+	struct msm_gem_submit *submit =
+			container_of(kref, struct msm_gem_submit, ref);
 	unsigned i;
 
 	dma_fence_put(submit->fence);
-	spin_lock(&submit->ring->submit_lock);
-	list_del(&submit->node);
-	spin_unlock(&submit->ring->submit_lock);
 	put_pid(submit->pid);
 	msm_submitqueue_put(submit->queue);
 
@@ -841,8 +841,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 	submit_cleanup(submit);
 	if (has_ww_ticket)
 		ww_acquire_fini(&submit->ticket);
-	if (ret)
-		msm_gem_submit_free(submit);
+	msm_gem_submit_put(submit);
 out_unlock:
 	if (ret && (out_fence_fd >= 0))
 		put_unused_fd(out_fence_fd);
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index bb904e467b24..18a7948ac437 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -712,7 +712,12 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
 
 	pm_runtime_mark_last_busy(&gpu->pdev->dev);
 	pm_runtime_put_autosuspend(&gpu->pdev->dev);
-	msm_gem_submit_free(submit);
+
+	spin_lock(&ring->submit_lock);
+	list_del(&submit->node);
+	spin_unlock(&ring->submit_lock);
+
+	msm_gem_submit_put(submit);
 }
 
 static void retire_submits(struct msm_gpu *gpu)
@@ -786,10 +791,6 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 
 	submit->seqno = ++ring->seqno;
 
-	spin_lock(&ring->submit_lock);
-	list_add_tail(&submit->node, &ring->submits);
-	spin_unlock(&ring->submit_lock);
-
 	msm_rd_dump_submit(priv->rd, submit, NULL);
 
 	update_sw_cntrs(gpu);
@@ -816,6 +817,16 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 		msm_gem_active_get(drm_obj, gpu);
 	}
 
+	/*
+	 * ring->submits holds a ref to the submit, to deal with the case
+	 * that a submit completes before msm_ioctl_gem_submit() returns.
+	 */
+	msm_gem_submit_get(submit);
+
+	spin_lock(&ring->submit_lock);
+	list_add_tail(&submit->node, &ring->submits);
+	spin_unlock(&ring->submit_lock);
+
 	gpu->funcs->submit(gpu, submit);
 	priv->lastctx = submit->queue->ctx;
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 16/22] drm/msm: Remove obj->gpu
  2020-10-12  2:09 [PATCH 00/14] drm/msm: de-struct_mutex-ification Rob Clark
                   ` (14 preceding siblings ...)
  2020-10-12  2:09 ` [PATCH v2 15/22] drm/msm: Refcount submits Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 17/22] drm/msm: Drop struct_mutex from the retire path Rob Clark
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

It cannot be atomically updated with obj->active_count, and its only
purpose is a useless WARN_ON() (which becomes a buggy WARN_ON() once
retire_submits() is no longer serialized with incoming submits via
struct_mutex).

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/msm_gem.c | 2 --
 drivers/gpu/drm/msm/msm_gem.h | 1 -
 drivers/gpu/drm/msm/msm_gpu.c | 5 -----
 3 files changed, 8 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 15f81ed2e154..cdbbdd848fe3 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -760,7 +760,6 @@ void msm_gem_active_get(struct drm_gem_object *obj, struct msm_gpu *gpu)
 
 	if (!atomic_fetch_inc(&msm_obj->active_count)) {
 		mutex_lock(&priv->mm_lock);
-		msm_obj->gpu = gpu;
 		list_del_init(&msm_obj->mm_list);
 		list_add_tail(&msm_obj->mm_list, &gpu->active_list);
 		mutex_unlock(&priv->mm_lock);
@@ -776,7 +775,6 @@ void msm_gem_active_put(struct drm_gem_object *obj)
 
 	if (!atomic_dec_return(&msm_obj->active_count)) {
 		mutex_lock(&priv->mm_lock);
-		msm_obj->gpu = NULL;
 		list_del_init(&msm_obj->mm_list);
 		list_add_tail(&msm_obj->mm_list, &priv->inactive_list);
 		mutex_unlock(&priv->mm_lock);
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 93ee73c620ed..bf5f9e94d0d3 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -64,7 +64,6 @@ struct msm_gem_object {
 	 *
 	 */
 	struct list_head mm_list;
-	struct msm_gpu *gpu;     /* non-null if active */
 
 	/* Transiently in the process of submit ioctl, objects associated
 	 * with the submit are on submit->bo_list.. this only lasts for
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 18a7948ac437..8278a4df331a 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -800,11 +800,6 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 		struct drm_gem_object *drm_obj = &msm_obj->base;
 		uint64_t iova;
 
-		/* can't happen yet.. but when we add 2d support we'll have
-		 * to deal w/ cross-ring synchronization:
-		 */
-		WARN_ON(is_active(msm_obj) && (msm_obj->gpu != gpu));
-
 		/* submit takes a reference to the bo and iova until retired: */
 		drm_gem_object_get(&msm_obj->base);
 		msm_gem_get_and_pin_iova_locked(&msm_obj->base, submit->aspace, &iova);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 17/22] drm/msm: Drop struct_mutex from the retire path
  2020-10-12  2:09 [PATCH 00/14] drm/msm: de-struct_mutex-ification Rob Clark
                   ` (15 preceding siblings ...)
  2020-10-12  2:09 ` [PATCH v2 16/22] drm/msm: Remove obj->gpu Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 18/22] drm/msm: Drop struct_mutex in free_object() path Rob Clark
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Jordan Crouse, Rob Clark, Sean Paul,
	David Airlie, open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

Now that we are not relying on dev->struct_mutex to protect the
ring->submits lists, drop the struct_mutex lock.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_gpu.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 8278a4df331a..a754e84b8b5d 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -707,7 +707,7 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
 
 		msm_gem_active_put(&msm_obj->base);
 		msm_gem_unpin_iova(&msm_obj->base, submit->aspace);
-		drm_gem_object_put_locked(&msm_obj->base);
+		drm_gem_object_put(&msm_obj->base);
 	}
 
 	pm_runtime_mark_last_busy(&gpu->pdev->dev);
@@ -722,11 +722,8 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
 
 static void retire_submits(struct msm_gpu *gpu)
 {
-	struct drm_device *dev = gpu->dev;
 	int i;
 
-	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
-
 	/* Retire the commits starting with highest priority */
 	for (i = 0; i < gpu->nr_rings; i++) {
 		struct msm_ringbuffer *ring = gpu->rb[i];
@@ -756,15 +753,12 @@ static void retire_submits(struct msm_gpu *gpu)
 static void retire_worker(struct work_struct *work)
 {
 	struct msm_gpu *gpu = container_of(work, struct msm_gpu, retire_work);
-	struct drm_device *dev = gpu->dev;
 	int i;
 
 	for (i = 0; i < gpu->nr_rings; i++)
 		update_fences(gpu, gpu->rb[i], gpu->rb[i]->memptrs->fence);
 
-	mutex_lock(&dev->struct_mutex);
 	retire_submits(gpu);
-	mutex_unlock(&dev->struct_mutex);
 }
 
 /* call from irq handler to schedule work to retire bo's */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 18/22] drm/msm: Drop struct_mutex in free_object() path
  2020-10-12  2:09 [PATCH 00/14] drm/msm: de-struct_mutex-ification Rob Clark
                   ` (16 preceding siblings ...)
  2020-10-12  2:09 ` [PATCH v2 17/22] drm/msm: Drop struct_mutex from the retire path Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 19/22] drm/msm: remove msm_gem_free_work Rob Clark
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

Now that active_list/inactive_list is protected by mm_lock, we no longer
need dev->struct_mutex in the free_object() path.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/msm_gem.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index cdbbdd848fe3..9ead1bf223e9 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -934,8 +934,6 @@ static void free_object(struct msm_gem_object *msm_obj)
 	struct drm_device *dev = obj->dev;
 	struct msm_drm_private *priv = dev->dev_private;
 
-	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
-
 	/* object should not be on active list: */
 	WARN_ON(is_active(msm_obj));
 
@@ -972,20 +970,14 @@ void msm_gem_free_work(struct work_struct *work)
 {
 	struct msm_drm_private *priv =
 		container_of(work, struct msm_drm_private, free_work);
-	struct drm_device *dev = priv->dev;
 	struct llist_node *freed;
 	struct msm_gem_object *msm_obj, *next;
 
 	while ((freed = llist_del_all(&priv->free_list))) {
-
-		mutex_lock(&dev->struct_mutex);
-
 		llist_for_each_entry_safe(msm_obj, next,
 					  freed, freed)
 			free_object(msm_obj);
 
-		mutex_unlock(&dev->struct_mutex);
-
 		if (need_resched())
 			break;
 	}
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 19/22] drm/msm: remove msm_gem_free_work
  2020-10-12  2:09 [PATCH 00/14] drm/msm: de-struct_mutex-ification Rob Clark
                   ` (17 preceding siblings ...)
  2020-10-12  2:09 ` [PATCH v2 18/22] drm/msm: Drop struct_mutex in free_object() path Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 20/22] drm/msm: drop struct_mutex in madvise path Rob Clark
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

Now that we don't need struct_mutex in the free path, we can get rid of
the asynchronous free altogether.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/msm_drv.c |  3 ---
 drivers/gpu/drm/msm/msm_drv.h |  5 -----
 drivers/gpu/drm/msm/msm_gem.c | 27 ---------------------------
 drivers/gpu/drm/msm/msm_gem.h |  1 -
 4 files changed, 36 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 81cb2cecc829..49e6daf30b42 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -438,9 +438,6 @@ static int msm_drm_init(struct device *dev, struct drm_driver *drv)
 
 	priv->wq = alloc_ordered_workqueue("msm", 0);
 
-	INIT_WORK(&priv->free_work, msm_gem_free_work);
-	init_llist_head(&priv->free_list);
-
 	INIT_LIST_HEAD(&priv->inactive_list);
 	mutex_init(&priv->mm_lock);
 
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 2ef5cff19883..af296712eae8 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -188,10 +188,6 @@ struct msm_drm_private {
 	struct list_head inactive_list;
 	struct mutex mm_lock;
 
-	/* worker for delayed free of objects: */
-	struct work_struct free_work;
-	struct llist_head free_list;
-
 	struct workqueue_struct *wq;
 
 	unsigned int num_planes;
@@ -291,7 +287,6 @@ struct drm_gem_object *msm_gem_prime_import_sg_table(struct drm_device *dev,
 		struct dma_buf_attachment *attach, struct sg_table *sg);
 int msm_gem_prime_pin(struct drm_gem_object *obj);
 void msm_gem_prime_unpin(struct drm_gem_object *obj);
-void msm_gem_free_work(struct work_struct *work);
 
 int msm_framebuffer_prepare(struct drm_framebuffer *fb,
 		struct msm_gem_address_space *aspace);
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 9ead1bf223e9..b60eaf6266e2 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -924,16 +924,6 @@ void msm_gem_free_object(struct drm_gem_object *obj)
 	struct drm_device *dev = obj->dev;
 	struct msm_drm_private *priv = dev->dev_private;
 
-	if (llist_add(&msm_obj->freed, &priv->free_list))
-		queue_work(priv->wq, &priv->free_work);
-}
-
-static void free_object(struct msm_gem_object *msm_obj)
-{
-	struct drm_gem_object *obj = &msm_obj->base;
-	struct drm_device *dev = obj->dev;
-	struct msm_drm_private *priv = dev->dev_private;
-
 	/* object should not be on active list: */
 	WARN_ON(is_active(msm_obj));
 
@@ -966,23 +956,6 @@ static void free_object(struct msm_gem_object *msm_obj)
 	kfree(msm_obj);
 }
 
-void msm_gem_free_work(struct work_struct *work)
-{
-	struct msm_drm_private *priv =
-		container_of(work, struct msm_drm_private, free_work);
-	struct llist_node *freed;
-	struct msm_gem_object *msm_obj, *next;
-
-	while ((freed = llist_del_all(&priv->free_list))) {
-		llist_for_each_entry_safe(msm_obj, next,
-					  freed, freed)
-			free_object(msm_obj);
-
-		if (need_resched())
-			break;
-	}
-}
-
 /* convenience method to construct a GEM buffer object, and userspace handle */
 int msm_gem_new_handle(struct drm_device *dev, struct drm_file *file,
 		uint32_t size, uint32_t flags, uint32_t *handle,
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index bf5f9e94d0d3..c12fedf88e85 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -202,7 +202,6 @@ static inline bool is_vunmapable(struct msm_gem_object *msm_obj)
 
 void msm_gem_purge(struct drm_gem_object *obj);
 void msm_gem_vunmap(struct drm_gem_object *obj);
-void msm_gem_free_work(struct work_struct *work);
 
 /* Created per submit-ioctl, to track bo's and cmdstream bufs, etc,
  * associated with the cmdstream submission for synchronization (and
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 20/22] drm/msm: drop struct_mutex in madvise path
  2020-10-12  2:09 [PATCH 00/14] drm/msm: de-struct_mutex-ification Rob Clark
                   ` (18 preceding siblings ...)
  2020-10-12  2:09 ` [PATCH v2 19/22] drm/msm: remove msm_gem_free_work Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 21/22] drm/msm: Drop struct_mutex in shrinker path Rob Clark
  2020-10-12  2:09 ` [PATCH v2 22/22] drm/msm: Don't implicit-sync if only a single ring Rob Clark
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

The obj->lock is sufficient for what we need.

This *does* have the implication that userspace can try to shoot
itself in the foot by racing madvise(DONTNEED) with submit.  But the
result will be about the same as doing madvise(DONTNEED) before the
submit ioctl, ie. it might not get what it wants if it races with the
shrinker.  But iova fault handling is robust enough, and userspace is
only shooting its own foot.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/msm_drv.c | 11 ++---------
 drivers/gpu/drm/msm/msm_gem.c |  4 +---
 drivers/gpu/drm/msm/msm_gem.h |  2 --
 3 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 49e6daf30b42..f2d58fe25497 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -912,14 +912,9 @@ static int msm_ioctl_gem_madvise(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
-	ret = mutex_lock_interruptible(&dev->struct_mutex);
-	if (ret)
-		return ret;
-
 	obj = drm_gem_object_lookup(file, args->handle);
 	if (!obj) {
-		ret = -ENOENT;
-		goto unlock;
+		return -ENOENT;
 	}
 
 	ret = msm_gem_madvise(obj, args->madv);
@@ -928,10 +923,8 @@ static int msm_ioctl_gem_madvise(struct drm_device *dev, void *data,
 		ret = 0;
 	}
 
-	drm_gem_object_put_locked(obj);
+	drm_gem_object_put(obj);
 
-unlock:
-	mutex_unlock(&dev->struct_mutex);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index b60eaf6266e2..8852c05775dc 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -658,8 +658,6 @@ int msm_gem_madvise(struct drm_gem_object *obj, unsigned madv)
 
 	msm_gem_lock(obj);
 
-	WARN_ON(!mutex_is_locked(&obj->dev->struct_mutex));
-
 	if (msm_obj->madv != __MSM_MADV_PURGED)
 		msm_obj->madv = madv;
 
@@ -676,7 +674,6 @@ void msm_gem_purge(struct drm_gem_object *obj)
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 
 	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
-	WARN_ON(!msm_gem_is_locked(obj));
 	WARN_ON(!is_purgeable(msm_obj));
 	WARN_ON(obj->import_attach);
 
@@ -756,6 +753,7 @@ void msm_gem_active_get(struct drm_gem_object *obj, struct msm_gpu *gpu)
 	struct msm_drm_private *priv = obj->dev->dev_private;
 
 	might_sleep();
+	WARN_ON(!msm_gem_is_locked(obj));
 	WARN_ON(msm_obj->madv != MSM_MADV_WILLNEED);
 
 	if (!atomic_fetch_inc(&msm_obj->active_count)) {
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index c12fedf88e85..1f8f5f3d08c0 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -188,8 +188,6 @@ static inline bool is_active(struct msm_gem_object *msm_obj)
 
 static inline bool is_purgeable(struct msm_gem_object *msm_obj)
 {
-	WARN_ON(!msm_gem_is_locked(&msm_obj->base));
-	WARN_ON(!mutex_is_locked(&msm_obj->base.dev->struct_mutex));
 	return (msm_obj->madv == MSM_MADV_DONTNEED) && msm_obj->sgt &&
 			!msm_obj->base.dma_buf && !msm_obj->base.import_attach;
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 21/22] drm/msm: Drop struct_mutex in shrinker path
  2020-10-12  2:09 [PATCH 00/14] drm/msm: de-struct_mutex-ification Rob Clark
                   ` (19 preceding siblings ...)
  2020-10-12  2:09 ` [PATCH v2 20/22] drm/msm: drop struct_mutex in madvise path Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12  2:09 ` [PATCH v2 22/22] drm/msm: Don't implicit-sync if only a single ring Rob Clark
  21 siblings, 0 replies; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

Now that the inactive_list is protected by mm_lock, and everything
else on a per-obj basis is protected by obj->lock, we no longer depend
on struct_mutex.
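
What remains in the shrinker is the two-level pattern of mm_lock plus
per-object trylock, roughly (a sketch matching the loop bodies that
stay in place below):

    mutex_lock(&priv->mm_lock);
    list_for_each_entry(msm_obj, &priv->inactive_list, mm_list) {
        /* trylock: the shrinker can be entered from allocation
         * paths that already hold obj->resv, so never block: */
        if (!msm_gem_trylock(&msm_obj->base))
            continue;
        if (is_purgeable(msm_obj))
            msm_gem_purge(&msm_obj->base);
        msm_gem_unlock(&msm_obj->base);
    }
    mutex_unlock(&priv->mm_lock);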

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/msm_gem.c          |  1 -
 drivers/gpu/drm/msm/msm_gem_shrinker.c | 54 --------------------------
 2 files changed, 55 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 8852c05775dc..ca00c3ccd413 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -673,7 +673,6 @@ void msm_gem_purge(struct drm_gem_object *obj)
 	struct drm_device *dev = obj->dev;
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 
-	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
 	WARN_ON(!is_purgeable(msm_obj));
 	WARN_ON(obj->import_attach);
 
diff --git a/drivers/gpu/drm/msm/msm_gem_shrinker.c b/drivers/gpu/drm/msm/msm_gem_shrinker.c
index 6be073b8ca08..6f4b1355725f 100644
--- a/drivers/gpu/drm/msm/msm_gem_shrinker.c
+++ b/drivers/gpu/drm/msm/msm_gem_shrinker.c
@@ -8,48 +8,13 @@
 #include "msm_gem.h"
 #include "msm_gpu_trace.h"
 
-static bool msm_gem_shrinker_lock(struct drm_device *dev, bool *unlock)
-{
-	/* NOTE: we are *closer* to being able to get rid of
-	 * mutex_trylock_recursive().. the msm_gem code itself does
-	 * not need struct_mutex, although codepaths that can trigger
-	 * shrinker are still called in code-paths that hold the
-	 * struct_mutex.
-	 *
-	 * Also, msm_obj->madv is protected by struct_mutex.
-	 *
-	 * The next step is probably split out a seperate lock for
-	 * protecting inactive_list, so that shrinker does not need
-	 * struct_mutex.
-	 */
-	switch (mutex_trylock_recursive(&dev->struct_mutex)) {
-	case MUTEX_TRYLOCK_FAILED:
-		return false;
-
-	case MUTEX_TRYLOCK_SUCCESS:
-		*unlock = true;
-		return true;
-
-	case MUTEX_TRYLOCK_RECURSIVE:
-		*unlock = false;
-		return true;
-	}
-
-	BUG();
-}
-
 static unsigned long
 msm_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
 {
 	struct msm_drm_private *priv =
 		container_of(shrinker, struct msm_drm_private, shrinker);
-	struct drm_device *dev = priv->dev;
 	struct msm_gem_object *msm_obj;
 	unsigned long count = 0;
-	bool unlock;
-
-	if (!msm_gem_shrinker_lock(dev, &unlock))
-		return 0;
 
 	mutex_lock(&priv->mm_lock);
 
@@ -63,9 +28,6 @@ msm_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
 
 	mutex_unlock(&priv->mm_lock);
 
-	if (unlock)
-		mutex_unlock(&dev->struct_mutex);
-
 	return count;
 }
 
@@ -74,13 +36,8 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 {
 	struct msm_drm_private *priv =
 		container_of(shrinker, struct msm_drm_private, shrinker);
-	struct drm_device *dev = priv->dev;
 	struct msm_gem_object *msm_obj;
 	unsigned long freed = 0;
-	bool unlock;
-
-	if (!msm_gem_shrinker_lock(dev, &unlock))
-		return SHRINK_STOP;
 
 	mutex_lock(&priv->mm_lock);
 
@@ -98,9 +55,6 @@ msm_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 
 	mutex_unlock(&priv->mm_lock);
 
-	if (unlock)
-		mutex_unlock(&dev->struct_mutex);
-
 	if (freed > 0)
 		trace_msm_gem_purge(freed << PAGE_SHIFT);
 
@@ -112,13 +66,8 @@ msm_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr)
 {
 	struct msm_drm_private *priv =
 		container_of(nb, struct msm_drm_private, vmap_notifier);
-	struct drm_device *dev = priv->dev;
 	struct msm_gem_object *msm_obj;
 	unsigned unmapped = 0;
-	bool unlock;
-
-	if (!msm_gem_shrinker_lock(dev, &unlock))
-		return NOTIFY_DONE;
 
 	mutex_lock(&priv->mm_lock);
 
@@ -141,9 +90,6 @@ msm_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr)
 
 	mutex_unlock(&priv->mm_lock);
 
-	if (unlock)
-		mutex_unlock(&dev->struct_mutex);
-
 	*(unsigned long *)ptr += unmapped;
 
 	if (unmapped > 0)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 22/22] drm/msm: Don't implicit-sync if only a single ring
  2020-10-12  2:09 [PATCH 00/14] drm/msm: de-struct_mutex-ification Rob Clark
                   ` (20 preceding siblings ...)
  2020-10-12  2:09 ` [PATCH v2 21/22] drm/msm: Drop struct_mutex in shrinker path Rob Clark
@ 2020-10-12  2:09 ` Rob Clark
  2020-10-12 14:40   ` Daniel Vetter
  21 siblings, 1 reply; 50+ messages in thread
From: Rob Clark @ 2020-10-12  2:09 UTC (permalink / raw)
  To: dri-devel
  Cc: Daniel Vetter, Rob Clark, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

From: Rob Clark <robdclark@chromium.org>

Any cross-device sync use-cases *must* use explicit sync.  And if there
is only a single ring (no-preemption), everything is FIFO order and
there is no need to implicit-sync.

Mesa should probably just always use MSM_SUBMIT_NO_IMPLICIT, as behavior
is undefined when fences are not used to synchronize buffer usage across
contexts (which is the only case where multiple different priority rings
could come into play).
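
For reference, opting out of implicit sync from userspace is just a
flag on the submit ioctl; a minimal sketch using libdrm's
drmCommandWriteRead() (uapi field names, error handling and bo/cmd
setup omitted):

    struct drm_msm_gem_submit req = {
        .flags   = MSM_PIPE_3D0 | MSM_SUBMIT_NO_IMPLICIT,
        .queueid = queueid,
        .bos     = (uintptr_t)bos,
        .nr_bos  = nr_bos,
        .cmds    = (uintptr_t)cmds,
        .nr_cmds = nr_cmds,
    };

    ret = drmCommandWriteRead(fd, DRM_MSM_GEM_SUBMIT,
            &req, sizeof(req));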

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/msm_gem_submit.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 3151a0ca8904..c69803ea53c8 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -277,7 +277,7 @@ static int submit_lock_objects(struct msm_gem_submit *submit)
 	return ret;
 }
 
-static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit)
+static int submit_fence_sync(struct msm_gem_submit *submit, bool implicit_sync)
 {
 	int i, ret = 0;
 
@@ -297,7 +297,7 @@ static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit)
 				return ret;
 		}
 
-		if (no_implicit)
+		if (!implicit_sync)
 			continue;
 
 		ret = msm_gem_sync_object(&msm_obj->base, submit->ring->fctx,
@@ -768,7 +768,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 	if (ret)
 		goto out;
 
-	ret = submit_fence_sync(submit, !!(args->flags & MSM_SUBMIT_NO_IMPLICIT));
+	ret = submit_fence_sync(submit, (gpu->nr_rings > 1) &&
+			!(args->flags & MSM_SUBMIT_NO_IMPLICIT));
 	if (ret)
 		goto out;
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-10-12  2:09 ` [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path Rob Clark
@ 2020-10-12 14:35   ` Daniel Vetter
  2020-10-12 15:43     ` Rob Clark
  0 siblings, 1 reply; 50+ messages in thread
From: Daniel Vetter @ 2020-10-12 14:35 UTC (permalink / raw)
  To: Rob Clark
  Cc: dri-devel, Daniel Vetter, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

On Sun, Oct 11, 2020 at 07:09:34PM -0700, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> Unfortunately, due to a dev_pm_opp locking interaction with
> mm->mmap_sem, we need to do the pm get before acquiring obj locks,
> otherwise we anger lockdep with the chain:

tbh this sounds like a bug in that subsystem, since it means we cannot use
said subsystem in mmap handlers either.

So if you have some remapping unit or need to wake up your gpu to blt the
buffer into system memory first, we're toast. That doesn't sound right. So
maybe Cc: pm folks and figure out how to fix this long term properly? Imo
not a good reason to hold up this patch set, since unwrangling mmap_sem
tends to be work ...
-Daniel

> 
>   opp_table_lock --> &mm->mmap_sem --> reservation_ww_class_mutex
> 
> For an explicit fencing userspace, the impact should be minimal
> as we do all the fence waits before this point.  It could result
> in some needless resumes in error cases, etc.
> 
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
>  drivers/gpu/drm/msm/msm_gem_submit.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> index 002130d826aa..a9422d043bfe 100644
> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> @@ -744,11 +744,20 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
>  
>  	ret = submit_lookup_objects(submit, args, file);
>  	if (ret)
> -		goto out;
> +		goto out_pre_pm;
>  
>  	ret = submit_lookup_cmds(submit, args, file);
>  	if (ret)
> -		goto out;
> +		goto out_pre_pm;
> +
> +	/*
> +	 * Thanks to dev_pm_opp opp_table_lock interactions with mm->mmap_sem
> +	 * in the resume path, we need to to rpm get before we lock objs.
> +	 * Which unfortunately might involve powering up the GPU sooner than
> +	 * is necessary.  But at least in the explicit fencing case, we will
> +	 * have already done all the fence waiting.
> +	 */
> +	pm_runtime_get_sync(&gpu->pdev->dev);
>  
>  	/* copy_*_user while holding a ww ticket upsets lockdep */
>  	ww_acquire_init(&submit->ticket, &reservation_ww_class);
> @@ -825,6 +834,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
>  
>  
>  out:
> +	pm_runtime_put(&gpu->pdev->dev);
> +out_pre_pm:
>  	submit_cleanup(submit);
>  	if (has_ww_ticket)
>  		ww_acquire_fini(&submit->ticket);
> -- 
> 2.26.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 22/22] drm/msm: Don't implicit-sync if only a single ring
  2020-10-12  2:09 ` [PATCH v2 22/22] drm/msm: Don't implicit-sync if only a single ring Rob Clark
@ 2020-10-12 14:40   ` Daniel Vetter
  2020-10-12 15:07     ` Rob Clark
  0 siblings, 1 reply; 50+ messages in thread
From: Daniel Vetter @ 2020-10-12 14:40 UTC (permalink / raw)
  To: Rob Clark
  Cc: dri-devel, Daniel Vetter, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

On Sun, Oct 11, 2020 at 07:09:49PM -0700, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> Any cross-device sync use-cases *must* use explicit sync.  And if there
> is only a single ring (no-preemption), everything is FIFO order and
> there is no need to implicit-sync.
> 
> Mesa should probably just always use MSM_SUBMIT_NO_IMPLICIT, as behavior
> is undefined when fences are not used to synchronize buffer usage across
> contexts (which is the only case where multiple different priority rings
> could come into play).

Uh does this mean msm is broken on dri2/3 and wayland? Or am I just
confused by your commit message?

Since for these protocols we do expect implicit sync across processes to
work. Even across devices (and nvidia have actually provided quite a bunch
of patches to make this work in i915 - ttm based drivers get this right,
plus dumb scanout drivers using the right helpers also get this all
right).
-Daniel

> 
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
>  drivers/gpu/drm/msm/msm_gem_submit.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> index 3151a0ca8904..c69803ea53c8 100644
> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> @@ -277,7 +277,7 @@ static int submit_lock_objects(struct msm_gem_submit *submit)
>  	return ret;
>  }
>  
> -static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit)
> +static int submit_fence_sync(struct msm_gem_submit *submit, bool implicit_sync)
>  {
>  	int i, ret = 0;
>  
> @@ -297,7 +297,7 @@ static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit)
>  				return ret;
>  		}
>  
> -		if (no_implicit)
> +		if (!implicit_sync)
>  			continue;
>  
>  		ret = msm_gem_sync_object(&msm_obj->base, submit->ring->fctx,
> @@ -768,7 +768,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
>  	if (ret)
>  		goto out;
>  
> -	ret = submit_fence_sync(submit, !!(args->flags & MSM_SUBMIT_NO_IMPLICIT));
> +	ret = submit_fence_sync(submit, (gpu->nr_rings > 1) &&
> +			!(args->flags & MSM_SUBMIT_NO_IMPLICIT));
>  	if (ret)
>  		goto out;
>  
> -- 
> 2.26.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 22/22] drm/msm: Don't implicit-sync if only a single ring
  2020-10-12 14:40   ` Daniel Vetter
@ 2020-10-12 15:07     ` Rob Clark
  2020-10-13 11:08       ` Daniel Vetter
  0 siblings, 1 reply; 50+ messages in thread
From: Rob Clark @ 2020-10-12 15:07 UTC (permalink / raw)
  To: Rob Clark, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list
  Cc: Daniel Vetter

On Mon, Oct 12, 2020 at 7:40 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Sun, Oct 11, 2020 at 07:09:49PM -0700, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Any cross-device sync use-cases *must* use explicit sync.  And if there
> > is only a single ring (no-preemption), everything is FIFO order and
> > there is no need to implicit-sync.
> >
> > Mesa should probably just always use MSM_SUBMIT_NO_IMPLICIT, as behavior
> > is undefined when fences are not used to synchronize buffer usage across
> > contexts (which is the only case where multiple different priority rings
> > could come into play).
>
> Uh does this mean msm is broken on dri2/3 and wayland? Or am I just
> confused by your commit message?

No, I don't think so.  If there is only a single priority level
ringbuffer (ie. no preemption to higher priority ring) then everything
is inherently FIFO order.

For cases where we are sharing buffers with something external to drm,
explicit sync will be used.  And we don't implicit sync with display,
otherwise x11 (frontbuffer rendering) would not work
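
(A minimal sketch of that explicit path, assuming the
MSM_SUBMIT_FENCE_FD_OUT flag from the UAPI of this era:)

  struct drm_msm_gem_submit req = {
        .flags    = MSM_SUBMIT_FENCE_FD_OUT | MSM_SUBMIT_NO_IMPLICIT,
        .fence_fd = -1,
        /* bos/cmds/queueid filled in as usual */
  };

  drmCommandWriteRead(fd, DRM_MSM_GEM_SUBMIT, &req, sizeof(req));
  /* req.fence_fd now holds a sync_file fd that can be handed to
   * whatever sits on the other side (display, v4l, another GPU, ..) */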

BR,
-R

> Since for these protocols we do expect implicit sync across processes to
> work. Even across devices (and nvidia have actually provided quite a bunch
> of patches to make this work in i915 - ttm based drivers get this right,
> plus dumb scanout drivers using the right helpers also get this all
> right).
> -Daniel
>
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> >  drivers/gpu/drm/msm/msm_gem_submit.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> > index 3151a0ca8904..c69803ea53c8 100644
> > --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> > @@ -277,7 +277,7 @@ static int submit_lock_objects(struct msm_gem_submit *submit)
> >       return ret;
> >  }
> >
> > -static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit)
> > +static int submit_fence_sync(struct msm_gem_submit *submit, bool implicit_sync)
> >  {
> >       int i, ret = 0;
> >
> > @@ -297,7 +297,7 @@ static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit)
> >                               return ret;
> >               }
> >
> > -             if (no_implicit)
> > +             if (!implicit_sync)
> >                       continue;
> >
> >               ret = msm_gem_sync_object(&msm_obj->base, submit->ring->fctx,
> > @@ -768,7 +768,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
> >       if (ret)
> >               goto out;
> >
> > -     ret = submit_fence_sync(submit, !!(args->flags & MSM_SUBMIT_NO_IMPLICIT));
> > +     ret = submit_fence_sync(submit, (gpu->nr_rings > 1) &&
> > +                     !(args->flags & MSM_SUBMIT_NO_IMPLICIT));
> >       if (ret)
> >               goto out;
> >
> > --
> > 2.26.2
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-10-12 14:35   ` Daniel Vetter
@ 2020-10-12 15:43     ` Rob Clark
  2020-10-20  9:07       ` Viresh Kumar
  0 siblings, 1 reply; 50+ messages in thread
From: Rob Clark @ 2020-10-12 15:43 UTC (permalink / raw)
  To: Rob Clark, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list
  Cc: Daniel Vetter, Menon, Nishanth, Viresh Kumar

On Mon, Oct 12, 2020 at 7:35 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Sun, Oct 11, 2020 at 07:09:34PM -0700, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > > Unfortunately, due to a dev_pm_opp locking interaction with
> > > mm->mmap_sem, we need to do pm get before acquiring obj locks,
> > > otherwise we anger lockdep with the chain:
>
> tbh this sounds like a bug in that subsystem, since it means we cannot use
> said subsystem in mmap handlers either.
>
> So if you have some remapping unit or need to wake up your gpu to blt the
> buffer into system memory first, we're toast. That doesn't sound right. So
> maybe Cc: pm folks and figure out how to fix this long term properly? Imo
> not a good reason to hold up this patch set, since unwrangling mmap_sem
> tends to be work ...

+ a couple of PM folks

Looks like it has been this way for quite some time, so I guess the
overlap between things using dev_pm_opp and mmap is low..

fwiw, example splat so folks can see the locking interaction I am
talking about.. I suspect the pm_opp interaction with mm->mmap_sem is
from the debugfs calls while opp_table_lock is held?

[   15.627855] ======================================================
[   15.634202] WARNING: possible circular locking dependency detected
[   15.640550] 5.4.70 #41 Not tainted
[   15.644050] ------------------------------------------------------
[   15.650397] chrome/1805 is trying to acquire lock:
[   15.655314] ffffffed90720738 (opp_table_lock){+.+.}, at:
_find_opp_table+0x34/0x74
[   15.663092]
[   15.663092] but task is already holding lock:
[   15.669082] ffffff80ff3911a8 (reservation_ww_class_mutex){+.+.},
at: submit_lock_objects+0x70/0x1ec
[   15.678369]
[   15.678369] which lock already depends on the new lock.
[   15.678369]
[   15.686764]
[   15.686764] the existing dependency chain (in reverse order) is:
[   15.694438]
[   15.694438] -> #3 (reservation_ww_class_mutex){+.+.}:
[   15.701146]        __mutex_lock_common+0xec/0xc0c
[   15.705978]        ww_mutex_lock_interruptible+0x5c/0xc4
[   15.711432]        msm_gem_fault+0x2c/0x124
[   15.715731]        __do_fault+0x40/0x16c
[   15.719766]        handle_mm_fault+0x7cc/0xd98
[   15.724337]        do_page_fault+0x230/0x3b4
[   15.728721]        do_translation_fault+0x5c/0x78
[   15.733558]        do_mem_abort+0x4c/0xb4
[   15.737680]        el0_da+0x1c/0x20
[   15.741266]
[   15.741266] -> #2 (&mm->mmap_sem){++++}:
[   15.746809]        __might_fault+0x70/0x98
[   15.751022]        compat_filldir+0xf8/0x48c
[   15.755412]        dcache_readdir+0x70/0x1dc
[   15.759808]        iterate_dir+0xd4/0x180
[   15.763931]        __arm64_compat_sys_getdents+0xa0/0x19c
[   15.769476]        el0_svc_common+0xa8/0x178
[   15.773861]        el0_svc_compat_handler+0x2c/0x40
[   15.778868]        el0_svc_compat+0x8/0x10
[   15.783075]
[   15.783075] -> #1 (&sb->s_type->i_mutex_key#3){++++}:
[   15.789788]        down_write+0x54/0x16c
[   15.793826]        debugfs_remove_recursive+0x50/0x158
[   15.799108]        opp_debug_unregister+0x34/0x114
[   15.804028]        dev_pm_opp_put_opp_table+0xd0/0x14c
[   15.809308]        dev_pm_opp_put_clkname+0x3c/0x50
[   15.814318]        msm_dsi_host_destroy+0xb0/0xcc
[   15.819149]        dsi_destroy+0x40/0x58
[   15.823184]        dsi_bind+0x90/0x170
[   15.827041]        component_bind_all+0xf0/0x208
[   15.831787]        msm_drm_init+0x188/0x60c
[   15.836084]        msm_drm_bind+0x24/0x30
[   15.840205]        try_to_bring_up_master+0x15c/0x1a4
[   15.845396]        __component_add+0x98/0x14c
[   15.849878]        component_add+0x28/0x34
[   15.854086]        dp_display_probe+0x324/0x370
[   15.858744]        platform_drv_probe+0x90/0xb0
[   15.863400]        really_probe+0x134/0x2ec
[   15.867699]        driver_probe_device+0x64/0xfc
[   15.872443]        __device_attach_driver+0x8c/0xa4
[   15.877459]        bus_for_each_drv+0x90/0xd8
[   15.881939]        __device_attach+0xc0/0x148
[   15.886420]        device_initial_probe+0x20/0x2c
[   15.891254]        bus_probe_device+0x34/0x94
[   15.895726]        deferred_probe_work_func+0x78/0xb4
[   15.900914]        process_one_work+0x30c/0x5d0
[   15.905573]        worker_thread+0x240/0x3f0
[   15.909959]        kthread+0x144/0x154
[   15.913809]        ret_from_fork+0x10/0x18
[   15.918016]
[   15.918016] -> #0 (opp_table_lock){+.+.}:
[   15.923660]        __lock_acquire+0xee4/0x2450
[   15.928230]        lock_acquire+0x1cc/0x210
[   15.932527]        __mutex_lock_common+0xec/0xc0c
[   15.937359]        mutex_lock_nested+0x40/0x50
[   15.941928]        _find_opp_table+0x34/0x74
[   15.946312]        dev_pm_opp_find_freq_exact+0x2c/0xdc
[   15.951680]        a6xx_gmu_resume+0xc8/0xecc
[   15.952812] fscrypt: AES-256-CTS-CBC using implementation "cts-cbc-aes-ce"
[   15.956161]        a6xx_pm_resume+0x148/0x200
[   15.956166]        adreno_resume+0x28/0x34
[   15.956171]        pm_generic_runtime_resume+0x34/0x48
[   15.956174]        __rpm_callback+0x70/0x10c
[   15.956176]        rpm_callback+0x34/0x8c
[   15.956179]        rpm_resume+0x414/0x550
[   15.956182]        __pm_runtime_resume+0x7c/0xa0
[   15.956185]        msm_gpu_submit+0x60/0x1c0
[   15.956190]        msm_ioctl_gem_submit+0xadc/0xb60
[   16.003961]        drm_ioctl_kernel+0x9c/0x118
[   16.008532]        drm_ioctl+0x27c/0x408
[   16.012562]        drm_compat_ioctl+0xcc/0xdc
[   16.017038]        __se_compat_sys_ioctl+0x100/0x206c
[   16.022224]        __arm64_compat_sys_ioctl+0x20/0x2c
[   16.027412]        el0_svc_common+0xa8/0x178
[   16.031800]        el0_svc_compat_handler+0x2c/0x40
[   16.036810]        el0_svc_compat+0x8/0x10
[   16.041021]
[   16.041021] other info that might help us debug this:
[   16.041021]
[   16.049235] Chain exists of:
[   16.049235]   opp_table_lock --> &mm->mmap_sem --> reservation_ww_class_mutex
[   16.049235]
[   16.061014]  Possible unsafe locking scenario:
[   16.061014]
[   16.067091]        CPU0                    CPU1
[   16.071750]        ----                    ----
[   16.076399]   lock(reservation_ww_class_mutex);
[   16.081059]                                lock(&mm->mmap_sem);
[   16.087134]                                lock(reservation_ww_class_mutex);
[   16.094369]   lock(opp_table_lock);
[   16.097961]
[   16.097961]  *** DEADLOCK ***
[   16.097961]
[   16.104038] 3 locks held by chrome/1805:
[   16.108068]  #0: ffffff80fb20c0d8 (&dev->struct_mutex){+.+.}, at:
msm_ioctl_gem_submit+0x264/0xb60
[   16.117264]  #1: ffffff80dd712c70
(reservation_ww_class_acquire){+.+.}, at:
msm_ioctl_gem_submit+0x8e8/0xb60
[   16.127357]  #2: ffffff80ff3911a8
(reservation_ww_class_mutex){+.+.}, at: submit_lock_objects+0x70/0x1ec
[   16.137089]
[   16.137089] stack backtrace:
[   16.141567] CPU: 4 PID: 1805 Comm: chrome Not tainted 5.4.70 #41
[   16.147733] Hardware name: Google Lazor (rev1+) with LTE (DT)
[   16.153632] Call trace:
[   16.156154]  dump_backtrace+0x0/0x158
[   16.159924]  show_stack+0x20/0x2c
[   16.163340]  dump_stack+0xc8/0x160
[   16.166840]  print_circular_bug+0x2c4/0x2c8
[   16.171144]  check_noncircular+0x1a8/0x1b0
[   16.175351]  __lock_acquire+0xee4/0x2450
[   16.179382]  lock_acquire+0x1cc/0x210
[   16.183146]  __mutex_lock_common+0xec/0xc0c
[   16.187450]  mutex_lock_nested+0x40/0x50
[   16.191481]  _find_opp_table+0x34/0x74
[   16.195344]  dev_pm_opp_find_freq_exact+0x2c/0xdc
[   16.200178]  a6xx_gmu_resume+0xc8/0xecc
[   16.204120]  a6xx_pm_resume+0x148/0x200
[   16.208064]  adreno_resume+0x28/0x34
[   16.211743]  pm_generic_runtime_resume+0x34/0x48
[   16.216488]  __rpm_callback+0x70/0x10c
[   16.220342]  rpm_callback+0x34/0x8c
[   16.223933]  rpm_resume+0x414/0x550
[   16.227524]  __pm_runtime_resume+0x7c/0xa0
[   16.231731]  msm_gpu_submit+0x60/0x1c0
[   16.235586]  msm_ioctl_gem_submit+0xadc/0xb60
[   16.240066]  drm_ioctl_kernel+0x9c/0x118
[   16.244097]  drm_ioctl+0x27c/0x408
[   16.247602]  drm_compat_ioctl+0xcc/0xdc
[   16.251546]  __se_compat_sys_ioctl+0x100/0x206c
[   16.256204]  __arm64_compat_sys_ioctl+0x20/0x2c
[   16.260861]  el0_svc_common+0xa8/0x178
[   16.264716]  el0_svc_compat_handler+0x2c/0x40
[   16.269196]  el0_svc_compat+0x8/0x10

BR,
-R

> -Daniel
>
> >
> >   opp_table_lock --> &mm->mmap_sem --> reservation_ww_class_mutex
> >
> > For an explicit fencing userspace, the impact should be minimal
> > as we do all the fence waits before this point.  It could result
> > in some needless resumes in error cases, etc.
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> >  drivers/gpu/drm/msm/msm_gem_submit.c | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> > index 002130d826aa..a9422d043bfe 100644
> > --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> > @@ -744,11 +744,20 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
> >
> >       ret = submit_lookup_objects(submit, args, file);
> >       if (ret)
> > -             goto out;
> > +             goto out_pre_pm;
> >
> >       ret = submit_lookup_cmds(submit, args, file);
> >       if (ret)
> > -             goto out;
> > +             goto out_pre_pm;
> > +
> > +     /*
> > +      * Thanks to dev_pm_opp opp_table_lock interactions with mm->mmap_sem
> > +      * in the resume path, we need to to rpm get before we lock objs.
> > +      * Which unfortunately might involve powering up the GPU sooner than
> > +      * is necessary.  But at least in the explicit fencing case, we will
> > +      * have already done all the fence waiting.
> > +      */
> > +     pm_runtime_get_sync(&gpu->pdev->dev);
> >
> >       /* copy_*_user while holding a ww ticket upsets lockdep */
> >       ww_acquire_init(&submit->ticket, &reservation_ww_class);
> > @@ -825,6 +834,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
> >
> >
> >  out:
> > +     pm_runtime_put(&gpu->pdev->dev);
> > +out_pre_pm:
> >       submit_cleanup(submit);
> >       if (has_ww_ticket)
> >               ww_acquire_fini(&submit->ticket);
> > --
> > 2.26.2
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 22/22] drm/msm: Don't implicit-sync if only a single ring
  2020-10-12 15:07     ` Rob Clark
@ 2020-10-13 11:08       ` Daniel Vetter
  2020-10-13 16:15         ` [Freedreno] " Rob Clark
  0 siblings, 1 reply; 50+ messages in thread
From: Daniel Vetter @ 2020-10-13 11:08 UTC (permalink / raw)
  To: Rob Clark
  Cc: dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list,
	Daniel Vetter

On Mon, Oct 12, 2020 at 08:07:38AM -0700, Rob Clark wrote:
> On Mon, Oct 12, 2020 at 7:40 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Sun, Oct 11, 2020 at 07:09:49PM -0700, Rob Clark wrote:
> > > From: Rob Clark <robdclark@chromium.org>
> > >
> > > Any cross-device sync use-cases *must* use explicit sync.  And if there
> > > is only a single ring (no-preemption), everything is FIFO order and
> > > there is no need to implicit-sync.
> > >
> > > Mesa should probably just always use MSM_SUBMIT_NO_IMPLICIT, as behavior
> > > is undefined when fences are not used to synchronize buffer usage across
> > > contexts (which is the only case where multiple different priority rings
> > > could come into play).
> >
> > Uh does this mean msm is broken on dri2/3 and wayland? Or am I just
> > confused by your commit message?
> 
> No, I don't think so.  If there is only a single priority level
> ringbuffer (ie. no preemption to higher priority ring) then everything
> is inherently FIFO order.

Well eventually you get a scheduler I guess/hope :-)

> For cases where we are sharing buffers with something external to drm,
> explicit sync will be used.  And we don't implicit sync with display,
> otherwise x11 (frontbuffer rendering) would not work

Uh now I'm even more confused. The implicit sync fences in dma_resv are
kinda for everyone. That's also why dma_resv with the common locking
approach is a useful idea.

So display should definitely support implicit sync, and iirc msm does have
the helper hooked up.
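
(roughly this hookup, using the drm_gem_fb_prepare_fb() helper from
this era, which stashes the buffer's exclusive resv fence into the
plane state; a sketch, not msm's literal code:)

  static const struct drm_plane_helper_funcs plane_helper_funcs = {
        .prepare_fb    = drm_gem_fb_prepare_fb, /* implicit fence */
        .atomic_check  = my_plane_atomic_check,  /* hypothetical */
        .atomic_update = my_plane_atomic_update, /* hypothetical */
  };

  /* the atomic commit helpers then wait on plane_state->fence
   * before programming the hardware */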

Wrt other subsystems, I guess passing dma_fence around somehow doesn't fit
into v4l (the patches never landed), so v4l doesn't do any kind of sync
right now. But this could be fixed. Not sure what else is going on.

So I guess I still have no idea why you put that into the commit message.

btw for what you're trying to do yourself, the way to do this is to
allocate a fence timeline for your engine, compare fences, and no-op them
all out if they're on the same timeline.
-Daniel

> 
> BR,
> -R
> 
> > Since for these protocols we do expect implicit sync across processes to
> > work. Even across devices (and nvidia have actually provided quite a bunch
> > of patches to make this work in i915 - ttm based drivers get this right,
> > plus dumb scanout drivers using the right helpers also get this all
> > right).
> > -Daniel
> >
> > >
> > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > ---
> > >  drivers/gpu/drm/msm/msm_gem_submit.c | 7 ++++---
> > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > index 3151a0ca8904..c69803ea53c8 100644
> > > --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> > > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > @@ -277,7 +277,7 @@ static int submit_lock_objects(struct msm_gem_submit *submit)
> > >       return ret;
> > >  }
> > >
> > > -static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit)
> > > +static int submit_fence_sync(struct msm_gem_submit *submit, bool implicit_sync)
> > >  {
> > >       int i, ret = 0;
> > >
> > > @@ -297,7 +297,7 @@ static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit)
> > >                               return ret;
> > >               }
> > >
> > > -             if (no_implicit)
> > > +             if (!implicit_sync)
> > >                       continue;
> > >
> > >               ret = msm_gem_sync_object(&msm_obj->base, submit->ring->fctx,
> > > @@ -768,7 +768,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
> > >       if (ret)
> > >               goto out;
> > >
> > > -     ret = submit_fence_sync(submit, !!(args->flags & MSM_SUBMIT_NO_IMPLICIT));
> > > +     ret = submit_fence_sync(submit, (gpu->nr_rings > 1) &&
> > > +                     !(args->flags & MSM_SUBMIT_NO_IMPLICIT));
> > >       if (ret)
> > >               goto out;
> > >
> > > --
> > > 2.26.2
> > >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Freedreno] [PATCH v2 22/22] drm/msm: Don't implicit-sync if only a single ring
  2020-10-13 11:08       ` Daniel Vetter
@ 2020-10-13 16:15         ` Rob Clark
  2020-10-15  8:22           ` Daniel Vetter
  0 siblings, 1 reply; 50+ messages in thread
From: Rob Clark @ 2020-10-13 16:15 UTC (permalink / raw)
  To: Rob Clark, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list
  Cc: Daniel Vetter

On Tue, Oct 13, 2020 at 4:08 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Mon, Oct 12, 2020 at 08:07:38AM -0700, Rob Clark wrote:
> > On Mon, Oct 12, 2020 at 7:40 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > On Sun, Oct 11, 2020 at 07:09:49PM -0700, Rob Clark wrote:
> > > > From: Rob Clark <robdclark@chromium.org>
> > > >
> > > > Any cross-device sync use-cases *must* use explicit sync.  And if there
> > > > is only a single ring (no-preemption), everything is FIFO order and
> > > > there is no need to implicit-sync.
> > > >
> > > > Mesa should probably just always use MSM_SUBMIT_NO_IMPLICIT, as behavior
> > > > is undefined when fences are not used to synchronize buffer usage across
> > > > contexts (which is the only case where multiple different priority rings
> > > > could come into play).
> > >
> > > Uh does this mean msm is broken on dri2/3 and wayland? Or am I just
> > > confused by your commit message?
> >
> > No, I don't think so.  If there is only a single priority level
> > ringbuffer (ie. no preemption to higher priority ring) then everything
> > is inherently FIFO order.
>
> Well eventually you get a scheduler I guess/hope :-)

we do have one currently for some gens, but not others.. hence the
check for # of rings.  (Ie. there is a ring per priority level, if
only one ring, that means no preemption/scheduler)
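
(roughly, a sketch of the submit-path mapping:)

  /* each priority level has its own ring; one ring == one timeline */
  struct msm_ringbuffer *ring = gpu->rb[queue->prio];

  /* gpu->nr_rings == 1 => every submit lands on the same FIFO, so
   * same-device implicit sync cannot change any ordering anyway */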

> > For cases where we are sharing buffers with something external to drm,
> > explicit sync will be used.  And we don't implicit sync with display,
> > otherwise x11 (frontbuffer rendering) would not work
>
> Uh now I'm even more confused. The implicit sync fences in dma_resv are
> kinda for everyone. That's also why dma_resv with the common locking
> approach is a useful idea.
>
> So display should definitely support implicit sync, and iirc msm does have
> the helper hooked up.

yup

> Wrt other subsystems, I guess passing dma_fence around somehow doesn't fit
> into v4l (the patches never landed), so v4l doesn't do any kind of sync
> right now. But this could be fixed. Not sure what else is going on.
>
> So I guess I still have no idea why you put that into the commit message.
>
> btw for what you're trying to do yourself, the way to do this is to
> allocate a fence timeline for your engine, compare fences, and no-op them
> all out if their own the same timeline.

we do that already (with a fence timeline per-ring, in the case of
gens which support multiple rings / preemption).. this patch just
short-circuits that in the case where we already know the fences will
be of the same timeline
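
(the existing per-timeline skip, roughly as in msm_gem_sync_object();
sketch from memory:)

  /* don't need to wait on our own fences, since the ring is fifo */
  fence = dma_resv_get_excl(obj->resv);
  if (fence && (fence->context != fctx->context)) {
        ret = dma_fence_wait(fence, true);
        if (ret)
                return ret;
  }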

BR,
-R

> -Daniel
>
> >
> > BR,
> > -R
> >
> > > Since for these protocols we do expect implicit sync across processes to
> > > work. Even across devices (and nvidia have actually provided quite a bunch
> > > of patches to make this work in i915 - ttm based drivers get this right,
> > > plus dumb scanout drivers using the right helpers also get this all
> > > right).
> > > -Daniel
> > >
> > > >
> > > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > > ---
> > > >  drivers/gpu/drm/msm/msm_gem_submit.c | 7 ++++---
> > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > index 3151a0ca8904..c69803ea53c8 100644
> > > > --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > @@ -277,7 +277,7 @@ static int submit_lock_objects(struct msm_gem_submit *submit)
> > > >       return ret;
> > > >  }
> > > >
> > > > -static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit)
> > > > +static int submit_fence_sync(struct msm_gem_submit *submit, bool implicit_sync)
> > > >  {
> > > >       int i, ret = 0;
> > > >
> > > > @@ -297,7 +297,7 @@ static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit)
> > > >                               return ret;
> > > >               }
> > > >
> > > > -             if (no_implicit)
> > > > +             if (!implicit_sync)
> > > >                       continue;
> > > >
> > > >               ret = msm_gem_sync_object(&msm_obj->base, submit->ring->fctx,
> > > > @@ -768,7 +768,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
> > > >       if (ret)
> > > >               goto out;
> > > >
> > > > -     ret = submit_fence_sync(submit, !!(args->flags & MSM_SUBMIT_NO_IMPLICIT));
> > > > +     ret = submit_fence_sync(submit, (gpu->nr_rings > 1) &&
> > > > +                     !(args->flags & MSM_SUBMIT_NO_IMPLICIT));
> > > >       if (ret)
> > > >               goto out;
> > > >
> > > > --
> > > > 2.26.2
> > > >
> > >
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
> _______________________________________________
> Freedreno mailing list
> Freedreno@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Freedreno] [PATCH v2 22/22] drm/msm: Don't implicit-sync if only a single ring
  2020-10-13 16:15         ` [Freedreno] " Rob Clark
@ 2020-10-15  8:22           ` Daniel Vetter
  0 siblings, 0 replies; 50+ messages in thread
From: Daniel Vetter @ 2020-10-15  8:22 UTC (permalink / raw)
  To: Rob Clark
  Cc: dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list

On Tue, Oct 13, 2020 at 6:15 PM Rob Clark <robdclark@gmail.com> wrote:
>
> On Tue, Oct 13, 2020 at 4:08 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Mon, Oct 12, 2020 at 08:07:38AM -0700, Rob Clark wrote:
> > > On Mon, Oct 12, 2020 at 7:40 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > >
> > > > On Sun, Oct 11, 2020 at 07:09:49PM -0700, Rob Clark wrote:
> > > > > From: Rob Clark <robdclark@chromium.org>
> > > > >
> > > > > Any cross-device sync use-cases *must* use explicit sync.  And if there
> > > > > is only a single ring (no-preemption), everything is FIFO order and
> > > > > there is no need to implicit-sync.
> > > > >
> > > > > Mesa should probably just always use MSM_SUBMIT_NO_IMPLICIT, as behavior
> > > > > is undefined when fences are not used to synchronize buffer usage across
> > > > > contexts (which is the only case where multiple different priority rings
> > > > > could come into play).
> > > >
> > > > Uh does this mean msm is broken on dri2/3 and wayland? Or am I just
> > > > confused by your commit message?
> > >
> > > No, I don't think so.  If there is only a single priority level
> > > ringbuffer (ie. no preemption to higher priority ring) then everything
> > > is inherently FIFO order.
> >
> > Well eventually you get a scheduler I guess/hope :-)
>
> we do have one currently for some gens, but not others.. hence the
> check for # of rings.  (Ie. there is a ring per priority level, if
> only one ring, that means no preemption/scheduler)

Even without preempt a scheduler is somewhat useful, if you have a
very spammy client. Of course it assumes that everyone submits
reasonably short workloads, otherwise there's nothing you can do.

> > > For cases where we are sharing buffers with something external to drm,
> > > explicit sync will be used.  And we don't implicit sync with display,
> > > otherwise x11 (frontbuffer rendering) would not work
> >
> > Uh now I'm even more confused. The implicit sync fences in dma_resv are
> > kinda for everyone. That's also why dma_resv with the common locking
> > approach is a useful idea.
> >
> > So display should definitely support implicit sync, and iirc msm does have
> > the helper hooked up.
>
> yup
>
> > Wrt other subsystems, I guess passing dma_fence around somehow doesn't fit
> > into v4l (the patches never landed), so v4l doesn't do any kind of sync
> > right now. But this could be fixed. Not sure what else is going on.
> >
> > So I guess I still have no idea why you put that into the commit message.
> >
> > btw for what you're trying to do yourself, the way to do this is to
> > allocate a fence timeline for your engine, compare fences, and no-op them
> all out if they're on the same timeline.
>
> we do that already (with a fence timeline per-ring, in the case of
> gens which support multiple rings / preemption).. this patch just
> short-circuits that in the case where we already know the fences will
> be of the same timeline

Ok so I think it's all good, no misunderstanding, except for the
commit message. I think if you delete the first sentence, that
cross-device sync must use explicit fences, then it all makes sense
and is consistent. Or clarify that this is cross-engine sync with
explicit internal synchronization, to differentiate it from
cross-device sync (as seen by userspace, like different drm_device
instances) and explicit dma_fence synchronization controlled by
userspace.
-Daniel

> BR,
> -R
>
> > -Daniel
> >
> > >
> > > BR,
> > > -R
> > >
> > > > Since for these protocols we do expect implicit sync accross processes to
> > > > work. Even across devices (and nvidia have actually provided quite a bunch
> > > > of patches to make this work in i915 - ttm based drivers get this right,
> > > > plus dumb scanout drivers using the right helpers also get this all
> > > > right).
> > > > -Daniel
> > > >
> > > > >
> > > > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > > > ---
> > > > >  drivers/gpu/drm/msm/msm_gem_submit.c | 7 ++++---
> > > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > > index 3151a0ca8904..c69803ea53c8 100644
> > > > > --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > > @@ -277,7 +277,7 @@ static int submit_lock_objects(struct msm_gem_submit *submit)
> > > > >       return ret;
> > > > >  }
> > > > >
> > > > > -static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit)
> > > > > +static int submit_fence_sync(struct msm_gem_submit *submit, bool implicit_sync)
> > > > >  {
> > > > >       int i, ret = 0;
> > > > >
> > > > > @@ -297,7 +297,7 @@ static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit)
> > > > >                               return ret;
> > > > >               }
> > > > >
> > > > > -             if (no_implicit)
> > > > > +             if (!implicit_sync)
> > > > >                       continue;
> > > > >
> > > > >               ret = msm_gem_sync_object(&msm_obj->base, submit->ring->fctx,
> > > > > @@ -768,7 +768,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
> > > > >       if (ret)
> > > > >               goto out;
> > > > >
> > > > > -     ret = submit_fence_sync(submit, !!(args->flags & MSM_SUBMIT_NO_IMPLICIT));
> > > > > +     ret = submit_fence_sync(submit, (gpu->nr_rings > 1) &&
> > > > > +                     !(args->flags & MSM_SUBMIT_NO_IMPLICIT));
> > > > >       if (ret)
> > > > >               goto out;
> > > > >
> > > > > --
> > > > > 2.26.2
> > > > >
> > > >
> > > > --
> > > > Daniel Vetter
> > > > Software Engineer, Intel Corporation
> > > > http://blog.ffwll.ch
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> > _______________________________________________
> > Freedreno mailing list
> > Freedreno@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/freedreno



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-10-12 15:43     ` Rob Clark
@ 2020-10-20  9:07       ` Viresh Kumar
  2020-10-20 10:56         ` Daniel Vetter
  0 siblings, 1 reply; 50+ messages in thread
From: Viresh Kumar @ 2020-10-20  9:07 UTC (permalink / raw)
  To: Rob Clark
  Cc: dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list,
	Daniel Vetter, Menon, Nishanth

On 12-10-20, 08:43, Rob Clark wrote:
> On Mon, Oct 12, 2020 at 7:35 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Sun, Oct 11, 2020 at 07:09:34PM -0700, Rob Clark wrote:
> > > From: Rob Clark <robdclark@chromium.org>
> > >
> > > Unfortunately, due to a dev_pm_opp locking interaction with
> > > mm->mmap_sem, we need to do pm get before acquiring obj locks,
> > > otherwise we anger lockdep with the chain:
> >
> > tbh this sounds like a bug in that subsystem, since it means we cannot use
> > said subsystem in mmap handlers either.
> >
> > So if you have some remapping unit or need to wake up your gpu to blt the
> > buffer into system memory first, we're toast. That doesn't sound right. So
> > maybe Cc: pm folks and figure out how to fix this long term properly? Imo
> > not a good reason to hold up this patch set, since unwrangling mmap_sem
> > tends to be work ...
> 
> + a couple of PM folks
> 
> Looks like it has been this way for quite some time, so I guess the
> overlap between things using dev_pm_opp and mmap is low..
> 
> fwiw, example splat so folks can see the locking interaction I am
> talking about.. I suspect the pm_opp interaction with mm->mmap_sem is
> from the debugfs calls while opp_table_lock is held?

I am not very sure why this circular locking dependency is
happening here or how exactly we can fix it. The OPP core works under
the opp_table_lock, from within which it creates/removes the debugfs
stuff as well.
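
Schematically (simplified, not the literal OPP code):

  mutex_lock(&opp_table_lock);
  ...
  opp_debug_unregister(opp_dev, opp_table);
        /* -> debugfs_remove_recursive()
         * -> down_write(i_mutex), under which lockdep has also
         *    seen mmap_sem nest via readdir */
  mutex_unlock(&opp_table_lock);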

> [   15.627855] ======================================================
> [   15.634202] WARNING: possible circular locking dependency detected
> [   15.640550] 5.4.70 #41 Not tainted
> [   15.644050] ------------------------------------------------------
> [   15.650397] chrome/1805 is trying to acquire lock:
> [   15.655314] ffffffed90720738 (opp_table_lock){+.+.}, at:
> _find_opp_table+0x34/0x74
> [   15.663092]
> [   15.663092] but task is already holding lock:
> [   15.669082] ffffff80ff3911a8 (reservation_ww_class_mutex){+.+.},
> at: submit_lock_objects+0x70/0x1ec
> [   15.678369]
> [   15.678369] which lock already depends on the new lock.
> [   15.678369]
> [   15.686764]
> [   15.686764] the existing dependency chain (in reverse order) is:
> [   15.694438]
> [   15.694438] -> #3 (reservation_ww_class_mutex){+.+.}:
> [   15.701146]        __mutex_lock_common+0xec/0xc0c
> [   15.705978]        ww_mutex_lock_interruptible+0x5c/0xc4
> [   15.711432]        msm_gem_fault+0x2c/0x124
> [   15.715731]        __do_fault+0x40/0x16c
> [   15.719766]        handle_mm_fault+0x7cc/0xd98
> [   15.724337]        do_page_fault+0x230/0x3b4
> [   15.728721]        do_translation_fault+0x5c/0x78
> [   15.733558]        do_mem_abort+0x4c/0xb4
> [   15.737680]        el0_da+0x1c/0x20
> [   15.741266]
> [   15.741266] -> #2 (&mm->mmap_sem){++++}:
> [   15.746809]        __might_fault+0x70/0x98
> [   15.751022]        compat_filldir+0xf8/0x48c
> [   15.755412]        dcache_readdir+0x70/0x1dc
> [   15.759808]        iterate_dir+0xd4/0x180
> [   15.763931]        __arm64_compat_sys_getdents+0xa0/0x19c
> [   15.769476]        el0_svc_common+0xa8/0x178
> [   15.773861]        el0_svc_compat_handler+0x2c/0x40
> [   15.778868]        el0_svc_compat+0x8/0x10
> [   15.783075]
> [   15.783075] -> #1 (&sb->s_type->i_mutex_key#3){++++}:
> [   15.789788]        down_write+0x54/0x16c
> [   15.793826]        debugfs_remove_recursive+0x50/0x158
> [   15.799108]        opp_debug_unregister+0x34/0x114
> [   15.804028]        dev_pm_opp_put_opp_table+0xd0/0x14c
> [   15.809308]        dev_pm_opp_put_clkname+0x3c/0x50
> [   15.814318]        msm_dsi_host_destroy+0xb0/0xcc
> [   15.819149]        dsi_destroy+0x40/0x58
> [   15.823184]        dsi_bind+0x90/0x170
> [   15.827041]        component_bind_all+0xf0/0x208
> [   15.831787]        msm_drm_init+0x188/0x60c
> [   15.836084]        msm_drm_bind+0x24/0x30
> [   15.840205]        try_to_bring_up_master+0x15c/0x1a4
> [   15.845396]        __component_add+0x98/0x14c
> [   15.849878]        component_add+0x28/0x34
> [   15.854086]        dp_display_probe+0x324/0x370
> [   15.858744]        platform_drv_probe+0x90/0xb0
> [   15.863400]        really_probe+0x134/0x2ec
> [   15.867699]        driver_probe_device+0x64/0xfc
> [   15.872443]        __device_attach_driver+0x8c/0xa4
> [   15.877459]        bus_for_each_drv+0x90/0xd8
> [   15.881939]        __device_attach+0xc0/0x148
> [   15.886420]        device_initial_probe+0x20/0x2c
> [   15.891254]        bus_probe_device+0x34/0x94
> [   15.895726]        deferred_probe_work_func+0x78/0xb4
> [   15.900914]        process_one_work+0x30c/0x5d0
> [   15.905573]        worker_thread+0x240/0x3f0
> [   15.909959]        kthread+0x144/0x154
> [   15.913809]        ret_from_fork+0x10/0x18
> [   15.918016]
> [   15.918016] -> #0 (opp_table_lock){+.+.}:
> [   15.923660]        __lock_acquire+0xee4/0x2450
> [   15.928230]        lock_acquire+0x1cc/0x210
> [   15.932527]        __mutex_lock_common+0xec/0xc0c
> [   15.937359]        mutex_lock_nested+0x40/0x50
> [   15.941928]        _find_opp_table+0x34/0x74
> [   15.946312]        dev_pm_opp_find_freq_exact+0x2c/0xdc
> [   15.951680]        a6xx_gmu_resume+0xc8/0xecc
> [   15.952812] fscrypt: AES-256-CTS-CBC using implementation "cts-cbc-aes-ce"
> [   15.956161]        a6xx_pm_resume+0x148/0x200
> [   15.956166]        adreno_resume+0x28/0x34
> [   15.956171]        pm_generic_runtime_resume+0x34/0x48
> [   15.956174]        __rpm_callback+0x70/0x10c
> [   15.956176]        rpm_callback+0x34/0x8c
> [   15.956179]        rpm_resume+0x414/0x550
> [   15.956182]        __pm_runtime_resume+0x7c/0xa0
> [   15.956185]        msm_gpu_submit+0x60/0x1c0
> [   15.956190]        msm_ioctl_gem_submit+0xadc/0xb60
> [   16.003961]        drm_ioctl_kernel+0x9c/0x118
> [   16.008532]        drm_ioctl+0x27c/0x408
> [   16.012562]        drm_compat_ioctl+0xcc/0xdc
> [   16.017038]        __se_compat_sys_ioctl+0x100/0x206c
> [   16.022224]        __arm64_compat_sys_ioctl+0x20/0x2c
> [   16.027412]        el0_svc_common+0xa8/0x178
> [   16.031800]        el0_svc_compat_handler+0x2c/0x40
> [   16.036810]        el0_svc_compat+0x8/0x10
> [   16.041021]
> [   16.041021] other info that might help us debug this:
> [   16.041021]
> [   16.049235] Chain exists of:
> [   16.049235]   opp_table_lock --> &mm->mmap_sem --> reservation_ww_class_mutex
> [   16.049235]
> [   16.061014]  Possible unsafe locking scenario:
> [   16.061014]
> [   16.067091]        CPU0                    CPU1
> [   16.071750]        ----                    ----
> [   16.076399]   lock(reservation_ww_class_mutex);
> [   16.081059]                                lock(&mm->mmap_sem);
> [   16.087134]                                lock(reservation_ww_class_mutex);
> [   16.094369]   lock(opp_table_lock);
> [   16.097961]
> [   16.097961]  *** DEADLOCK ***
> [   16.097961]
> [   16.104038] 3 locks held by chrome/1805:
> [   16.108068]  #0: ffffff80fb20c0d8 (&dev->struct_mutex){+.+.}, at:
> msm_ioctl_gem_submit+0x264/0xb60
> [   16.117264]  #1: ffffff80dd712c70
> (reservation_ww_class_acquire){+.+.}, at:
> msm_ioctl_gem_submit+0x8e8/0xb60
> [   16.127357]  #2: ffffff80ff3911a8
> (reservation_ww_class_mutex){+.+.}, at: submit_lock_objects+0x70/0x1ec
> [   16.137089]
> [   16.137089] stack backtrace:
> [   16.141567] CPU: 4 PID: 1805 Comm: chrome Not tainted 5.4.70 #41
> [   16.147733] Hardware name: Google Lazor (rev1+) with LTE (DT)
> [   16.153632] Call trace:
> [   16.156154]  dump_backtrace+0x0/0x158
> [   16.159924]  show_stack+0x20/0x2c
> [   16.163340]  dump_stack+0xc8/0x160
> [   16.166840]  print_circular_bug+0x2c4/0x2c8
> [   16.171144]  check_noncircular+0x1a8/0x1b0
> [   16.175351]  __lock_acquire+0xee4/0x2450
> [   16.179382]  lock_acquire+0x1cc/0x210
> [   16.183146]  __mutex_lock_common+0xec/0xc0c
> [   16.187450]  mutex_lock_nested+0x40/0x50
> [   16.191481]  _find_opp_table+0x34/0x74
> [   16.195344]  dev_pm_opp_find_freq_exact+0x2c/0xdc
> [   16.200178]  a6xx_gmu_resume+0xc8/0xecc
> [   16.204120]  a6xx_pm_resume+0x148/0x200
> [   16.208064]  adreno_resume+0x28/0x34
> [   16.211743]  pm_generic_runtime_resume+0x34/0x48
> [   16.216488]  __rpm_callback+0x70/0x10c
> [   16.220342]  rpm_callback+0x34/0x8c
> [   16.223933]  rpm_resume+0x414/0x550
> [   16.227524]  __pm_runtime_resume+0x7c/0xa0
> [   16.231731]  msm_gpu_submit+0x60/0x1c0
> [   16.235586]  msm_ioctl_gem_submit+0xadc/0xb60
> [   16.240066]  drm_ioctl_kernel+0x9c/0x118
> [   16.244097]  drm_ioctl+0x27c/0x408
> [   16.247602]  drm_compat_ioctl+0xcc/0xdc
> [   16.251546]  __se_compat_sys_ioctl+0x100/0x206c
> [   16.256204]  __arm64_compat_sys_ioctl+0x20/0x2c
> [   16.260861]  el0_svc_common+0xa8/0x178
> [   16.264716]  el0_svc_compat_handler+0x2c/0x40
> [   16.269196]  el0_svc_compat+0x8/0x10
> 
> BR,
> -R
> 
> > -Daniel
> >
> > >
> > >   opp_table_lock --> &mm->mmap_sem --> reservation_ww_class_mutex
> > >
> > > For an explicit fencing userspace, the impact should be minimal
> > > as we do all the fence waits before this point.  It could result
> > > in some needless resumes in error cases, etc.
> > >
> > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > ---
> > >  drivers/gpu/drm/msm/msm_gem_submit.c | 15 +++++++++++++--
> > >  1 file changed, 13 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > index 002130d826aa..a9422d043bfe 100644
> > > --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> > > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > @@ -744,11 +744,20 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
> > >
> > >       ret = submit_lookup_objects(submit, args, file);
> > >       if (ret)
> > > -             goto out;
> > > +             goto out_pre_pm;
> > >
> > >       ret = submit_lookup_cmds(submit, args, file);
> > >       if (ret)
> > > -             goto out;
> > > +             goto out_pre_pm;
> > > +
> > > +     /*
> > > +      * Thanks to dev_pm_opp opp_table_lock interactions with mm->mmap_sem
> > > +      * in the resume path, we need to do rpm get before we lock objs.
> > > +      * Which unfortunately might involve powering up the GPU sooner than
> > > +      * is necessary.  But at least in the explicit fencing case, we will
> > > +      * have already done all the fence waiting.
> > > +      */
> > > +     pm_runtime_get_sync(&gpu->pdev->dev);
> > >
> > >       /* copy_*_user while holding a ww ticket upsets lockdep */
> > >       ww_acquire_init(&submit->ticket, &reservation_ww_class);
> > > @@ -825,6 +834,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
> > >
> > >
> > >  out:
> > > +     pm_runtime_put(&gpu->pdev->dev);
> > > +out_pre_pm:
> > >       submit_cleanup(submit);
> > >       if (has_ww_ticket)
> > >               ww_acquire_fini(&submit->ticket);

-- 
viresh

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-10-20  9:07       ` Viresh Kumar
@ 2020-10-20 10:56         ` Daniel Vetter
  2020-10-20 11:24           ` Viresh Kumar
  0 siblings, 1 reply; 50+ messages in thread
From: Daniel Vetter @ 2020-10-20 10:56 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rob Clark, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list, Menon,
	Nishanth

On Tue, Oct 20, 2020 at 11:07 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 12-10-20, 08:43, Rob Clark wrote:
> > On Mon, Oct 12, 2020 at 7:35 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > On Sun, Oct 11, 2020 at 07:09:34PM -0700, Rob Clark wrote:
> > > > From: Rob Clark <robdclark@chromium.org>
> > > >
> > > > Unfortunately, due to a dev_pm_opp locking interaction with
> > > > mm->mmap_sem, we need to do pm get before acquiring obj locks,
> > > > otherwise we anger lockdep with the chain:
> > >
> > > tbh this sounds like a bug in that subsystem, since it means we cannot use
> > > said subsystem in mmap handlers either.
> > >
> > > So if you have some remapping unit or need to wake up your gpu to blt the
> > > buffer into system memory first, we're toast. That doesn't sound right. So
> > > maybe Cc: pm folks and figure out how to fix this long term properly? Imo
> > > not a good reason to hold up this patch set, since unwrangling mmap_sem
> > > tends to be work ...
> >
> > + a couple of PM folks
> >
> > Looks like it has been this way for quite some time, so I guess the
> > overlap between things using dev_pm_opp and mmap is low..
> >
> > fwiw, example splat so folks can see the locking interaction I am
> > talking about.. I suspect the pm_opp interaction with mm->mmap_sem is
> > from the debugfs calls while opp_table_lock is held?
>
> I am not very sure why this circular locking dependency is
> happening here or how exactly we can fix it. The OPP core works under
> the opp_table_lock, from within which it creates/removes the debugfs
> stuff as well.

Yeah that's bad practice. Generally you shouldn't need to hold locks
in setup/teardown code, since there's no other thread which can
possibly hold a reference to anything you're touching anymore. Ofc
excluding quickly grabbing/dropping a lock to insert/remove objects
into lists and stuff.

The other reason is that especially with anything related to sysfs or
debugfs, the locking dependencies you're pulling in are enormous: vfs
locks pull in mm locks (due to mmap) and at that point there's pretty
much nothing left you're allowed to hold while acquiring such a lock.
For simple drivers this is no issue, but for fancy drivers (like gpu
drivers, which need to interact with core mm) this means your
subsystem is a major pain to use.

Usually the correct fix is to only hold your subsystem locks in
setup/teardown when absolutely required, and fix any data
inconsistency issues by reordering your setup/teardown code: When you
register as the last step and unregister as the first step, there's no
need for any additional locking. And hence no need to call debugfs
functions while holding your subsystem locks.

The catch phrase I use for this is "don't solve object lifetime issues
with locking". Instead use refcounting and careful ordering in
setup/teardown code.
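
Concretely, the teardown side could look something like this
(hypothetical shape only, not a tested patch against the OPP core):

  static void _opp_table_remove(struct opp_device *opp_dev,
                                struct opp_table *opp_table)
  {
        /* unregister first, with no locks held, so the vfs locks
         * behind debugfs never nest inside opp_table_lock */
        opp_debug_unregister(opp_dev, opp_table);

        /* then only quick list surgery under the lock */
        mutex_lock(&opp_table_lock);
        list_del(&opp_table->node);
        mutex_unlock(&opp_table_lock);

        /* lifetime is handled by the refcount, not the lock */
        kref_put(&opp_table->kref, _opp_table_destroy);
  }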

I think Rob has found some duct-tape, so this isn't an immediate
issue, but would be really nice to get fixed.

Cheers, Daniel

> > [   15.627855] ======================================================
> > [   15.634202] WARNING: possible circular locking dependency detected
> > [   15.640550] 5.4.70 #41 Not tainted
> > [   15.644050] ------------------------------------------------------
> > [   15.650397] chrome/1805 is trying to acquire lock:
> > [   15.655314] ffffffed90720738 (opp_table_lock){+.+.}, at:
> > _find_opp_table+0x34/0x74
> > [   15.663092]
> > [   15.663092] but task is already holding lock:
> > [   15.669082] ffffff80ff3911a8 (reservation_ww_class_mutex){+.+.},
> > at: submit_lock_objects+0x70/0x1ec
> > [   15.678369]
> > [   15.678369] which lock already depends on the new lock.
> > [   15.678369]
> > [   15.686764]
> > [   15.686764] the existing dependency chain (in reverse order) is:
> > [   15.694438]
> > [   15.694438] -> #3 (reservation_ww_class_mutex){+.+.}:
> > [   15.701146]        __mutex_lock_common+0xec/0xc0c
> > [   15.705978]        ww_mutex_lock_interruptible+0x5c/0xc4
> > [   15.711432]        msm_gem_fault+0x2c/0x124
> > [   15.715731]        __do_fault+0x40/0x16c
> > [   15.719766]        handle_mm_fault+0x7cc/0xd98
> > [   15.724337]        do_page_fault+0x230/0x3b4
> > [   15.728721]        do_translation_fault+0x5c/0x78
> > [   15.733558]        do_mem_abort+0x4c/0xb4
> > [   15.737680]        el0_da+0x1c/0x20
> > [   15.741266]
> > [   15.741266] -> #2 (&mm->mmap_sem){++++}:
> > [   15.746809]        __might_fault+0x70/0x98
> > [   15.751022]        compat_filldir+0xf8/0x48c
> > [   15.755412]        dcache_readdir+0x70/0x1dc
> > [   15.759808]        iterate_dir+0xd4/0x180
> > [   15.763931]        __arm64_compat_sys_getdents+0xa0/0x19c
> > [   15.769476]        el0_svc_common+0xa8/0x178
> > [   15.773861]        el0_svc_compat_handler+0x2c/0x40
> > [   15.778868]        el0_svc_compat+0x8/0x10
> > [   15.783075]
> > [   15.783075] -> #1 (&sb->s_type->i_mutex_key#3){++++}:
> > [   15.789788]        down_write+0x54/0x16c
> > [   15.793826]        debugfs_remove_recursive+0x50/0x158
> > [   15.799108]        opp_debug_unregister+0x34/0x114
> > [   15.804028]        dev_pm_opp_put_opp_table+0xd0/0x14c
> > [   15.809308]        dev_pm_opp_put_clkname+0x3c/0x50
> > [   15.814318]        msm_dsi_host_destroy+0xb0/0xcc
> > [   15.819149]        dsi_destroy+0x40/0x58
> > [   15.823184]        dsi_bind+0x90/0x170
> > [   15.827041]        component_bind_all+0xf0/0x208
> > [   15.831787]        msm_drm_init+0x188/0x60c
> > [   15.836084]        msm_drm_bind+0x24/0x30
> > [   15.840205]        try_to_bring_up_master+0x15c/0x1a4
> > [   15.845396]        __component_add+0x98/0x14c
> > [   15.849878]        component_add+0x28/0x34
> > [   15.854086]        dp_display_probe+0x324/0x370
> > [   15.858744]        platform_drv_probe+0x90/0xb0
> > [   15.863400]        really_probe+0x134/0x2ec
> > [   15.867699]        driver_probe_device+0x64/0xfc
> > [   15.872443]        __device_attach_driver+0x8c/0xa4
> > [   15.877459]        bus_for_each_drv+0x90/0xd8
> > [   15.881939]        __device_attach+0xc0/0x148
> > [   15.886420]        device_initial_probe+0x20/0x2c
> > [   15.891254]        bus_probe_device+0x34/0x94
> > [   15.895726]        deferred_probe_work_func+0x78/0xb4
> > [   15.900914]        process_one_work+0x30c/0x5d0
> > [   15.905573]        worker_thread+0x240/0x3f0
> > [   15.909959]        kthread+0x144/0x154
> > [   15.913809]        ret_from_fork+0x10/0x18
> > [   15.918016]
> > [   15.918016] -> #0 (opp_table_lock){+.+.}:
> > [   15.923660]        __lock_acquire+0xee4/0x2450
> > [   15.928230]        lock_acquire+0x1cc/0x210
> > [   15.932527]        __mutex_lock_common+0xec/0xc0c
> > [   15.937359]        mutex_lock_nested+0x40/0x50
> > [   15.941928]        _find_opp_table+0x34/0x74
> > [   15.946312]        dev_pm_opp_find_freq_exact+0x2c/0xdc
> > [   15.951680]        a6xx_gmu_resume+0xc8/0xecc
> > [   15.952812] fscrypt: AES-256-CTS-CBC using implementation "cts-cbc-aes-ce"
> > [   15.956161]        a6xx_pm_resume+0x148/0x200
> > [   15.956166]        adreno_resume+0x28/0x34
> > [   15.956171]        pm_generic_runtime_resume+0x34/0x48
> > [   15.956174]        __rpm_callback+0x70/0x10c
> > [   15.956176]        rpm_callback+0x34/0x8c
> > [   15.956179]        rpm_resume+0x414/0x550
> > [   15.956182]        __pm_runtime_resume+0x7c/0xa0
> > [   15.956185]        msm_gpu_submit+0x60/0x1c0
> > [   15.956190]        msm_ioctl_gem_submit+0xadc/0xb60
> > [   16.003961]        drm_ioctl_kernel+0x9c/0x118
> > [   16.008532]        drm_ioctl+0x27c/0x408
> > [   16.012562]        drm_compat_ioctl+0xcc/0xdc
> > [   16.017038]        __se_compat_sys_ioctl+0x100/0x206c
> > [   16.022224]        __arm64_compat_sys_ioctl+0x20/0x2c
> > [   16.027412]        el0_svc_common+0xa8/0x178
> > [   16.031800]        el0_svc_compat_handler+0x2c/0x40
> > [   16.036810]        el0_svc_compat+0x8/0x10
> > [   16.041021]
> > [   16.041021] other info that might help us debug this:
> > [   16.041021]
> > [   16.049235] Chain exists of:
> > [   16.049235]   opp_table_lock --> &mm->mmap_sem --> reservation_ww_class_mutex
> > [   16.049235]
> > [   16.061014]  Possible unsafe locking scenario:
> > [   16.061014]
> > [   16.067091]        CPU0                    CPU1
> > [   16.071750]        ----                    ----
> > [   16.076399]   lock(reservation_ww_class_mutex);
> > [   16.081059]                                lock(&mm->mmap_sem);
> > [   16.087134]                                lock(reservation_ww_class_mutex);
> > [   16.094369]   lock(opp_table_lock);
> > [   16.097961]
> > [   16.097961]  *** DEADLOCK ***
> > [   16.097961]
> > [   16.104038] 3 locks held by chrome/1805:
> > [   16.108068]  #0: ffffff80fb20c0d8 (&dev->struct_mutex){+.+.}, at:
> > msm_ioctl_gem_submit+0x264/0xb60
> > [   16.117264]  #1: ffffff80dd712c70
> > (reservation_ww_class_acquire){+.+.}, at:
> > msm_ioctl_gem_submit+0x8e8/0xb60
> > [   16.127357]  #2: ffffff80ff3911a8
> > (reservation_ww_class_mutex){+.+.}, at: submit_lock_objects+0x70/0x1ec
> > [   16.137089]
> > [   16.137089] stack backtrace:
> > [   16.141567] CPU: 4 PID: 1805 Comm: chrome Not tainted 5.4.70 #41
> > [   16.147733] Hardware name: Google Lazor (rev1+) with LTE (DT)
> > [   16.153632] Call trace:
> > [   16.156154]  dump_backtrace+0x0/0x158
> > [   16.159924]  show_stack+0x20/0x2c
> > [   16.163340]  dump_stack+0xc8/0x160
> > [   16.166840]  print_circular_bug+0x2c4/0x2c8
> > [   16.171144]  check_noncircular+0x1a8/0x1b0
> > [   16.175351]  __lock_acquire+0xee4/0x2450
> > [   16.179382]  lock_acquire+0x1cc/0x210
> > [   16.183146]  __mutex_lock_common+0xec/0xc0c
> > [   16.187450]  mutex_lock_nested+0x40/0x50
> > [   16.191481]  _find_opp_table+0x34/0x74
> > [   16.195344]  dev_pm_opp_find_freq_exact+0x2c/0xdc
> > [   16.200178]  a6xx_gmu_resume+0xc8/0xecc
> > [   16.204120]  a6xx_pm_resume+0x148/0x200
> > [   16.208064]  adreno_resume+0x28/0x34
> > [   16.211743]  pm_generic_runtime_resume+0x34/0x48
> > [   16.216488]  __rpm_callback+0x70/0x10c
> > [   16.220342]  rpm_callback+0x34/0x8c
> > [   16.223933]  rpm_resume+0x414/0x550
> > [   16.227524]  __pm_runtime_resume+0x7c/0xa0
> > [   16.231731]  msm_gpu_submit+0x60/0x1c0
> > [   16.235586]  msm_ioctl_gem_submit+0xadc/0xb60
> > [   16.240066]  drm_ioctl_kernel+0x9c/0x118
> > [   16.244097]  drm_ioctl+0x27c/0x408
> > [   16.247602]  drm_compat_ioctl+0xcc/0xdc
> > [   16.251546]  __se_compat_sys_ioctl+0x100/0x206c
> > [   16.256204]  __arm64_compat_sys_ioctl+0x20/0x2c
> > [   16.260861]  el0_svc_common+0xa8/0x178
> > [   16.264716]  el0_svc_compat_handler+0x2c/0x40
> > [   16.269196]  el0_svc_compat+0x8/0x10
> >
> > BR,
> > -R
> >
> > > -Daniel
> > >
> > > >
> > > >   opp_table_lock --> &mm->mmap_sem --> reservation_ww_class_mutex
> > > >
> > > > For an explicit fencing userspace, the impact should be minimal
> > > > as we do all the fence waits before this point.  It could result
> > > > in some needless resumes in error cases, etc.
> > > >
> > > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > > ---
> > > >  drivers/gpu/drm/msm/msm_gem_submit.c | 15 +++++++++++++--
> > > >  1 file changed, 13 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > index 002130d826aa..a9422d043bfe 100644
> > > > --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > > @@ -744,11 +744,20 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
> > > >
> > > >       ret = submit_lookup_objects(submit, args, file);
> > > >       if (ret)
> > > > -             goto out;
> > > > +             goto out_pre_pm;
> > > >
> > > >       ret = submit_lookup_cmds(submit, args, file);
> > > >       if (ret)
> > > > -             goto out;
> > > > +             goto out_pre_pm;
> > > > +
> > > > +     /*
> > > > +      * Thanks to dev_pm_opp opp_table_lock interactions with mm->mmap_sem
> > > > +      * in the resume path, we need to do the rpm get before we lock objs.
> > > > +      * Which unfortunately might involve powering up the GPU sooner than
> > > > +      * is necessary.  But at least in the explicit fencing case, we will
> > > > +      * have already done all the fence waiting.
> > > > +      */
> > > > +     pm_runtime_get_sync(&gpu->pdev->dev);
> > > >
> > > >       /* copy_*_user while holding a ww ticket upsets lockdep */
> > > >       ww_acquire_init(&submit->ticket, &reservation_ww_class);
> > > > @@ -825,6 +834,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
> > > >
> > > >
> > > >  out:
> > > > +     pm_runtime_put(&gpu->pdev->dev);
> > > > +out_pre_pm:
> > > >       submit_cleanup(submit);
> > > >       if (has_ww_ticket)
> > > >               ww_acquire_fini(&submit->ticket);
>
> --
> viresh



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-10-20 10:56         ` Daniel Vetter
@ 2020-10-20 11:24           ` Viresh Kumar
  2020-10-20 11:42             ` Daniel Vetter
  2020-10-20 14:13             ` Rob Clark
  0 siblings, 2 replies; 50+ messages in thread
From: Viresh Kumar @ 2020-10-20 11:24 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Rob Clark, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list, Menon,
	Nishanth

On 20-10-20, 12:56, Daniel Vetter wrote:
> Yeah that's bad practice. Generally you shouldn't need to hold locks
> in setup/teardown code, since there's no other thread which can
> possibly hold a reference to anything you're touching anymore. Ofc
> excluding quickly grabbing/dropping a lock to insert/remove objects
> into lists and stuff.
> 
> The other reason is that especially with anything related to sysfs or
> debugfs, the locking dependencies you're pulling in are enormous: vfs
> locks pull in mm locks (due to mmap) and at that point there's pretty
> much nothing left you're allowed to hold while acquiring such a lock.
> For simple drivers this is no issue, but for fancy drivers (like gpu
> drivers, which need to interact with core mm) this means your
> subsystem is a major pain to use.
> 
> Usually the correct fix is to only hold your subsystem locks in
> setup/teardown when absolutely required, and fix any data
> inconsistency issues by reordering your setup/teardown code: When you
> register as the last step and unregister as the first step, there's no
> need for any additional locking. And hence no need to call debugfs
> functions while holding your subsystem locks.
> 
> The catch phrase I use for this is "don't solve object lifetime issues
> with locking". Instead use refcounting and careful ordering in
> setup/teardown code.
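
As a concrete sketch of that pattern (refcounting plus register-last /
unregister-first ordering), something like the following; the foo_*
names are hypothetical and this is not code from any patch in this
thread:

#include <linux/debugfs.h>
#include <linux/kref.h>
#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/slab.h>

struct foo {
        struct kref refcount;
        struct dentry *debugfs;
        struct list_head node;
};

static LIST_HEAD(foo_list);
static DEFINE_MUTEX(foo_list_lock);

static void foo_release(struct kref *kref)
{
        kfree(container_of(kref, struct foo, refcount));
}

static void foo_setup(struct foo *f)
{
        kref_init(&f->refcount);

        /* No locking needed: nobody else can see 'f' yet. */
        f->debugfs = debugfs_create_dir("foo", NULL);

        /* Publishing to the global list is the *last* step. */
        mutex_lock(&foo_list_lock);
        list_add(&f->node, &foo_list);
        mutex_unlock(&foo_list_lock);
}

static void foo_teardown(struct foo *f)
{
        /* Unpublishing is the *first* step: no new reference can be
         * taken after this, so the lock only ever protects the list. */
        mutex_lock(&foo_list_lock);
        list_del(&f->node);
        mutex_unlock(&foo_list_lock);

        /* Safe without the list lock: 'f' is no longer reachable, so
         * the debugfs/vfs locks never nest inside foo_list_lock. */
        debugfs_remove_recursive(f->debugfs);

        /* Memory goes away when the last reference does. */
        kref_put(&f->refcount, foo_release);
}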

This is exactly what I have done in the OPP core: the locks were taken
only when really necessary. Though, as we have seen now, I missed that
in a single place, and that should be fixed as well. Will do that,
thanks.

-- 
viresh

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-10-20 11:24           ` Viresh Kumar
@ 2020-10-20 11:42             ` Daniel Vetter
  2020-10-20 14:13             ` Rob Clark
  1 sibling, 0 replies; 50+ messages in thread
From: Daniel Vetter @ 2020-10-20 11:42 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rob Clark, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list, Menon,
	Nishanth

On Tue, Oct 20, 2020 at 1:24 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 20-10-20, 12:56, Daniel Vetter wrote:
> > Yeah that's bad practice. Generally you shouldn't need to hold locks
> > in setup/teardown code, since there's no other thread which can
> > possibly hold a reference to anything you're touching anymore. Ofc
> > excluding quickly grabbing/dropping a lock to insert/remove objects
> > into lists and stuff.
> >
> > The other reason is that especially with anything related to sysfs or
> > debugfs, the locking dependencies you're pulling in are enormous: vfs
> > locks pull in mm locks (due to mmap) and at that point there's pretty
> > much nothing left you're allowed to hold while acquiring such a lock.
> > For simple drivers this is no issue, but for fancy drivers (like gpu
> > drivers, which need to interact with core mm) this means your
> > subsystem is a major pain to use.
> >
> > Usually the correct fix is to only hold your subsystem locks in
> > setup/teardown when absolutely required, and fix any data
> > inconsistency issues by reordering your setup/teardown code: When you
> > register as the last step and unregister as the first step, there's no
> > need for any additional locking. And hence no need to call debugfs
> > functions while holding your subsystem locks.
> >
> > The catch phrase I use for this is "don't solve object lifetime issues
> > with locking". Instead use refcounting and careful ordering in
> > setup/teardown code.
>
> This is exactly what I have done in the OPP core: the locks were taken
> only when really necessary. Though, as we have seen now, I missed that
> in a single place, and that should be fixed as well. Will do that,
> thanks.

Excellent. If the fix is small enough, can you push it into 5.10? That
way drm/msm doesn't have to carry the temporary solution for 5.11 (the
issue only pops up with the locking rework, which teaches lockdep a
few more things about what's going on as a side effect).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-10-20 11:24           ` Viresh Kumar
  2020-10-20 11:42             ` Daniel Vetter
@ 2020-10-20 14:13             ` Rob Clark
  2020-10-22  8:06               ` Viresh Kumar
  1 sibling, 1 reply; 50+ messages in thread
From: Rob Clark @ 2020-10-20 14:13 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Daniel Vetter, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list, Menon,
	Nishanth

On Tue, Oct 20, 2020 at 4:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 20-10-20, 12:56, Daniel Vetter wrote:
> > Yeah that's bad practice. Generally you shouldn't need to hold locks
> > in setup/teardown code, since there's no other thread which can
> > possibly hold a reference to anything you're touching anymore. Ofc
> > excluding quickly grabbing/dropping a lock to insert/remove objects
> > into lists and stuff.
> >
> > The other reason is that especially with anything related to sysfs or
> > debugfs, the locking dependencies you're pulling in are enormous: vfs
> > locks pull in mm locks (due to mmap) and at that point there's pretty
> > much nothing left you're allowed to hold while acquiring such a lock.
> > For simple drivers this is no issue, but for fancy drivers (like gpu
> > drivers, which need to interact with core mm) this means your
> > subsystem is a major pain to use.
> >
> > Usually the correct fix is to only hold your subsystem locks in
> > setup/teardown when absolutely required, and fix any data
> > inconsistency issues by reordering your setup/teardown code: When you
> > register as the last step and unregister as the first step, there's no
> > need for any additional locking. And hence no need to call debugfs
> > functions while holding your subsystem locks.
> >
> > The catch phrase I use for this is "don't solve object lifetime issues
> > with locking". Instead use refcounting and careful ordering in
> > setup/teardown code.
>
> > This is exactly what I have done in the OPP core: the locks were taken
> > only when really necessary. Though, as we have seen now, I missed that
> > in a single place, and that should be fixed as well. Will do that,
> > thanks.

I do have an easy enough way to repro the issue, so if you have a
patch I can certainly test it.

BR,
-R

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-10-20 14:13             ` Rob Clark
@ 2020-10-22  8:06               ` Viresh Kumar
  2020-10-25 17:39                 ` Rob Clark
  0 siblings, 1 reply; 50+ messages in thread
From: Viresh Kumar @ 2020-10-22  8:06 UTC (permalink / raw)
  To: Rob Clark
  Cc: Daniel Vetter, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list, Menon,
	Nishanth

On 20-10-20, 07:13, Rob Clark wrote:
> On Tue, Oct 20, 2020 at 4:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 20-10-20, 12:56, Daniel Vetter wrote:
> > > Yeah that's bad practice. Generally you shouldn't need to hold locks
> > > in setup/teardown code, since there's no other thread which can
> > > possibly hold a reference to anything you're touching anymore. Ofc
> > > excluding quickly grabbing/dropping a lock to insert/remove objects
> > > into lists and stuff.
> > >
> > > The other reason is that especially with anything related to sysfs or
> > > debugfs, the locking dependencies you're pulling in are enormous: vfs
> > > locks pull in mm locks (due to mmap) and at that point there's pretty
> > > much nothing left you're allowed to hold while acquiring such a lock.
> > > For simple drivers this is no issue, but for fancy drivers (like gpu
> > > drivers, which need to interact with core mm) this means your
> > > subsystem is a major pain to use.
> > >
> > > Usually the correct fix is to only hold your subsystem locks in
> > > setup/teardown when absolutely required, and fix any data
> > > inconsistency issues by reordering your setup/teardown code: When you
> > > register as the last step and unregister as the first step, there's no
> > > need for any additional locking. And hence no need to call debugfs
> > > functions while holding your subsystem locks.
> > >
> > > The catch phrase I use for this is "don't solve object lifetime issues
> > > with locking". Instead use refcounting and careful ordering in
> > > setup/teardown code.
> >
> > This is exactly what I have done in the OPP core: the locks were taken
> > only when really necessary. Though, as we have seen now, I missed that
> > in a single place, and that should be fixed as well. Will do that,
> > thanks.
> 
> I do have an easy enough way to repro the issue, so if you have a
> patch I can certainly test it.

Does this fix it for you? There is still one more place where we are
taking the opp_table_lock while adding stuff to debugfs, and that one
is not as straightforward to fix. But I didn't see that path in your
circular dependency trace, so who knows :)

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 2483e765318a..4cc0fb716381 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -1181,6 +1181,10 @@ static void _opp_table_kref_release(struct kref *kref)
        struct opp_device *opp_dev, *temp;
        int i;
 
+       /* Drop the lock as soon as we can */
+       list_del(&opp_table->node);
+       mutex_unlock(&opp_table_lock);
+
        _of_clear_opp_table(opp_table);
 
        /* Release clk */
@@ -1208,10 +1212,7 @@ static void _opp_table_kref_release(struct kref *kref)
 
        mutex_destroy(&opp_table->genpd_virt_dev_lock);
        mutex_destroy(&opp_table->lock);
-       list_del(&opp_table->node);
        kfree(opp_table);
-
-       mutex_unlock(&opp_table_lock);
 }
 
 void dev_pm_opp_put_opp_table(struct opp_table *opp_table)

-- 
viresh

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-10-22  8:06               ` Viresh Kumar
@ 2020-10-25 17:39                 ` Rob Clark
  2020-10-27 11:35                   ` Viresh Kumar
  0 siblings, 1 reply; 50+ messages in thread
From: Rob Clark @ 2020-10-25 17:39 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Daniel Vetter, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list, Menon,
	Nishanth

On Thu, Oct 22, 2020 at 1:06 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 20-10-20, 07:13, Rob Clark wrote:
> > On Tue, Oct 20, 2020 at 4:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 20-10-20, 12:56, Daniel Vetter wrote:
> > > > Yeah that's bad practice. Generally you shouldn't need to hold locks
> > > > in setup/teardown code, since there's no other thread which can
> > > > possibly hold a reference to anything you're touching anymore. Ofc
> > > > excluding quickly grabbing/dropping a lock to insert/remove objects
> > > > into lists and stuff.
> > > >
> > > > The other reason is that especially with anything related to sysfs or
> > > > debugfs, the locking dependencies you're pulling in are enormous: vfs
> > > > locks pull in mm locks (due to mmap) and at that point there's pretty
> > > > much nothing left you're allowed to hold while acquiring such a lock.
> > > > For simple drivers this is no issue, but for fancy drivers (like gpu
> > > > drivers, which need to interact with core mm) this means your
> > > > subsystem is a major pain to use.
> > > >
> > > > Usually the correct fix is to only hold your subsystem locks in
> > > > setup/teardown when absolutely required, and fix any data
> > > > inconsistency issues by reordering your setup/teardown code: When you
> > > > register as the last step and unregister as the first step, there's no
> > > > need for any additional locking. And hence no need to call debugfs
> > > > functions while holding your subsystem locks.
> > > >
> > > > The catch phrase I use for this is "don't solve object lifetime issues
> > > > with locking". Instead use refcounting and careful ordering in
> > > > setup/teardown code.
> > >
> > > This is exactly what I have done in the OPP core: the locks were taken
> > > only when really necessary. Though, as we have seen now, I missed that
> > > in a single place, and that should be fixed as well. Will do that,
> > > thanks.
> >
> > I do have an easy enough way to repro the issue, so if you have a
> > patch I can certainly test it.
>
> Does this fix it for you? There is still one more place where we are
> taking the opp_table_lock while adding stuff to debugfs, and that one
> is not as straightforward to fix. But I didn't see that path in your
> circular dependency trace, so who knows :)

Nope, I suspect any creation of debugfs files will be problematic.

(btw, _add_opp_dev_unlocked() looks like it should be called
_add_opp_dev_locked()?)

It does look like 'struct opp_table' is already refcnt'd, so I suspect
you could replace holding opp_table_lock while calling into debugfs
with holding a reference to the opp_table instead?
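
Something like this minimal sketch is what I have in mind (illustrative
only, using the OPP core's internal helpers as they currently exist;
the example_* function name is made up and this is not a tested patch):

static void example_debug_register(struct device *dev,
                                   struct opp_device *opp_dev)
{
        struct opp_table *opp_table;

        /*
         * Hold the global lock only for the lookup, which also takes
         * a reference on the table.
         */
        mutex_lock(&opp_table_lock);
        opp_table = _find_opp_table_unlocked(dev);
        mutex_unlock(&opp_table_lock);

        if (IS_ERR(opp_table))
                return;

        /*
         * debugfs (and the vfs locks it pulls in) no longer nests
         * inside opp_table_lock; the kref keeps the table alive.
         */
        opp_debug_register(opp_dev, opp_table);

        /* Drop the reference we took for the call. */
        dev_pm_opp_put_opp_table(opp_table);
}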

BR,
-R

[  +0.074543] ======================================================
[  +0.006347] WARNING: possible circular locking dependency detected
[  +0.006349] 5.4.72 #14 Not tainted
[  +0.003501] ------------------------------------------------------
[  +0.006350] chrome/1865 is trying to acquire lock:
[  +0.004922] ffffffdd34921750 (opp_table_lock){+.+.}, at: _find_opp_table+0x34/0x74
[  +0.007779]
              but task is already holding lock:
[  +0.005989] ffffff81f0fc71a8 (reservation_ww_class_mutex){+.+.}, at: submit_lock_objects+0x70/0x1ec
[  +0.001132] fscrypt: AES-256-CTS-CBC using implementation "cts-cbc-aes-ce"
[  +0.008156]
              which lock already depends on the new lock.

[  +0.000002]
              the existing dependency chain (in reverse order) is:
[  +0.000002]
              -> #4 (reservation_ww_class_mutex){+.+.}:
[  +0.000009]        __mutex_lock_common+0xec/0xc0c
[  +0.000004]        ww_mutex_lock_interruptible+0x5c/0xc4
[  +0.000003]        msm_gem_fault+0x2c/0x124
[  +0.000005]        __do_fault+0x40/0x16c
[  +0.000003]        handle_mm_fault+0x7cc/0xd98
[  +0.000005]        do_page_fault+0x230/0x3b4
[  +0.000003]        do_translation_fault+0x5c/0x78
[  +0.000004]        do_mem_abort+0x4c/0xb4
[  +0.000006]        el0_da+0x1c/0x20
[  +0.069917]
              -> #3 (&mm->mmap_sem){++++}:
[  +0.005548]        __might_fault+0x70/0x98
[  +0.004209]        compat_filldir+0xf8/0x48c
[  +0.004394]        dcache_readdir+0x70/0x1dc
[  +0.004394]        iterate_dir+0xd4/0x180
[  +0.004119]        __arm64_compat_sys_getdents+0xa0/0x19c
[  +0.005549]        el0_svc_common+0xa8/0x178
[  +0.004394]        el0_svc_compat_handler+0x2c/0x40
[  +0.005007]        el0_svc_compat+0x8/0x10
[  +0.004205]
              -> #2 (&sb->s_type->i_mutex_key#3){++++}:
[  +0.006708]        down_write+0x54/0x16c
[  +0.004034]        start_creating+0x68/0x128
[  +0.004392]        debugfs_create_dir+0x28/0x114
[  +0.004747]        opp_debug_register+0x8c/0xc0
[  +0.004657]        _add_opp_dev_unlocked+0x5c/0x70
[  +0.004920]        _add_opp_dev+0x38/0x58
[  +0.004118]        _opp_get_opp_table+0xdc/0x1ac
[  +0.004745]        dev_pm_opp_get_opp_table_indexed+0x24/0x30
[  +0.005899]        dev_pm_opp_of_add_table_indexed+0x48/0x84
[  +0.005813]        of_genpd_add_provider_onecell+0xc0/0x1b8
[  +0.005724]        rpmhpd_probe+0x240/0x268
[  +0.004307]        platform_drv_probe+0x90/0xb0
[  +0.004654]        really_probe+0x134/0x2ec
[  +0.004304]        driver_probe_device+0x64/0xfc
[  +0.004746]        __device_attach_driver+0x8c/0xa4
[  +0.005008]        bus_for_each_drv+0x90/0xd8
[  +0.004481]        __device_attach+0xc0/0x148
[  +0.004480]        device_initial_probe+0x20/0x2c
[  +0.004832]        bus_probe_device+0x34/0x94
[  +0.004482]        device_add+0x1fc/0x3b0
[  +0.004121]        of_device_add+0x3c/0x4c
[  +0.004206]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005811]        of_platform_bus_create+0x1e4/0x368
[  +0.005185]        of_platform_populate+0x70/0xbc
[  +0.004833]        devm_of_platform_populate+0x58/0xa0
[  +0.005283]        rpmh_rsc_probe+0x36c/0x3cc
[  +0.004481]        platform_drv_probe+0x90/0xb0
[  +0.004657]        really_probe+0x134/0x2ec
[  +0.004305]        driver_probe_device+0x64/0xfc
[  +0.004745]        __device_attach_driver+0x8c/0xa4
[  +0.005007]        bus_for_each_drv+0x90/0xd8
[  +0.004480]        __device_attach+0xc0/0x148
[  +0.004481]        device_initial_probe+0x20/0x2c
[  +0.004833]        bus_probe_device+0x34/0x94
[  +0.004481]        device_add+0x1fc/0x3b0
[  +0.004119]        of_device_add+0x3c/0x4c
[  +0.004206]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005811]        of_platform_bus_create+0x1e4/0x368
[  +0.005185]        of_platform_bus_create+0x230/0x368
[  +0.005185]        of_platform_populate+0x70/0xbc
[  +0.004836]        of_platform_default_populate_init+0xa8/0xc0
[  +0.005986]        do_one_initcall+0x1c8/0x3fc
[  +0.004572]        do_initcall_level+0xb4/0x10c
[  +0.004657]        do_basic_setup+0x30/0x48
[  +0.004304]        kernel_init_freeable+0x124/0x1a4
[  +0.005009]        kernel_init+0x14/0x104
[  +0.004119]        ret_from_fork+0x10/0x18
[  +0.004205]
              -> #1 (&opp_table->lock){+.+.}:
[  +0.005815]        __mutex_lock_common+0xec/0xc0c
[  +0.004832]        mutex_lock_nested+0x40/0x50
[  +0.004570]        _add_opp_dev+0x2c/0x58
[  +0.004119]        _opp_get_opp_table+0xdc/0x1ac
[  +0.004745]        dev_pm_opp_get_opp_table_indexed+0x24/0x30
[  +0.005899]        dev_pm_opp_of_add_table_indexed+0x48/0x84
[  +0.005814]        of_genpd_add_provider_onecell+0xc0/0x1b8
[  +0.005721]        rpmhpd_probe+0x240/0x268
[  +0.004306]        platform_drv_probe+0x90/0xb0
[  +0.004656]        really_probe+0x134/0x2ec
[  +0.004305]        driver_probe_device+0x64/0xfc
[  +0.004745]        __device_attach_driver+0x8c/0xa4
[  +0.005007]        bus_for_each_drv+0x90/0xd8
[  +0.004481]        __device_attach+0xc0/0x148
[  +0.004481]        device_initial_probe+0x20/0x2c
[  +0.004832]        bus_probe_device+0x34/0x94
[  +0.004481]        device_add+0x1fc/0x3b0
[  +0.004119]        of_device_add+0x3c/0x4c
[  +0.004206]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005810]        of_platform_bus_create+0x1e4/0x368
[  +0.005197]        of_platform_populate+0x70/0xbc
[  +0.004832]        devm_of_platform_populate+0x58/0xa0
[  +0.005284]        rpmh_rsc_probe+0x36c/0x3cc
[  +0.004480]        platform_drv_probe+0x90/0xb0
[  +0.004658]        really_probe+0x134/0x2ec
[  +0.004301]        driver_probe_device+0x64/0xfc
[  +0.004745]        __device_attach_driver+0x8c/0xa4
[  +0.005007]        bus_for_each_drv+0x90/0xd8
[  +0.004480]        __device_attach+0xc0/0x148
[  +0.004481]        device_initial_probe+0x20/0x2c
[  +0.004831]        bus_probe_device+0x34/0x94
[  +0.004482]        device_add+0x1fc/0x3b0
[  +0.004118]        of_device_add+0x3c/0x4c
[  +0.004214]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005817]        of_platform_bus_create+0x1e4/0x368
[  +0.005186]        of_platform_bus_create+0x230/0x368
[  +0.005188]        of_platform_populate+0x70/0xbc
[  +0.004832]        of_platform_default_populate_init+0xa8/0xc0
[  +0.005987]        do_one_initcall+0x1c8/0x3fc
[  +0.004569]        do_initcall_level+0xb4/0x10c
[  +0.004657]        do_basic_setup+0x30/0x48
[  +0.004305]        kernel_init_freeable+0x124/0x1a4
[  +0.005008]        kernel_init+0x14/0x104
[  +0.004119]        ret_from_fork+0x10/0x18
[  +0.004206]
              -> #0 (opp_table_lock){+.+.}:
[  +0.005640]        __lock_acquire+0xee4/0x2450
[  +0.004570]        lock_acquire+0x1cc/0x210
[  +0.004305]        __mutex_lock_common+0xec/0xc0c
[  +0.004833]        mutex_lock_nested+0x40/0x50
[  +0.004570]        _find_opp_table+0x34/0x74
[  +0.004393]        dev_pm_opp_find_freq_exact+0x2c/0xdc
[  +0.005372]        a6xx_gmu_resume+0xc8/0xecc
[  +0.004480]        a6xx_pm_resume+0x148/0x200
[  +0.004482]        adreno_resume+0x28/0x34
[  +0.004209]        pm_generic_runtime_resume+0x34/0x48
[  +0.005283]        __rpm_callback+0x70/0x10c
[  +0.004393]        rpm_callback+0x34/0x8c
[  +0.004119]        rpm_resume+0x414/0x550
[  +0.004119]        __pm_runtime_resume+0x7c/0xa0
[  +0.004746]        msm_gpu_submit+0x60/0x1c0
[  +0.004394]        msm_ioctl_gem_submit+0xadc/0xb60
[  +0.005010]        drm_ioctl_kernel+0x9c/0x118
[  +0.004569]        drm_ioctl+0x27c/0x408
[  +0.004034]        drm_compat_ioctl+0xcc/0xdc
[  +0.004483]        __se_compat_sys_ioctl+0x100/0x206c
[  +0.005186]        __arm64_compat_sys_ioctl+0x20/0x2c
[  +0.005187]        el0_svc_common+0xa8/0x178
[  +0.004393]        el0_svc_compat_handler+0x2c/0x40
[  +0.005009]        el0_svc_compat+0x8/0x10
[  +0.004205]
              other info that might help us debug this:

[  +0.008213] Chain exists of:
                opp_table_lock --> &mm->mmap_sem --> reservation_ww_class_mutex

[  +0.011780]  Possible unsafe locking scenario:

[  +0.006082]        CPU0                    CPU1
[  +0.004660]        ----                    ----
[  +0.004656]   lock(reservation_ww_class_mutex);
[  +0.004657]                                lock(&mm->mmap_sem);
[  +0.006079]                                lock(reservation_ww_class_mutex);
[  +0.007237]   lock(opp_table_lock);
[  +0.003592]
               *** DEADLOCK ***

[  +0.006084] 3 locks held by chrome/1865:
[  +0.004031]  #0: ffffff81edecc0d8 (&dev->struct_mutex){+.+.}, at: msm_ioctl_gem_submit+0x264/0xb60
[  +0.009198]  #1: ffffff81d0000870 (reservation_ww_class_acquire){+.+.}, at: msm_ioctl_gem_submit+0x8e8/0xb60
[  +0.010086]  #2: ffffff81f0fc71a8 (reservation_ww_class_mutex){+.+.}, at: submit_lock_objects+0x70/0x1ec
[  +0.009735]
              stack backtrace:
[  +0.004482] CPU: 0 PID: 1865 Comm: chrome Not tainted 5.4.72 #14
[  +0.006173] Hardware name: Google Lazor (rev1+) with LTE (DT)
[  +0.005899] Call trace:
[  +0.002515]  dump_backtrace+0x0/0x158
[  +0.003768]  show_stack+0x20/0x2c
[  +0.003407]  dump_stack+0xc8/0x160
[  +0.003506]  print_circular_bug+0x2c4/0x2c8
[  +0.004305]  check_noncircular+0x1a8/0x1b0
[  +0.004206]  __lock_acquire+0xee4/0x2450
[  +0.004032]  lock_acquire+0x1cc/0x210
[  +0.003768]  __mutex_lock_common+0xec/0xc0c
[  +0.004305]  mutex_lock_nested+0x40/0x50
[  +0.004033]  _find_opp_table+0x34/0x74
[  +0.003855]  dev_pm_opp_find_freq_exact+0x2c/0xdc
[  +0.004833]  a6xx_gmu_resume+0xc8/0xecc
[  +0.003943]  a6xx_pm_resume+0x148/0x200
[  +0.003944]  adreno_resume+0x28/0x34
[  +0.003681]  pm_generic_runtime_resume+0x34/0x48
[  +0.004745]  __rpm_callback+0x70/0x10c
[  +0.003854]  rpm_callback+0x34/0x8c
[  +0.003592]  rpm_resume+0x414/0x550
[  +0.003592]  __pm_runtime_resume+0x7c/0xa0
[  +0.004207]  msm_gpu_submit+0x60/0x1c0
[  +0.003855]  msm_ioctl_gem_submit+0xadc/0xb60
[  +0.004481]  drm_ioctl_kernel+0x9c/0x118
[  +0.004031]  drm_ioctl+0x27c/0x408
[  +0.003504]  drm_compat_ioctl+0xcc/0xdc
[  +0.003945]  __se_compat_sys_ioctl+0x100/0x206c
[  +0.004658]  __arm64_compat_sys_ioctl+0x20/0x2c
[  +0.004659]  el0_svc_common+0xa8/0x178
[  +0.003855]  el0_svc_compat_handler+0x2c/0x40
[  +0.004480]  el0_svc_compat+0x8/0x10

> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 2483e765318a..4cc0fb716381 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -1181,6 +1181,10 @@ static void _opp_table_kref_release(struct kref *kref)
>         struct opp_device *opp_dev, *temp;
>         int i;
>
> +       /* Drop the lock as soon as we can */
> +       list_del(&opp_table->node);
> +       mutex_unlock(&opp_table_lock);
> +
>         _of_clear_opp_table(opp_table);
>
>         /* Release clk */
> @@ -1208,10 +1212,7 @@ static void _opp_table_kref_release(struct kref *kref)
>
>         mutex_destroy(&opp_table->genpd_virt_dev_lock);
>         mutex_destroy(&opp_table->lock);
> -       list_del(&opp_table->node);
>         kfree(opp_table);
> -
> -       mutex_unlock(&opp_table_lock);
>  }
>
>  void dev_pm_opp_put_opp_table(struct opp_table *opp_table)
>
> --
> viresh

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-10-25 17:39                 ` Rob Clark
@ 2020-10-27 11:35                   ` Viresh Kumar
  2020-11-03  5:47                     ` Viresh Kumar
  0 siblings, 1 reply; 50+ messages in thread
From: Viresh Kumar @ 2020-10-27 11:35 UTC (permalink / raw)
  To: Rob Clark
  Cc: Daniel Vetter, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list, Menon,
	Nishanth

On 25-10-20, 10:39, Rob Clark wrote:
> Nope, I suspect any creation of debugfs files will be problematic.

Yeah, so it only fixed part of the problem.

> (btw, _add_opp_dev_unlocked() looks like it should be called
> _add_opp_dev_locked()?)
> 
> It does look like 'struct opp_table' is already refcnt'd, so I suspect
> you could replace holding opp_table_lock while calling into debugfs
> with holding a reference to the opp_table instead?

It isn't that straightforward, unfortunately: we need to make sure the
table doesn't get allocated for the same device twice, so find+allocate
needs to happen within a locked region.

I have taken a not-so-straightforward approach to fixing this issue;
let's see if it fixes it or not.

-------------------------8<-------------------------

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 4ac4e7ce6b8b..6f4a73a6391f 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -29,6 +29,8 @@
 LIST_HEAD(opp_tables);
 /* Lock to allow exclusive modification to the device and opp lists */
 DEFINE_MUTEX(opp_table_lock);
+/* Flag indicating that opp_tables list is being updated at the moment */
+static bool opp_tables_busy;
 
 static struct opp_device *_find_opp_dev(const struct device *dev,
 					struct opp_table *opp_table)
@@ -1036,8 +1038,8 @@ static void _remove_opp_dev(struct opp_device *opp_dev,
 	kfree(opp_dev);
 }
 
-static struct opp_device *_add_opp_dev_unlocked(const struct device *dev,
-						struct opp_table *opp_table)
+struct opp_device *_add_opp_dev(const struct device *dev,
+				struct opp_table *opp_table)
 {
 	struct opp_device *opp_dev;
 
@@ -1048,7 +1050,9 @@ static struct opp_device *_add_opp_dev_unlocked(const struct device *dev,
 	/* Initialize opp-dev */
 	opp_dev->dev = dev;
 
+	mutex_lock(&opp_table->lock);
 	list_add(&opp_dev->node, &opp_table->dev_list);
+	mutex_unlock(&opp_table->lock);
 
 	/* Create debugfs entries for the opp_table */
 	opp_debug_register(opp_dev, opp_table);
@@ -1056,18 +1060,6 @@ static struct opp_device *_add_opp_dev_unlocked(const struct device *dev,
 	return opp_dev;
 }
 
-struct opp_device *_add_opp_dev(const struct device *dev,
-				struct opp_table *opp_table)
-{
-	struct opp_device *opp_dev;
-
-	mutex_lock(&opp_table->lock);
-	opp_dev = _add_opp_dev_unlocked(dev, opp_table);
-	mutex_unlock(&opp_table->lock);
-
-	return opp_dev;
-}
-
 static struct opp_table *_allocate_opp_table(struct device *dev, int index)
 {
 	struct opp_table *opp_table;
@@ -1121,8 +1113,6 @@ static struct opp_table *_allocate_opp_table(struct device *dev, int index)
 	INIT_LIST_HEAD(&opp_table->opp_list);
 	kref_init(&opp_table->kref);
 
-	/* Secure the device table modification */
-	list_add(&opp_table->node, &opp_tables);
 	return opp_table;
 
 err:
@@ -1135,27 +1125,64 @@ void _get_opp_table_kref(struct opp_table *opp_table)
 	kref_get(&opp_table->kref);
 }
 
+/*
+ * We need to make sure that the OPP table for a device doesn't get added twice,
+ * if this routine gets called in parallel with the same device pointer.
+ *
+ * The simplest way to enforce that is to perform everything (find existing
+ * table and if not found, create a new one) under the opp_table_lock, so only
+ * one creator gets access to the same. But that expands the critical section
+ * under the lock and may end up causing circular dependencies with frameworks
+ * like debugfs, interconnect or clock framework as they may be direct or
+ * indirect users of OPP core.
+ *
+ * And for that reason we have to go for a bit tricky implementation here, which
+ * uses the opp_tables_busy flag to indicate if another creator is in the middle
+ * of adding an OPP table and others should wait for it to finish.
+ */
 static struct opp_table *_opp_get_opp_table(struct device *dev, int index)
 {
 	struct opp_table *opp_table;
 
-	/* Hold our table modification lock here */
+again:
 	mutex_lock(&opp_table_lock);
 
 	opp_table = _find_opp_table_unlocked(dev);
 	if (!IS_ERR(opp_table))
 		goto unlock;
 
+	/*
+	 * The opp_tables list or an OPP table's dev_list is getting updated by
+	 * another user, wait for it to finish.
+	 */
+	if (unlikely(opp_tables_busy)) {
+		mutex_unlock(&opp_table_lock);
+		cpu_relax();
+		goto again;
+	}
+
+	opp_tables_busy = true;
 	opp_table = _managed_opp(dev, index);
+
+	/* Drop the lock to reduce the size of critical section */
+	mutex_unlock(&opp_table_lock);
+
 	if (opp_table) {
-		if (!_add_opp_dev_unlocked(dev, opp_table)) {
+		if (!_add_opp_dev(dev, opp_table)) {
 			dev_pm_opp_put_opp_table(opp_table);
 			opp_table = ERR_PTR(-ENOMEM);
 		}
-		goto unlock;
+
+		mutex_lock(&opp_table_lock);
+	} else {
+		opp_table = _allocate_opp_table(dev, index);
+
+		mutex_lock(&opp_table_lock);
+		if (!IS_ERR(opp_table))
+			list_add(&opp_table->node, &opp_tables);
 	}
 
-	opp_table = _allocate_opp_table(dev, index);
+	opp_tables_busy = false;
 
 unlock:
 	mutex_unlock(&opp_table_lock);
@@ -1181,6 +1208,10 @@ static void _opp_table_kref_release(struct kref *kref)
 	struct opp_device *opp_dev, *temp;
 	int i;
 
+	/* Drop the lock as soon as we can */
+	list_del(&opp_table->node);
+	mutex_unlock(&opp_table_lock);
+
 	_of_clear_opp_table(opp_table);
 
 	/* Release clk */
@@ -1208,10 +1239,7 @@ static void _opp_table_kref_release(struct kref *kref)
 
 	mutex_destroy(&opp_table->genpd_virt_dev_lock);
 	mutex_destroy(&opp_table->lock);
-	list_del(&opp_table->node);
 	kfree(opp_table);
-
-	mutex_unlock(&opp_table_lock);
 }
 
 void dev_pm_opp_put_opp_table(struct opp_table *opp_table)
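
Reduced to its essentials, the locking dance above looks like this
(illustrative skeleton only; the function name is made up, and the
managed-OPP lookup and error paths are omitted):

static struct opp_table *get_or_create_table(struct device *dev, int index)
{
        struct opp_table *opp_table;

again:
        mutex_lock(&opp_table_lock);

        opp_table = _find_opp_table_unlocked(dev);
        if (!IS_ERR(opp_table))
                goto unlock;

        /* Another creator is mid-flight: drop the lock and retry. */
        if (unlikely(opp_tables_busy)) {
                mutex_unlock(&opp_table_lock);
                cpu_relax();
                goto again;
        }

        opp_tables_busy = true;

        /*
         * Allocation may call into debugfs, clk, interconnect etc., so
         * do it without opp_table_lock held.
         */
        mutex_unlock(&opp_table_lock);
        opp_table = _allocate_opp_table(dev, index);
        mutex_lock(&opp_table_lock);

        if (!IS_ERR(opp_table))
                list_add(&opp_table->node, &opp_tables);

        opp_tables_busy = false;

unlock:
        mutex_unlock(&opp_table_lock);
        return opp_table;
}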

-- 
viresh

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-10-27 11:35                   ` Viresh Kumar
@ 2020-11-03  5:47                     ` Viresh Kumar
  2020-11-03 16:50                       ` Rob Clark
  0 siblings, 1 reply; 50+ messages in thread
From: Viresh Kumar @ 2020-11-03  5:47 UTC (permalink / raw)
  To: Rob Clark
  Cc: Daniel Vetter, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list, Menon,
	Nishanth

On 27-10-20, 17:05, Viresh Kumar wrote:
> It isn't that straightforward, unfortunately: we need to make sure the
> table doesn't get allocated for the same device twice, so find+allocate
> needs to happen within a locked region.
>
> I have taken a not-so-straightforward approach to fixing this issue;
> let's see if it fixes it or not.
> 
> -------------------------8<-------------------------
> 
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 4ac4e7ce6b8b..6f4a73a6391f 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -29,6 +29,8 @@
>  LIST_HEAD(opp_tables);
>  /* Lock to allow exclusive modification to the device and opp lists */
>  DEFINE_MUTEX(opp_table_lock);
> +/* Flag indicating that opp_tables list is being updated at the moment */
> +static bool opp_tables_busy;
>  
>  static struct opp_device *_find_opp_dev(const struct device *dev,
>  					struct opp_table *opp_table)
> @@ -1036,8 +1038,8 @@ static void _remove_opp_dev(struct opp_device *opp_dev,
>  	kfree(opp_dev);
>  }
>  
> -static struct opp_device *_add_opp_dev_unlocked(const struct device *dev,
> -						struct opp_table *opp_table)
> +struct opp_device *_add_opp_dev(const struct device *dev,
> +				struct opp_table *opp_table)
>  {
>  	struct opp_device *opp_dev;
>  
> @@ -1048,7 +1050,9 @@ static struct opp_device *_add_opp_dev_unlocked(const struct device *dev,
>  	/* Initialize opp-dev */
>  	opp_dev->dev = dev;
>  
> +	mutex_lock(&opp_table->lock);
>  	list_add(&opp_dev->node, &opp_table->dev_list);
> +	mutex_unlock(&opp_table->lock);
>  
>  	/* Create debugfs entries for the opp_table */
>  	opp_debug_register(opp_dev, opp_table);
> @@ -1056,18 +1060,6 @@ static struct opp_device *_add_opp_dev_unlocked(const struct device *dev,
>  	return opp_dev;
>  }
>  
> -struct opp_device *_add_opp_dev(const struct device *dev,
> -				struct opp_table *opp_table)
> -{
> -	struct opp_device *opp_dev;
> -
> -	mutex_lock(&opp_table->lock);
> -	opp_dev = _add_opp_dev_unlocked(dev, opp_table);
> -	mutex_unlock(&opp_table->lock);
> -
> -	return opp_dev;
> -}
> -
>  static struct opp_table *_allocate_opp_table(struct device *dev, int index)
>  {
>  	struct opp_table *opp_table;
> @@ -1121,8 +1113,6 @@ static struct opp_table *_allocate_opp_table(struct device *dev, int index)
>  	INIT_LIST_HEAD(&opp_table->opp_list);
>  	kref_init(&opp_table->kref);
>  
> -	/* Secure the device table modification */
> -	list_add(&opp_table->node, &opp_tables);
>  	return opp_table;
>  
>  err:
> @@ -1135,27 +1125,64 @@ void _get_opp_table_kref(struct opp_table *opp_table)
>  	kref_get(&opp_table->kref);
>  }
>  
> +/*
> + * We need to make sure that the OPP table for a device doesn't get added twice,
> + * if this routine gets called in parallel with the same device pointer.
> + *
> + * The simplest way to enforce that is to perform everything (find existing
> + * table and if not found, create a new one) under the opp_table_lock, so only
> + * one creator gets access to the same. But that expands the critical section
> + * under the lock and may end up causing circular dependencies with frameworks
> + * like debugfs, interconnect or clock framework as they may be direct or
> + * indirect users of OPP core.
> + *
> + * And for that reason we have to go for a bit tricky implementation here, which
> + * uses the opp_tables_busy flag to indicate if another creator is in the middle
> + * of adding an OPP table and others should wait for it to finish.
> + */
>  static struct opp_table *_opp_get_opp_table(struct device *dev, int index)
>  {
>  	struct opp_table *opp_table;
>  
> -	/* Hold our table modification lock here */
> +again:
>  	mutex_lock(&opp_table_lock);
>  
>  	opp_table = _find_opp_table_unlocked(dev);
>  	if (!IS_ERR(opp_table))
>  		goto unlock;
>  
> +	/*
> +	 * The opp_tables list or an OPP table's dev_list is getting updated by
> +	 * another user, wait for it to finish.
> +	 */
> +	if (unlikely(opp_tables_busy)) {
> +		mutex_unlock(&opp_table_lock);
> +		cpu_relax();
> +		goto again;
> +	}
> +
> +	opp_tables_busy = true;
>  	opp_table = _managed_opp(dev, index);
> +
> +	/* Drop the lock to reduce the size of critical section */
> +	mutex_unlock(&opp_table_lock);
> +
>  	if (opp_table) {
> -		if (!_add_opp_dev_unlocked(dev, opp_table)) {
> +		if (!_add_opp_dev(dev, opp_table)) {
>  			dev_pm_opp_put_opp_table(opp_table);
>  			opp_table = ERR_PTR(-ENOMEM);
>  		}
> -		goto unlock;
> +
> +		mutex_lock(&opp_table_lock);
> +	} else {
> +		opp_table = _allocate_opp_table(dev, index);
> +
> +		mutex_lock(&opp_table_lock);
> +		if (!IS_ERR(opp_table))
> +			list_add(&opp_table->node, &opp_tables);
>  	}
>  
> -	opp_table = _allocate_opp_table(dev, index);
> +	opp_tables_busy = false;
>  
>  unlock:
>  	mutex_unlock(&opp_table_lock);
> @@ -1181,6 +1208,10 @@ static void _opp_table_kref_release(struct kref *kref)
>  	struct opp_device *opp_dev, *temp;
>  	int i;
>  
> +	/* Drop the lock as soon as we can */
> +	list_del(&opp_table->node);
> +	mutex_unlock(&opp_table_lock);
> +
>  	_of_clear_opp_table(opp_table);
>  
>  	/* Release clk */
> @@ -1208,10 +1239,7 @@ static void _opp_table_kref_release(struct kref *kref)
>  
>  	mutex_destroy(&opp_table->genpd_virt_dev_lock);
>  	mutex_destroy(&opp_table->lock);
> -	list_del(&opp_table->node);
>  	kfree(opp_table);
> -
> -	mutex_unlock(&opp_table_lock);
>  }
>  
>  void dev_pm_opp_put_opp_table(struct opp_table *opp_table)

Rob, Ping.

-- 
viresh

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-11-03  5:47                     ` Viresh Kumar
@ 2020-11-03 16:50                       ` Rob Clark
  2020-11-04  3:03                         ` Viresh Kumar
  0 siblings, 1 reply; 50+ messages in thread
From: Rob Clark @ 2020-11-03 16:50 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Daniel Vetter, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list, Menon,
	Nishanth

On Mon, Nov 2, 2020 at 9:47 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 27-10-20, 17:05, Viresh Kumar wrote:
> > It isn't that straightforward, unfortunately: we need to make sure the
> > table doesn't get allocated for the same device twice, so find+allocate
> > needs to happen within a locked region.
> >
> > I have taken a not-so-straightforward approach to fixing this issue;
> > let's see if it fixes it or not.
> >
> > -------------------------8<-------------------------
> >
> > diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> > index 4ac4e7ce6b8b..6f4a73a6391f 100644
> > --- a/drivers/opp/core.c
> > +++ b/drivers/opp/core.c
> > @@ -29,6 +29,8 @@
> >  LIST_HEAD(opp_tables);
> >  /* Lock to allow exclusive modification to the device and opp lists */
> >  DEFINE_MUTEX(opp_table_lock);
> > +/* Flag indicating that opp_tables list is being updated at the moment */
> > +static bool opp_tables_busy;
> >
> >  static struct opp_device *_find_opp_dev(const struct device *dev,
> >                                       struct opp_table *opp_table)
> > @@ -1036,8 +1038,8 @@ static void _remove_opp_dev(struct opp_device *opp_dev,
> >       kfree(opp_dev);
> >  }
> >
> > -static struct opp_device *_add_opp_dev_unlocked(const struct device *dev,
> > -                                             struct opp_table *opp_table)
> > +struct opp_device *_add_opp_dev(const struct device *dev,
> > +                             struct opp_table *opp_table)
> >  {
> >       struct opp_device *opp_dev;
> >
> > @@ -1048,7 +1050,9 @@ static struct opp_device *_add_opp_dev_unlocked(const struct device *dev,
> >       /* Initialize opp-dev */
> >       opp_dev->dev = dev;
> >
> > +     mutex_lock(&opp_table->lock);
> >       list_add(&opp_dev->node, &opp_table->dev_list);
> > +     mutex_unlock(&opp_table->lock);
> >
> >       /* Create debugfs entries for the opp_table */
> >       opp_debug_register(opp_dev, opp_table);
> > @@ -1056,18 +1060,6 @@ static struct opp_device *_add_opp_dev_unlocked(const struct device *dev,
> >       return opp_dev;
> >  }
> >
> > -struct opp_device *_add_opp_dev(const struct device *dev,
> > -                             struct opp_table *opp_table)
> > -{
> > -     struct opp_device *opp_dev;
> > -
> > -     mutex_lock(&opp_table->lock);
> > -     opp_dev = _add_opp_dev_unlocked(dev, opp_table);
> > -     mutex_unlock(&opp_table->lock);
> > -
> > -     return opp_dev;
> > -}
> > -
> >  static struct opp_table *_allocate_opp_table(struct device *dev, int index)
> >  {
> >       struct opp_table *opp_table;
> > @@ -1121,8 +1113,6 @@ static struct opp_table *_allocate_opp_table(struct device *dev, int index)
> >       INIT_LIST_HEAD(&opp_table->opp_list);
> >       kref_init(&opp_table->kref);
> >
> > -     /* Secure the device table modification */
> > -     list_add(&opp_table->node, &opp_tables);
> >       return opp_table;
> >
> >  err:
> > @@ -1135,27 +1125,64 @@ void _get_opp_table_kref(struct opp_table *opp_table)
> >       kref_get(&opp_table->kref);
> >  }
> >
> > +/*
> > + * We need to make sure that the OPP table for a device doesn't get added twice,
> > + * if this routine gets called in parallel with the same device pointer.
> > + *
> > + * The simplest way to enforce that is to perform everything (find existing
> > + * table and if not found, create a new one) under the opp_table_lock, so only
> > + * one creator gets access to the same. But that expands the critical section
> > + * under the lock and may end up causing circular dependencies with frameworks
> > + * like debugfs, interconnect or clock framework as they may be direct or
> > + * indirect users of OPP core.
> > + *
> > + * And for that reason we have to go for a bit tricky implementation here, which
> > + * uses the opp_tables_busy flag to indicate if another creator is in the middle
> > + * of adding an OPP table and others should wait for it to finish.
> > + */
> >  static struct opp_table *_opp_get_opp_table(struct device *dev, int index)
> >  {
> >       struct opp_table *opp_table;
> >
> > -     /* Hold our table modification lock here */
> > +again:
> >       mutex_lock(&opp_table_lock);
> >
> >       opp_table = _find_opp_table_unlocked(dev);
> >       if (!IS_ERR(opp_table))
> >               goto unlock;
> >
> > +     /*
> > +      * The opp_tables list or an OPP table's dev_list is getting updated by
> > +      * another user, wait for it to finish.
> > +      */
> > +     if (unlikely(opp_tables_busy)) {
> > +             mutex_unlock(&opp_table_lock);
> > +             cpu_relax();
> > +             goto again;
> > +     }
> > +
> > +     opp_tables_busy = true;
> >       opp_table = _managed_opp(dev, index);
> > +
> > +     /* Drop the lock to reduce the size of critical section */
> > +     mutex_unlock(&opp_table_lock);
> > +
> >       if (opp_table) {
> > -             if (!_add_opp_dev_unlocked(dev, opp_table)) {
> > +             if (!_add_opp_dev(dev, opp_table)) {
> >                       dev_pm_opp_put_opp_table(opp_table);
> >                       opp_table = ERR_PTR(-ENOMEM);
> >               }
> > -             goto unlock;
> > +
> > +             mutex_lock(&opp_table_lock);
> > +     } else {
> > +             opp_table = _allocate_opp_table(dev, index);
> > +
> > +             mutex_lock(&opp_table_lock);
> > +             if (!IS_ERR(opp_table))
> > +                     list_add(&opp_table->node, &opp_tables);
> >       }
> >
> > -     opp_table = _allocate_opp_table(dev, index);
> > +     opp_tables_busy = false;
> >
> >  unlock:
> >       mutex_unlock(&opp_table_lock);
> > @@ -1181,6 +1208,10 @@ static void _opp_table_kref_release(struct kref *kref)
> >       struct opp_device *opp_dev, *temp;
> >       int i;
> >
> > +     /* Drop the lock as soon as we can */
> > +     list_del(&opp_table->node);
> > +     mutex_unlock(&opp_table_lock);
> > +
> >       _of_clear_opp_table(opp_table);
> >
> >       /* Release clk */
> > @@ -1208,10 +1239,7 @@ static void _opp_table_kref_release(struct kref *kref)
> >
> >       mutex_destroy(&opp_table->genpd_virt_dev_lock);
> >       mutex_destroy(&opp_table->lock);
> > -     list_del(&opp_table->node);
> >       kfree(opp_table);
> > -
> > -     mutex_unlock(&opp_table_lock);
> >  }
> >
> >  void dev_pm_opp_put_opp_table(struct opp_table *opp_table)
>
> Rob, Ping.
>

Sorry, it didn't apply cleanly (which I guess is due to some other
dependencies that need to be picked back to the v5.4 product kernel),
and due to some other things I'm in the middle of debugging I didn't
have time yet to switch to v5.10-rc or look at what else needs to be
cherry-picked.

If you could, pushing a branch with this patch somewhere would be a
bit easier to work with (i.e. fetch && cherry-pick is easier to deal
with than picking things from a list).

BR,
-R

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-11-03 16:50                       ` Rob Clark
@ 2020-11-04  3:03                         ` Viresh Kumar
  2020-11-05 19:24                           ` Rob Clark
  0 siblings, 1 reply; 50+ messages in thread
From: Viresh Kumar @ 2020-11-04  3:03 UTC (permalink / raw)
  To: Rob Clark
  Cc: Daniel Vetter, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list, Menon,
	Nishanth

On 03-11-20, 08:50, Rob Clark wrote:
> Sorry, it didn't apply cleanly (which I guess is due to some other
> dependencies that need to be picked back to the v5.4 product kernel),
> and due to some other things I'm in the middle of debugging I didn't
> have time yet to switch to v5.10-rc or look at what else needs to be
> cherry-picked.
>
> If you could, pushing a branch with this patch somewhere would be a
> bit easier to work with (i.e. fetch && cherry-pick is easier to deal
> with than picking things from a list).

It has been in linux-next for a few days. Here is the HEAD to pick
from. There are a few patches there since rc1.

commit 203e29749cc0 ("opp: Allocate the OPP table outside of opp_table_lock")

-- 
viresh

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-11-04  3:03                         ` Viresh Kumar
@ 2020-11-05 19:24                           ` Rob Clark
  2020-11-06  7:16                             ` Viresh Kumar
  0 siblings, 1 reply; 50+ messages in thread
From: Rob Clark @ 2020-11-05 19:24 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Daniel Vetter, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list, Menon,
	Nishanth

On Tue, Nov 3, 2020 at 7:04 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 03-11-20, 08:50, Rob Clark wrote:
> > Sorry, it didn't apply cleanly (which I guess is due to some other
> > dependencies that need to be picked back to the v5.4 product kernel),
> > and due to some other things I'm in the middle of debugging I didn't
> > have time yet to switch to v5.10-rc or look at what else needs to be
> > cherry-picked.
> >
> > If you could, pushing a branch with this patch somewhere would be a
> > bit easier to work with (i.e. fetch && cherry-pick is easier to deal
> > with than picking things from a list).
>
> It has been in linux-next for a few days. Here is the HEAD to pick
> from. There are a few patches there since rc1.
>
> commit 203e29749cc0 ("opp: Allocate the OPP table outside of opp_table_lock")
>

Sorry for the delay. With that cherry-picked, I'm getting a whole lot of:

[   10.191497] WARNING: CPU: 7 PID: 52 at drivers/opp/of.c:115 _find_table_of_opp_np+0x8c/0x94
[   10.191502] Modules linked in:
[   10.191517] CPU: 7 PID: 52 Comm: kworker/7:1 Tainted: G        W      5.10.0-rc2+ #2
[   10.191522] Hardware name: Google Lazor (rev1+) with LTE (DT)
[   10.191537] Workqueue: events deferred_probe_work_func
[   10.191551] pstate: 60c00009 (nZCv daif +PAN +UAO -TCO BTYPE=--)
[   10.202819] mmc0: CQHCI version 5.10
[   10.206038] pc : _find_table_of_opp_np+0x8c/0x94
[   10.206045] lr : _find_table_of_opp_np+0x34/0x94
[   10.206050] sp : ffffffc010373810
[   10.206054] x29: ffffffc010373810 x28: ffffff94c5a3d170
[   10.206070] x27: ffffff94c5a3d168
[   10.249098] mmc0: SDHCI controller on 7c4000.sdhci [7c4000.sdhci]
using ADMA 64-bit
[   10.251366] x26: ffffff94c580c000
[   10.251374] x25: 0000000000000001 x24: ffffff963f02c750
[   10.251385] x23: 0000000000000000 x22: ffffff94c5aabc80
[   10.251397] x21: ffffff963f021c78 x20: ffffff94c5a75800
[   10.256963] sdhci_msm 7c4000.sdhci: mmc0: CQE init: success
[   10.260376]
[   10.260380] x19: ffffff963f02c750 x18: 0000000000000004
[   10.260392] x17: 000000000000002c x16: ffffffe2468e1e78
[   10.260404] x15: ffffffe246df3eb8 x14: ffffffff52f45308
[   10.311816] x13: 0000000000000000 x12: ffffffe24541aef0
[   10.317298] x11: ffffffe246df3eb8 x10: fffffffefe60e678
[   10.322776] x9 : 0000000000000000 x8 : ffffffb3f89a7000
[   10.328258] x7 : ffffffe245c5d9d0 x6 : 0000000000000000
[   10.333730] x5 : 0000000000000080 x4 : 0000000000000001
[   10.339206] x3 : 0000000000000000 x2 : 0000000000000006
[   10.344684] x1 : ffffffe24684aa88 x0 : 0000000000000000
[   10.350158] Call trace:
[   10.352695]  _find_table_of_opp_np+0x8c/0x94
[   10.353507] mmc0: Command Queue Engine enabled
[   10.357095]  _of_init_opp_table+0x15c/0x1e4
[   10.357103]  _opp_get_opp_table+0x168/0x280
[   10.357110]  dev_pm_opp_set_clkname+0x28/0xcc
[   10.357119]  dpu_bind+0x50/0x1a4
[   10.357128]  component_bind_all+0xf4/0x20c
[   10.357138]  msm_drm_init+0x180/0x588
[   10.361815] mmc0: new HS400 Enhanced strobe MMC card at address 0001
[   10.366050]  msm_drm_bind+0x1c/0x24
[   10.366057]  try_to_bring_up_master+0x160/0x1a8
[   10.366065]  component_master_add_with_match+0xc4/0x108
[   10.366072]  msm_pdev_probe+0x214/0x2a4
[   10.366081]  platform_drv_probe+0x94/0xb4
[   10.374415] mmcblk0: mmc0:0001 DA4064 58.2 GiB
[   10.374871]  really_probe+0x138/0x348
[   10.374881]  driver_probe_device+0x80/0xb8
[   10.379483] mmcblk0boot0: mmc0:0001 DA4064 partition 1 4.00 MiB
[   10.382446]  __device_attach_driver+0x90/0xa8
[   10.382453]  bus_for_each_drv+0x84/0xcc
[   10.382459]  __device_attach+0xc0/0x148
[   10.382466]  device_initial_probe+0x18/0x20
[   10.382473]  bus_probe_device+0x38/0x98
[   10.382483]  deferred_probe_work_func+0x7c/0xb8
[   10.387402] mmcblk0boot1: mmc0:0001 DA4064 partition 2 4.00 MiB
[   10.392780]  process_one_work+0x314/0x60c
[   10.392786]  worker_thread+0x238/0x3e8
[   10.392793]  kthread+0x148/0x158
[   10.392800]  ret_from_fork+0x10/0x18
[   10.392809] CPU: 7 PID: 52 Comm: kworker/7:1 Tainted: G        W      5.10.0-rc2+ #2
[   10.397683] mmcblk0rpmb: mmc0:0001 DA4064 partition 3 16.0 MiB,
chardev (241:0)
[   10.401051] Hardware name: Google Lazor (rev1+) with LTE (DT)
[   10.401062] Workqueue: events deferred_probe_work_func
[   10.401069] Call trace:
[   10.401077]  dump_backtrace+0x0/0x1b4
[   10.401087]  show_stack+0x1c/0x24
[   10.427111]  mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12
[   10.427156]  dump_stack+0xdc/0x158
[   10.427165]  __warn+0xd8/0x16c
[   10.427173]  report_bug+0x88/0xe0
[   10.427179]  bug_handler+0x24/0x6c
[   10.535574]  brk_handler+0x78/0xb4
[   10.539090]  do_debug_exception+0x1a4/0x208
[   10.543395]  el1_sync_handler+0x8c/0x110
[   10.547434]  el1_sync+0x7c/0x100
[   10.550762]  _find_table_of_opp_np+0x8c/0x94
[   10.555166]  _of_init_opp_table+0x15c/0x1e4
[   10.559472]  _opp_get_opp_table+0x168/0x280
[   10.563779]  dev_pm_opp_set_clkname+0x28/0xcc
[   10.568270]  dpu_bind+0x50/0x1a4
[   10.571607]  component_bind_all+0xf4/0x20c
[   10.575826]  msm_drm_init+0x180/0x588
[   10.579603]  msm_drm_bind+0x1c/0x24
[   10.583205]  try_to_bring_up_master+0x160/0x1a8
[   10.587877]  component_master_add_with_match+0xc4/0x108
[   10.593251]  msm_pdev_probe+0x214/0x2a4
[   10.597203]  platform_drv_probe+0x94/0xb4
[   10.601334]  really_probe+0x138/0x348
[   10.605110]  driver_probe_device+0x80/0xb8
[   10.609329]  __device_attach_driver+0x90/0xa8
[   10.613821]  bus_for_each_drv+0x84/0xcc
[   10.617774]  __device_attach+0xc0/0x148
[   10.621729]  device_initial_probe+0x18/0x20
[   10.626044]  bus_probe_device+0x38/0x98
[   10.629998]  deferred_probe_work_func+0x7c/0xb8
[   10.634668]  process_one_work+0x314/0x60c
[   10.638797]  worker_thread+0x238/0x3e8
[   10.642661]  kthread+0x148/0x158
[   10.645997]  ret_from_fork+0x10/0x18
[   10.649683] irq event stamp: 117274
[   10.653290] hardirqs last  enabled at (117273): [<ffffffe245ed8430>] _raw_spin_unlock_irqrestore+0x60/0x94
[   10.663213] hardirqs last disabled at (117274): [<ffffffe245420ea0>] do_debug_exception+0x60/0x208
[   10.672420] softirqs last  enabled at (116976): [<ffffffe245400eec>] __do_softirq+0x4bc/0x540
[   10.681184] softirqs last disabled at (116971): [<ffffffe24547dd10>] __irq_exit_rcu+0x118/0x138
[   10.690123] ---[ end trace 00b127c206a99072 ]---

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-11-05 19:24                           ` Rob Clark
@ 2020-11-06  7:16                             ` Viresh Kumar
  2020-11-17 10:03                               ` Viresh Kumar
  2020-11-17 17:02                               ` Rob Clark
  0 siblings, 2 replies; 50+ messages in thread
From: Viresh Kumar @ 2020-11-06  7:16 UTC (permalink / raw)
  To: Rob Clark
  Cc: Daniel Vetter, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list, Menon,
	Nishanth

On 05-11-20, 11:24, Rob Clark wrote:
> On Tue, Nov 3, 2020 at 7:04 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 03-11-20, 08:50, Rob Clark wrote:
> > > sorry, it didn't apply cleanly (which I guess is due to some other
> > > dependencies that need to be picked back to the v5.4 product
> > > kernel), and due to some other things I'm in the middle of debugging
> > > I didn't have time yet to switch to v5.10-rc or look at what else
> > > needs to be cherry-picked..
> > >
> > > If you could, pushing a branch with this patch somewhere would be a
> > > bit easier to work with (i.e. fetch && cherry-pick is easier to deal
> > > with than picking things from the list)
> >
> > It has been in linux-next for a few days. Here is the HEAD to pick
> > from. There are a few patches there since rc1.
> >
> > commit 203e29749cc0 ("opp: Allocate the OPP table outside of opp_table_lock")
> >
> 
> sorry for the delay, with that cherry-picked, I'm getting a whole lot of:

Ahh, sorry about that and thanks for reporting it. Here is the fix:

diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index c718092757d9..6b7f0066942d 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -112,8 +112,6 @@ static struct opp_table *_find_table_of_opp_np(struct device_node *opp_np)
        struct opp_table *opp_table;
        struct device_node *opp_table_np;
 
-       lockdep_assert_held(&opp_table_lock);
-
        opp_table_np = of_get_parent(opp_np);
        if (!opp_table_np)
                goto err;
@@ -121,12 +119,15 @@ static struct opp_table *_find_table_of_opp_np(struct device_node *opp_np)
        /* It is safe to put the node now as all we need now is its address */
        of_node_put(opp_table_np);
 
+       mutex_lock(&opp_table_lock);
        list_for_each_entry(opp_table, &opp_tables, node) {
                if (opp_table_np == opp_table->np) {
                        _get_opp_table_kref(opp_table);
+                       mutex_unlock(&opp_table_lock);
                        return opp_table;
                }
        }
+       mutex_unlock(&opp_table_lock);
 
 err:
        return ERR_PTR(-ENODEV);
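
For clarity, with this applied _find_table_of_opp_np() ends up looking
roughly like below (reconstructed from the hunks above; the rest of
of.c is assumed unchanged). The global lock is now taken only around
the list walk, instead of asserting that the caller already holds it:

static struct opp_table *_find_table_of_opp_np(struct device_node *opp_np)
{
	struct opp_table *opp_table;
	struct device_node *opp_table_np;

	opp_table_np = of_get_parent(opp_np);
	if (!opp_table_np)
		goto err;

	/* It is safe to put the node now as all we need now is its address */
	of_node_put(opp_table_np);

	mutex_lock(&opp_table_lock);
	list_for_each_entry(opp_table, &opp_tables, node) {
		if (opp_table_np == opp_table->np) {
			_get_opp_table_kref(opp_table);
			mutex_unlock(&opp_table_lock);
			return opp_table;
		}
	}
	mutex_unlock(&opp_table_lock);

err:
	return ERR_PTR(-ENODEV);
}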

-- 
viresh

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-11-06  7:16                             ` Viresh Kumar
@ 2020-11-17 10:03                               ` Viresh Kumar
  2020-11-17 17:02                               ` Rob Clark
  1 sibling, 0 replies; 50+ messages in thread
From: Viresh Kumar @ 2020-11-17 10:03 UTC (permalink / raw)
  To: Rob Clark
  Cc: Daniel Vetter, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list, Menon,
	Nishanth

On Fri, 6 Nov 2020 at 12:46, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 05-11-20, 11:24, Rob Clark wrote:
> > On Tue, Nov 3, 2020 at 7:04 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 03-11-20, 08:50, Rob Clark wrote:
> > > > sorry, it didn't apply cleanly (which I guess is due to some other
> > > > dependencies that need to be picked back to the v5.4 product
> > > > kernel), and due to some other things I'm in the middle of
> > > > debugging I didn't have time yet to switch to v5.10-rc or look at
> > > > what else needs to be cherry-picked..
> > > >
> > > > If you could, pushing a branch with this patch somewhere would be
> > > > a bit easier to work with (i.e. fetch && cherry-pick is easier to
> > > > deal with than picking things from the list)
> > >
> > > It has been in linux-next for a few days. Here is the HEAD to pick
> > > from. There are a few patches there since rc1.
> > >
> > > commit 203e29749cc0 ("opp: Allocate the OPP table outside of opp_table_lock")
> > >
> >
> > sorry for the delay, with that cherry-picked, I'm getting a whole lot of:
>
> Ahh, sorry about that and thanks for reporting it. Here is the fix:
>
> diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> index c718092757d9..6b7f0066942d 100644
> --- a/drivers/opp/of.c
> +++ b/drivers/opp/of.c
> @@ -112,8 +112,6 @@ static struct opp_table *_find_table_of_opp_np(struct device_node *opp_np)
>         struct opp_table *opp_table;
>         struct device_node *opp_table_np;
>
> -       lockdep_assert_held(&opp_table_lock);
> -
>         opp_table_np = of_get_parent(opp_np);
>         if (!opp_table_np)
>                 goto err;
> @@ -121,12 +119,15 @@ static struct opp_table *_find_table_of_opp_np(struct device_node *opp_np)
>         /* It is safe to put the node now as all we need now is its address */
>         of_node_put(opp_table_np);
>
> +       mutex_lock(&opp_table_lock);
>         list_for_each_entry(opp_table, &opp_tables, node) {
>                 if (opp_table_np == opp_table->np) {
>                         _get_opp_table_kref(opp_table);
> +                       mutex_unlock(&opp_table_lock);
>                         return opp_table;
>                 }
>         }
> +       mutex_unlock(&opp_table_lock);
>
>  err:
>         return ERR_PTR(-ENODEV);

Ping.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-11-06  7:16                             ` Viresh Kumar
  2020-11-17 10:03                               ` Viresh Kumar
@ 2020-11-17 17:02                               ` Rob Clark
  2020-11-18  5:28                                 ` Viresh Kumar
  1 sibling, 1 reply; 50+ messages in thread
From: Rob Clark @ 2020-11-17 17:02 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Daniel Vetter, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list, Menon,
	Nishanth

On Thu, Nov 5, 2020 at 11:16 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 05-11-20, 11:24, Rob Clark wrote:
> > On Tue, Nov 3, 2020 at 7:04 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 03-11-20, 08:50, Rob Clark wrote:
> > > > sorry, it didn't apply cleanly (which I guess is due to some other
> > > > dependencies that need to be picked back to the v5.4 product
> > > > kernel), and due to some other things I'm in the middle of
> > > > debugging I didn't have time yet to switch to v5.10-rc or look at
> > > > what else needs to be cherry-picked..
> > > >
> > > > If you could, pushing a branch with this patch somewhere would be
> > > > a bit easier to work with (i.e. fetch && cherry-pick is easier to
> > > > deal with than picking things from the list)
> > >
> > > It has been in linux-next for a few days. Here is the HEAD to pick
> > > from. There are a few patches there since rc1.
> > >
> > > commit 203e29749cc0 ("opp: Allocate the OPP table outside of opp_table_lock")
> > >
> >
> > sorry for the delay, with that cherry-picked, I'm getting a whole lot of:
>
> Ahh, sorry about that and thanks for reporting it. Here is the fix:
>
> diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> index c718092757d9..6b7f0066942d 100644
> --- a/drivers/opp/of.c
> +++ b/drivers/opp/of.c
> @@ -112,8 +112,6 @@ static struct opp_table *_find_table_of_opp_np(struct device_node *opp_np)
>         struct opp_table *opp_table;
>         struct device_node *opp_table_np;
>
> -       lockdep_assert_held(&opp_table_lock);
> -
>         opp_table_np = of_get_parent(opp_np);
>         if (!opp_table_np)
>                 goto err;
> @@ -121,12 +119,15 @@ static struct opp_table *_find_table_of_opp_np(struct device_node *opp_np)
>         /* It is safe to put the node now as all we need now is its address */
>         of_node_put(opp_table_np);
>
> +       mutex_lock(&opp_table_lock);
>         list_for_each_entry(opp_table, &opp_tables, node) {
>                 if (opp_table_np == opp_table->np) {
>                         _get_opp_table_kref(opp_table);
> +                       mutex_unlock(&opp_table_lock);
>                         return opp_table;
>                 }
>         }
> +       mutex_unlock(&opp_table_lock);
>
>  err:
>         return ERR_PTR(-ENODEV);
>

With that on top of the previous patch,

[   26.378245] ======================================================
[   26.384595] WARNING: possible circular locking dependency detected
[   26.390947] 5.10.0-rc2+ #6 Not tainted
[   26.394804] ------------------------------------------------------
[   26.401155] chrome/1886 is trying to acquire lock:
[   26.406087] ffffffe5e264aa88 (opp_table_lock){+.+.}-{3:3}, at: _find_opp_table+0x38/0x78
[   26.414436]
[   26.414436] but task is already holding lock:
[   26.420423] ffffffb0283935b0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: submit_lock_objects+0x70/0x1ec
[   26.430268]
[   26.430268] which lock already depends on the new lock.
[   26.430268]
[   26.438661]
[   26.438661] the existing dependency chain (in reverse order) is:
[   26.446343]
[   26.446343] -> #3 (reservation_ww_class_mutex){+.+.}-{3:3}:
[   26.453603]        lock_acquire+0x23c/0x30c
[   26.457910]        __mutex_lock_common+0xdc/0xbc4
[   26.462743]        ww_mutex_lock_interruptible+0x84/0xec
[   26.468203]        msm_gem_fault+0x30/0x138
[   26.472507]        __do_fault+0x44/0x184
[   26.476541]        handle_mm_fault+0x754/0xc50
[   26.481117]        do_page_fault+0x230/0x354
[   26.485507]        do_translation_fault+0x40/0x54
[   26.490338]        do_mem_abort+0x44/0xac
[   26.494469]        el0_sync_compat_handler+0x15c/0x190
[   26.499756]        el0_sync_compat+0x144/0x180
[   26.504328]
[   26.504328] -> #2 (&mm->mmap_lock){++++}-{3:3}:
[   26.510511]        lock_acquire+0x23c/0x30c
[   26.514813]        __might_fault+0x60/0x80
[   26.519034]        compat_filldir+0x118/0x4d0
[   26.523519]        dcache_readdir+0x74/0x1e0
[   26.527907]        iterate_dir+0xd4/0x198
[   26.532037]        __arm64_compat_sys_getdents+0x6c/0x168
[   26.537583]        el0_svc_common+0xa4/0x174
[   26.541970]        do_el0_svc_compat+0x20/0x30
[   26.546543]        el0_sync_compat_handler+0x124/0x190
[   26.551828]        el0_sync_compat+0x144/0x180
[   26.556399]
[   26.556399] -> #1 (&sb->s_type->i_mutex_key#2){++++}-{3:3}:
[   26.563660]        lock_acquire+0x23c/0x30c
[   26.567963]        down_write+0x80/0x1dc
[   26.571997]        simple_recursive_removal+0x48/0x238
[   26.577288]        debugfs_remove+0x5c/0x78
[   26.581592]        opp_debug_unregister+0x34/0x118
[   26.586521]        dev_pm_opp_put_opp_table+0xd0/0x14c
[   26.591802]        dev_pm_opp_put_clkname+0x40/0x54
[   26.596820]        msm_dsi_host_destroy+0xe0/0x108
[   26.601748]        dsi_destroy+0x40/0x58
[   26.605789]        dsi_bind+0x8c/0x16c
[   26.609648]        component_bind_all+0xf4/0x20c
[   26.614399]        msm_drm_init+0x180/0x588
[   26.618696]        msm_drm_bind+0x1c/0x24
[   26.622822]        try_to_bring_up_master+0x160/0x1a8
[   26.628014]        component_master_add_with_match+0xc4/0x108
[   26.633918]        msm_pdev_probe+0x214/0x2a4
[   26.638405]        platform_drv_probe+0x94/0xb4
[   26.643066]        really_probe+0x138/0x348
[   26.647365]        driver_probe_device+0x80/0xb8
[   26.652111]        device_driver_attach+0x50/0x70
[   26.656951]        __driver_attach+0xb4/0xc8
[   26.661340]        bus_for_each_dev+0x80/0xc8
[   26.665825]        driver_attach+0x28/0x30
[   26.670038]        bus_add_driver+0x100/0x1d4
[   26.674524]        driver_register+0x68/0xfc
[   26.678910]        __platform_driver_register+0x48/0x50
[   26.684284]        msm_drm_register+0x64/0x68
[   26.688766]        do_one_initcall+0x1ac/0x3e4
[   26.693341]        do_initcall_level+0xa0/0xb8
[   26.697910]        do_initcalls+0x58/0x94
[   26.702039]        do_basic_setup+0x28/0x30
[   26.706338]        kernel_init_freeable+0x190/0x1d0
[   26.711352]        kernel_init+0x18/0x10c
[   26.715481]        ret_from_fork+0x10/0x18
[   26.719692]
[   26.719692] -> #0 (opp_table_lock){+.+.}-{3:3}:
[   26.725883]        check_noncircular+0x12c/0x134
[   26.730627]        __lock_acquire+0x2288/0x2b2c
[   26.735284]        lock_acquire+0x23c/0x30c
[   26.739584]        __mutex_lock_common+0xdc/0xbc4
[   26.744426]        mutex_lock_nested+0x50/0x58
[   26.748998]        _find_opp_table+0x38/0x78
[   26.753395]        dev_pm_opp_find_freq_exact+0x2c/0xdc
[   26.758771]        a6xx_gmu_resume+0xcc/0xed0
[   26.763255]        a6xx_pm_resume+0x140/0x174
[   26.767741]        adreno_resume+0x24/0x2c
[   26.771956]        pm_generic_runtime_resume+0x2c/0x3c
[   26.777244]        __rpm_callback+0x74/0x114
[   26.781633]        rpm_callback+0x30/0x84
[   26.785759]        rpm_resume+0x3c8/0x4f0
[   26.789884]        __pm_runtime_resume+0x80/0xa4
[   26.794631]        msm_gpu_submit+0x60/0x228
[   26.799019]        msm_ioctl_gem_submit+0xba0/0xc1c
[   26.804038]        drm_ioctl_kernel+0xa0/0x11c
[   26.808608]        drm_ioctl+0x240/0x3dc
[   26.812653]        drm_compat_ioctl+0xd4/0xe4
[   26.817141]        __arm64_compat_sys_ioctl+0xc4/0xf8
[   26.822331]        el0_svc_common+0xa4/0x174
[   26.826718]        do_el0_svc_compat+0x20/0x30
[   26.831291]        el0_sync_compat_handler+0x124/0x190
[   26.836577]        el0_sync_compat+0x144/0x180
[   26.841148]
[   26.841148] other info that might help us debug this:
[   26.841148]
[   26.849361] Chain exists of:
[   26.849361]   opp_table_lock --> &mm->mmap_lock --> reservation_ww_class_mutex
[   26.849361]
[   26.861249]  Possible unsafe locking scenario:
[   26.861249]
[   26.867334]        CPU0                    CPU1
[   26.871990]        ----                    ----
[   26.876647]   lock(reservation_ww_class_mutex);
[   26.881309]                                lock(&mm->mmap_lock);
[   26.887487]                                lock(reservation_ww_class_mutex);
[   26.894730]   lock(opp_table_lock);
[   26.898327]
[   26.898327]  *** DEADLOCK ***
[   26.898327]
[   26.904410] 3 locks held by chrome/1886:
[   26.908447]  #0: ffffffb005bd9138 (&dev->struct_mutex){+.+.}-{3:3}, at: msm_ioctl_gem_submit+0x238/0xc1c
[   26.918199]  #1: ffffffb02251fa70 (reservation_ww_class_acquire){+.+.}-{0:0}, at: msm_ioctl_gem_submit+0x978/0xc1c
[   26.928843]  #2: ffffffb0283935b0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: submit_lock_objects+0x70/0x1ec
[   26.939126]
[   26.939126] stack backtrace:
[   26.943612] CPU: 5 PID: 1886 Comm: chrome Not tainted 5.10.0-rc2+ #6
[   26.950137] Hardware name: Google Lazor (rev1+) with LTE (DT)
[   26.956039] Call trace:
[   26.958566]  dump_backtrace+0x0/0x1b4
[   26.962333]  show_stack+0x1c/0x24
[   26.965754]  dump_stack+0xdc/0x158
[   26.969251]  print_circular_bug+0x308/0x338
[   26.973550]  check_noncircular+0x12c/0x134
[   26.977762]  __lock_acquire+0x2288/0x2b2c
[   26.981889]  lock_acquire+0x23c/0x30c
[   26.985658]  __mutex_lock_common+0xdc/0xbc4
[   26.989961]  mutex_lock_nested+0x50/0x58
[   26.993994]  _find_opp_table+0x38/0x78
[   26.997852]  dev_pm_opp_find_freq_exact+0x2c/0xdc
[   27.002690]  a6xx_gmu_resume+0xcc/0xed0
[   27.006635]  a6xx_pm_resume+0x140/0x174
[   27.010580]  adreno_resume+0x24/0x2c
[   27.014259]  pm_generic_runtime_resume+0x2c/0x3c
[   27.019008]  __rpm_callback+0x74/0x114
[   27.022868]  rpm_callback+0x30/0x84
[   27.026455]  rpm_resume+0x3c8/0x4f0
[   27.030042]  __pm_runtime_resume+0x80/0xa4
[   27.034259]  msm_gpu_submit+0x60/0x228
[   27.038117]  msm_ioctl_gem_submit+0xba0/0xc1c
[   27.042604]  drm_ioctl_kernel+0xa0/0x11c
[   27.046637]  drm_ioctl+0x240/0x3dc
[   27.050139]  drm_compat_ioctl+0xd4/0xe4
[   27.054088]  __arm64_compat_sys_ioctl+0xc4/0xf8
[   27.058748]  el0_svc_common+0xa4/0x174
[   27.062606]  do_el0_svc_compat+0x20/0x30
[   27.066647]  el0_sync_compat_handler+0x124/0x190
[   27.071393]  el0_sync_compat+0x144/0x180
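
For reference, the reported cycle boils down to two code paths taking
the same two locks in opposite order (with mmap_lock as the middle
edge in the real chain). A minimal, self-contained illustration of the
unsafe scenario above, written as a hypothetical userspace demo rather
than the actual driver paths:

/* cc -pthread demo.c -o demo; with the sleeps below the two threads
 * interleave and neither join ever returns, i.e. they deadlock. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t opp_table_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t resv_lock = PTHREAD_MUTEX_INITIALIZER;

/* submit path: reservation lock first, then (via runtime resume and
 * dev_pm_opp_find_freq_exact()) the OPP table lock */
static void *submit_path(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&resv_lock);
	usleep(1000);
	pthread_mutex_lock(&opp_table_lock);
	puts("submit done");
	pthread_mutex_unlock(&opp_table_lock);
	pthread_mutex_unlock(&resv_lock);
	return NULL;
}

/* the dependency recorded at bind/fault time effectively orders the
 * OPP table lock before the reservation lock */
static void *bind_fault_path(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&opp_table_lock);
	usleep(1000);
	pthread_mutex_lock(&resv_lock);
	puts("bind/fault done");
	pthread_mutex_unlock(&resv_lock);
	pthread_mutex_unlock(&opp_table_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, submit_path, NULL);
	pthread_create(&b, NULL, bind_fault_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}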

BR,
-R

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-11-17 17:02                               ` Rob Clark
@ 2020-11-18  5:28                                 ` Viresh Kumar
  2020-11-18 16:53                                   ` Rob Clark
  0 siblings, 1 reply; 50+ messages in thread
From: Viresh Kumar @ 2020-11-18  5:28 UTC (permalink / raw)
  To: Rob Clark
  Cc: Daniel Vetter, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list, Menon,
	Nishanth

On 17-11-20, 09:02, Rob Clark wrote:
> With that on top of the previous patch,

Don't you still have this? It fixed the lockdep in the remove path.

https://lore.kernel.org/lkml/20201022080644.2ck4okrxygmkuatn@vireshk-i7/

To make it clear, you need these patches to fix the OPP stuff:

//From 5.10-rc3 (the one from the above link).
commit e0df59de670b ("opp: Reduce the size of critical section in _opp_table_kref_release()")

//Below two from linux-next
commit ef43f01ac069 ("opp: Always add entries in dev_list with opp_table->lock held")
commit 27c09484dd3d ("opp: Allocate the OPP table outside of opp_table_lock")

This matches the diff I gave you earlier.

-- 
viresh

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-11-18  5:28                                 ` Viresh Kumar
@ 2020-11-18 16:53                                   ` Rob Clark
  2020-11-19  6:05                                     ` Viresh Kumar
  0 siblings, 1 reply; 50+ messages in thread
From: Rob Clark @ 2020-11-18 16:53 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Daniel Vetter, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list, Menon,
	Nishanth

On Tue, Nov 17, 2020 at 9:28 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 17-11-20, 09:02, Rob Clark wrote:
> > With that on top of the previous patch,
>
> Don't you still have this? It fixed the lockdep in the remove path.
>
> https://lore.kernel.org/lkml/20201022080644.2ck4okrxygmkuatn@vireshk-i7/
>
> To make it clear, you need these patches to fix the OPP stuff:
>
> //From 5.10-rc3 (the one from the above link).
> commit e0df59de670b ("opp: Reduce the size of critical section in _opp_table_kref_release()")
>
> //Below two from linux-next
> commit ef43f01ac069 ("opp: Always add entries in dev_list with opp_table->lock held")
> commit 27c09484dd3d ("opp: Allocate the OPP table outside of opp_table_lock")
>
> This matches the diff I gave you earlier.
>

no, I did not have all three, only "opp: Allocate the OPP table
outside of opp_table_lock" plus the fixup.  But with all three:

[   27.072188] ======================================================
[   27.078542] WARNING: possible circular locking dependency detected
[   27.084897] 5.10.0-rc2+ #1 Not tainted
[   27.088750] ------------------------------------------------------
[   27.095103] chrome/1897 is trying to acquire lock:
[   27.100031] ffffffdb14e4aa88 (opp_table_lock){+.+.}-{3:3}, at: _find_opp_table+0x38/0x78
[   27.108379]
[   27.108379] but task is already holding lock:
[   27.114373] ffffff8e2c8f91b0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: submit_lock_objects+0x70/0x1ec
[   27.124212]
[   27.124212] which lock already depends on the new lock.
[   27.124212]
[   27.132604]
[   27.132604] the existing dependency chain (in reverse order) is:
[   27.140290]
[   27.140290] -> #4 (reservation_ww_class_mutex){+.+.}-{3:3}:
[   27.147544]        lock_acquire+0x23c/0x30c
[   27.151848]        __mutex_lock_common+0xdc/0xbc4
[   27.156685]        ww_mutex_lock_interruptible+0x84/0xec
[   27.162142]        msm_gem_fault+0x30/0x138
[   27.166443]        __do_fault+0x44/0x184
[   27.170479]        handle_mm_fault+0x754/0xc50
[   27.175053]        do_page_fault+0x230/0x354
[   27.179444]        do_translation_fault+0x40/0x54
[   27.184277]        do_mem_abort+0x44/0xac
[   27.188402]        el0_sync_compat_handler+0x15c/0x190
[   27.193680]        el0_sync_compat+0x144/0x180
[   27.198244]
[   27.198244] -> #3 (&mm->mmap_lock){++++}-{3:3}:
[   27.204435]        lock_acquire+0x23c/0x30c
[   27.208738]        __might_fault+0x60/0x80
[   27.212951]        compat_filldir+0x118/0x4d0
[   27.217434]        dcache_readdir+0x74/0x1e0
[   27.221825]        iterate_dir+0xd4/0x198
[   27.225947]        __arm64_compat_sys_getdents+0x6c/0x168
[   27.231495]        el0_svc_common+0xa4/0x174
[   27.235886]        do_el0_svc_compat+0x20/0x30
[   27.240461]        el0_sync_compat_handler+0x124/0x190
[   27.245746]        el0_sync_compat+0x144/0x180
[   27.250310]
[   27.250310] -> #2 (&sb->s_type->i_mutex_key#2){++++}-{3:3}:
[   27.257569]        lock_acquire+0x23c/0x30c
[   27.261877]        down_write+0x80/0x1dc
[   27.265912]        simple_recursive_removal+0x48/0x238
[   27.271193]        debugfs_remove+0x5c/0x78
[   27.275502]        opp_debug_remove_one+0x18/0x20
[   27.280343]        _opp_kref_release+0x40/0x74
[   27.284917]        dev_pm_opp_put_unlocked+0x44/0x64
[   27.290015]        _opp_remove_all_static+0x5c/0x90
[   27.295029]        dev_pm_opp_remove_table+0x70/0x90
[   27.300129]        dev_pm_opp_of_remove_table+0x14/0x1c
[   27.305504]        msm_dsi_host_destroy+0xd8/0x108
[   27.310434]        dsi_destroy+0x40/0x58
[   27.314469]        dsi_bind+0x8c/0x16c
[   27.318329]        component_bind_all+0xf4/0x20c
[   27.323081]        msm_drm_init+0x180/0x588
[   27.327382]        msm_drm_bind+0x1c/0x24
[   27.331503]        try_to_bring_up_master+0x160/0x1a8
[   27.336696]        component_master_add_with_match+0xc4/0x108
[   27.342597]        msm_pdev_probe+0x214/0x2a4
[   27.347076]        platform_drv_probe+0x94/0xb4
[   27.351739]        really_probe+0x138/0x348
[   27.356041]        driver_probe_device+0x80/0xb8
[   27.360788]        device_driver_attach+0x50/0x70
[   27.365621]        __driver_attach+0xb4/0xc8
[   27.370012]        bus_for_each_dev+0x80/0xc8
[   27.374495]        driver_attach+0x28/0x30
[   27.378712]        bus_add_driver+0x100/0x1d4
[   27.383188]        driver_register+0x68/0xfc
[   27.387579]        __platform_driver_register+0x48/0x50
[   27.392957]        msm_drm_register+0x64/0x68
[   27.397434]        do_one_initcall+0x1ac/0x3e4
[   27.402011]        do_initcall_level+0xa0/0xb8
[   27.406583]        do_initcalls+0x58/0x94
[   27.410704]        do_basic_setup+0x28/0x30
[   27.415008]        kernel_init_freeable+0x190/0x1d0
[   27.420024]        kernel_init+0x18/0x10c
[   27.424146]        ret_from_fork+0x10/0x18
[   27.428362]
[   27.428362] -> #1 (&opp_table->lock){+.+.}-{3:3}:
[   27.434725]        lock_acquire+0x23c/0x30c
[   27.439028]        __mutex_lock_common+0xdc/0xbc4
[   27.443862]        mutex_lock_nested+0x50/0x58
[   27.448436]        _find_opp_table_unlocked+0x44/0xb4
[   27.453626]        _opp_get_opp_table+0x3c/0x280
[   27.458375]        dev_pm_opp_get_opp_table_indexed+0x14/0x1c
[   27.464281]        of_genpd_add_provider_onecell+0xd8/0x1c0
[   27.470019]        rpmhpd_probe+0x244/0x26c
[   27.474323]        platform_drv_probe+0x94/0xb4
[   27.478985]        really_probe+0x138/0x348
[   27.483287]        driver_probe_device+0x80/0xb8
[   27.488033]        __device_attach_driver+0x90/0xa8
[   27.493047]        bus_for_each_drv+0x84/0xcc
[   27.497524]        __device_attach+0xc0/0x148
[   27.502007]        device_initial_probe+0x18/0x20
[   27.506840]        bus_probe_device+0x38/0x98
[   27.511317]        device_add+0x214/0x3c8
[   27.515443]        of_device_add+0x3c/0x48
[   27.519654]        of_platform_device_create_pdata+0xac/0xec
[   27.525473]        of_platform_bus_create+0x1cc/0x348
[   27.530664]        of_platform_populate+0x78/0xc8
[   27.535496]        devm_of_platform_populate+0x5c/0xa4
[   27.540779]        rpmh_rsc_probe+0x370/0x3d0
[   27.545253]        platform_drv_probe+0x94/0xb4
[   27.549916]        really_probe+0x138/0x348
[   27.554223]        driver_probe_device+0x80/0xb8
[   27.558971]        __device_attach_driver+0x90/0xa8
[   27.563988]        bus_for_each_drv+0x84/0xcc
[   27.568465]        __device_attach+0xc0/0x148
[   27.572942]        device_initial_probe+0x18/0x20
[   27.577778]        bus_probe_device+0x38/0x98
[   27.582263]        fw_devlink_resume+0xdc/0x110
[   27.586930]        of_platform_default_populate_init+0xb8/0xd0
[   27.592923]        do_one_initcall+0x1ac/0x3e4
[   27.597489]        do_initcall_level+0xa0/0xb8
[   27.602051]        do_initcalls+0x58/0x94
[   27.606175]        do_basic_setup+0x28/0x30
[   27.610472]        kernel_init_freeable+0x190/0x1d0
[   27.615493]        kernel_init+0x18/0x10c
[   27.619616]        ret_from_fork+0x10/0x18
[   27.623823]
[   27.623823] -> #0 (opp_table_lock){+.+.}-{3:3}:
[   27.630006]        check_noncircular+0x12c/0x134
[   27.634757]        __lock_acquire+0x2288/0x2b2c
[   27.639419]        lock_acquire+0x23c/0x30c
[   27.643727]        __mutex_lock_common+0xdc/0xbc4
[   27.648566]        mutex_lock_nested+0x50/0x58
[   27.653133]        _find_opp_table+0x38/0x78
[   27.657520]        dev_pm_opp_find_freq_exact+0x2c/0xdc
[   27.662890]        a6xx_gmu_resume+0xcc/0xed0
[   27.667372]        a6xx_pm_resume+0x140/0x174
[   27.671849]        adreno_resume+0x24/0x2c
[   27.676070]        pm_generic_runtime_resume+0x2c/0x3c
[   27.681351]        __rpm_callback+0x74/0x114
[   27.685741]        rpm_callback+0x30/0x84
[   27.689865]        rpm_resume+0x3c8/0x4f0
[   27.693989]        __pm_runtime_resume+0x80/0xa4
[   27.698742]        msm_gpu_submit+0x60/0x228
[   27.703136]        msm_ioctl_gem_submit+0xba0/0xc1c
[   27.708158]        drm_ioctl_kernel+0xa0/0x11c
[   27.712724]        drm_ioctl+0x240/0x3dc
[   27.716762]        drm_compat_ioctl+0xd4/0xe4
[   27.721244]        __arm64_compat_sys_ioctl+0xc4/0xf8
[   27.726435]        el0_svc_common+0xa4/0x174
[   27.730827]        do_el0_svc_compat+0x20/0x30
[   27.735395]        el0_sync_compat_handler+0x124/0x190
[   27.740675]        el0_sync_compat+0x144/0x180
[   27.745240]
[   27.745240] other info that might help us debug this:
[   27.745240]
[   27.753459] Chain exists of:
[   27.753459]   opp_table_lock --> &mm->mmap_lock --> reservation_ww_class_mutex
[   27.753459]
[   27.765342]  Possible unsafe locking scenario:
[   27.765342]
[   27.771422]        CPU0                    CPU1
[   27.776085]        ----                    ----
[   27.780747]   lock(reservation_ww_class_mutex);
[   27.785413]                                lock(&mm->mmap_lock);
[   27.791591]                                lock(reservation_ww_class_mutex);
[   27.798833]   lock(opp_table_lock);
[   27.802428]
[   27.802428]  *** DEADLOCK ***
[   27.802428]
[   27.808506] 3 locks held by chrome/1897:
[   27.812540]  #0: ffffff8e05f91138 (&dev->struct_mutex){+.+.}-{3:3}, at: msm_ioctl_gem_submit+0x238/0xc1c
[   27.822295]  #1: ffffff8e1ebd2670 (reservation_ww_class_acquire){+.+.}-{0:0}, at: msm_ioctl_gem_submit+0x978/0xc1c
[   27.832930]  #2: ffffff8e2c8f91b0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: submit_lock_objects+0x70/0x1ec
[   27.843216]
[   27.843216] stack backtrace:
[   27.847702] CPU: 5 PID: 1897 Comm: chrome Not tainted 5.10.0-rc2+ #1
[   27.854235] Hardware name: Google Lazor (rev1+) with LTE (DT)
[   27.860142] Call trace:
[   27.862662]  dump_backtrace+0x0/0x1b4
[   27.866426]  show_stack+0x1c/0x24
[   27.869847]  dump_stack+0xdc/0x158
[   27.873349]  print_circular_bug+0x308/0x338
[   27.877647]  check_noncircular+0x12c/0x134
[   27.881858]  __lock_acquire+0x2288/0x2b2c
[   27.885984]  lock_acquire+0x23c/0x30c
[   27.889753]  __mutex_lock_common+0xdc/0xbc4
[   27.894054]  mutex_lock_nested+0x50/0x58
[   27.898086]  _find_opp_table+0x38/0x78
[   27.901946]  dev_pm_opp_find_freq_exact+0x2c/0xdc
[   27.906784]  a6xx_gmu_resume+0xcc/0xed0
[   27.910734]  a6xx_pm_resume+0x140/0x174
[   27.914684]  adreno_resume+0x24/0x2c
[   27.918363]  pm_generic_runtime_resume+0x2c/0x3c
[   27.923113]  __rpm_callback+0x74/0x114
[   27.926975]  rpm_callback+0x30/0x84
[   27.930565]  rpm_resume+0x3c8/0x4f0
[   27.934154]  __pm_runtime_resume+0x80/0xa4
[   27.938373]  msm_gpu_submit+0x60/0x228
[   27.942233]  msm_ioctl_gem_submit+0xba0/0xc1c
[   27.946713]  drm_ioctl_kernel+0xa0/0x11c
[   27.950749]  drm_ioctl+0x240/0x3dc
[   27.954256]  drm_compat_ioctl+0xd4/0xe4
[   27.958207]  __arm64_compat_sys_ioctl+0xc4/0xf8
[   27.962871]  el0_svc_common+0xa4/0x174
[   27.966731]  do_el0_svc_compat+0x20/0x30
[   27.970766]  el0_sync_compat_handler+0x124/0x190
[   27.975516]  el0_sync_compat+0x144/0x180

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-11-18 16:53                                   ` Rob Clark
@ 2020-11-19  6:05                                     ` Viresh Kumar
  2020-12-07  6:16                                       ` Viresh Kumar
  0 siblings, 1 reply; 50+ messages in thread
From: Viresh Kumar @ 2020-11-19  6:05 UTC (permalink / raw)
  To: Rob Clark
  Cc: Daniel Vetter, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list, Menon,
	Nishanth

On 18-11-20, 08:53, Rob Clark wrote:
> On Tue, Nov 17, 2020 at 9:28 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 17-11-20, 09:02, Rob Clark wrote:
> > > With that on top of the previous patch,
> >
> > Don't you still have this? It fixed the lockdep in the remove path.
> >
> > https://lore.kernel.org/lkml/20201022080644.2ck4okrxygmkuatn@vireshk-i7/
> >
> > To make it clear, you need these patches to fix the OPP stuff:
> >
> > //From 5.10-rc3 (the one from the above link).
> > commit e0df59de670b ("opp: Reduce the size of critical section in _opp_table_kref_release()")

This fixes debugfs stuff while the OPP table is removed.

> > //Below two from linux-next
> > commit ef43f01ac069 ("opp: Always add entries in dev_list with opp_table->lock held")
> > commit 27c09484dd3d ("opp: Allocate the OPP table outside of opp_table_lock")

This fixes debugfs stuff while the OPP table is added.

> > This matches the diff I gave you earlier.
> >
> 
> no, I did not have all three, only "opp: Allocate the OPP table
> outside of opp_table_lock" plus the fixup.  But with all three:

And looking at the lockdep you gave now, it looks like we have a
problem with the OPP table's internal lock (opp_table->lock) as well,
apart from the global opp_table_lock.

I wish there was a way for me to reproduce the lockdep :(

I know this is exhausting for both of us and I really want to be done
with it as soon as possible. This really should be the last patch
here; please try it along with the other two. It fixes the debugfs
handling while the OPPs in the OPP table are removed (they are already
added without a lock held around the debugfs calls).

AFAIU, there is no further debugfs work that happens from within the
locks, so this really should be the last patch unless I missed
something.

-- 
viresh

-------------------------8<-------------------------
From: Viresh Kumar <viresh.kumar@linaro.org>
Date: Thu, 19 Nov 2020 11:24:32 +0530
Subject: [PATCH] opp: Reduce the size of critical section in
 _opp_kref_release()

There is a lot of stuff here which can be done outside of
opp_table->lock, so do that. This helps avoid circular dependency
reports from lockdep around debugfs.

Reported-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/opp/core.c | 94 +++++++++++++++++++++++-----------------------
 1 file changed, 47 insertions(+), 47 deletions(-)

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 9d145bb99a59..4268eb359915 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -1251,9 +1251,14 @@ void _opp_free(struct dev_pm_opp *opp)
 	kfree(opp);
 }
 
-static void _opp_kref_release(struct dev_pm_opp *opp,
-			      struct opp_table *opp_table)
+static void _opp_kref_release(struct kref *kref)
 {
+	struct dev_pm_opp *opp = container_of(kref, struct dev_pm_opp, kref);
+	struct opp_table *opp_table = opp->opp_table;
+
+	list_del(&opp->node);
+	mutex_unlock(&opp_table->lock);
+
 	/*
 	 * Notify the changes in the availability of the operable
 	 * frequency/voltage list.
@@ -1261,27 +1266,9 @@ static void _opp_kref_release(struct dev_pm_opp *opp,
 	blocking_notifier_call_chain(&opp_table->head, OPP_EVENT_REMOVE, opp);
 	_of_opp_free_required_opps(opp_table, opp);
 	opp_debug_remove_one(opp);
-	list_del(&opp->node);
 	kfree(opp);
 }
 
-static void _opp_kref_release_unlocked(struct kref *kref)
-{
-	struct dev_pm_opp *opp = container_of(kref, struct dev_pm_opp, kref);
-	struct opp_table *opp_table = opp->opp_table;
-
-	_opp_kref_release(opp, opp_table);
-}
-
-static void _opp_kref_release_locked(struct kref *kref)
-{
-	struct dev_pm_opp *opp = container_of(kref, struct dev_pm_opp, kref);
-	struct opp_table *opp_table = opp->opp_table;
-
-	_opp_kref_release(opp, opp_table);
-	mutex_unlock(&opp_table->lock);
-}
-
 void dev_pm_opp_get(struct dev_pm_opp *opp)
 {
 	kref_get(&opp->kref);
@@ -1289,16 +1276,10 @@ void dev_pm_opp_get(struct dev_pm_opp *opp)
 
 void dev_pm_opp_put(struct dev_pm_opp *opp)
 {
-	kref_put_mutex(&opp->kref, _opp_kref_release_locked,
-		       &opp->opp_table->lock);
+	kref_put_mutex(&opp->kref, _opp_kref_release, &opp->opp_table->lock);
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_put);
 
-static void dev_pm_opp_put_unlocked(struct dev_pm_opp *opp)
-{
-	kref_put(&opp->kref, _opp_kref_release_unlocked);
-}
-
 /**
  * dev_pm_opp_remove()  - Remove an OPP from OPP table
  * @dev:	device for which we do this operation
@@ -1342,30 +1323,49 @@ void dev_pm_opp_remove(struct device *dev, unsigned long freq)
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_remove);
 
+static struct dev_pm_opp *_opp_get_next(struct opp_table *opp_table,
+					bool dynamic)
+{
+	struct dev_pm_opp *opp = NULL, *temp;
+
+	mutex_lock(&opp_table->lock);
+	list_for_each_entry(temp, &opp_table->opp_list, node) {
+		if (dynamic == temp->dynamic) {
+			opp = temp;
+			break;
+		}
+	}
+
+	mutex_unlock(&opp_table->lock);
+	return opp;
+}
+
 bool _opp_remove_all_static(struct opp_table *opp_table)
 {
-	struct dev_pm_opp *opp, *tmp;
-	bool ret = true;
+	struct dev_pm_opp *opp;
 
 	mutex_lock(&opp_table->lock);
 
 	if (!opp_table->parsed_static_opps) {
-		ret = false;
-		goto unlock;
+		mutex_unlock(&opp_table->lock);
+		return false;
 	}
 
-	if (--opp_table->parsed_static_opps)
-		goto unlock;
-
-	list_for_each_entry_safe(opp, tmp, &opp_table->opp_list, node) {
-		if (!opp->dynamic)
-			dev_pm_opp_put_unlocked(opp);
+	if (--opp_table->parsed_static_opps) {
+		mutex_unlock(&opp_table->lock);
+		return true;
 	}
 
-unlock:
 	mutex_unlock(&opp_table->lock);
 
-	return ret;
+	/*
+	 * Can't remove the OPP from under the lock; debugfs removal needs to
+	 * happen lockless to avoid circular dependency issues.
+	 */
+	while ((opp = _opp_get_next(opp_table, false)))
+		dev_pm_opp_put(opp);
+
+	return true;
 }
 
 /**
@@ -1377,21 +1377,21 @@ bool _opp_remove_all_static(struct opp_table *opp_table)
 void dev_pm_opp_remove_all_dynamic(struct device *dev)
 {
 	struct opp_table *opp_table;
-	struct dev_pm_opp *opp, *temp;
+	struct dev_pm_opp *opp;
 	int count = 0;
 
 	opp_table = _find_opp_table(dev);
 	if (IS_ERR(opp_table))
 		return;
 
-	mutex_lock(&opp_table->lock);
-	list_for_each_entry_safe(opp, temp, &opp_table->opp_list, node) {
-		if (opp->dynamic) {
-			dev_pm_opp_put_unlocked(opp);
-			count++;
-		}
+	/*
+	 * Can't remove the OPP from under the lock; debugfs removal needs to
+	 * happen lockless to avoid circular dependency issues.
+	 */
+	while ((opp = _opp_get_next(opp_table, true))) {
+		dev_pm_opp_put(opp);
+		count++;
 	}
-	mutex_unlock(&opp_table->lock);
 
 	/* Drop the references taken by dev_pm_opp_add() */
 	while (count--)
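
The shape of the fix, reduced to its essentials (a kernel-style sketch
with hypothetical types, not the OPP code itself; it assumes the
caller still owns the initial reference on every item): pluck one
element at a time under the list lock, drop the lock, and only then
run the release path, including the debugfs teardown, with no lock
held.

/* sketch; would need linux/list.h, linux/kref.h, linux/mutex.h,
 * linux/slab.h */
struct item {
	struct list_head node;
	struct kref kref;
	struct mutex *lock;	/* protects the list; taken by kref_put_mutex() */
};

static void item_release(struct kref *kref)
{
	struct item *it = container_of(kref, struct item, kref);

	list_del(&it->node);
	mutex_unlock(it->lock);	/* kref_put_mutex() acquired it for us */

	/* from here on no locks are held, so touching debugfs is safe */
	kfree(it);
}

static struct item *get_next(struct mutex *lock, struct list_head *list)
{
	struct item *it = NULL;

	mutex_lock(lock);
	if (!list_empty(list))
		it = list_first_entry(list, struct item, node);
	mutex_unlock(lock);

	return it;
}

static void remove_all(struct mutex *lock, struct list_head *list)
{
	struct item *it;

	while ((it = get_next(lock, list)))
		kref_put_mutex(&it->kref, item_release, lock);
}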

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-11-19  6:05                                     ` Viresh Kumar
@ 2020-12-07  6:16                                       ` Viresh Kumar
  2020-12-16  5:22                                         ` Viresh Kumar
  0 siblings, 1 reply; 50+ messages in thread
From: Viresh Kumar @ 2020-12-07  6:16 UTC (permalink / raw)
  To: Rob Clark
  Cc: Daniel Vetter, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list, Menon,
	Nishanth

On 19-11-20, 11:35, Viresh Kumar wrote:
> On 18-11-20, 08:53, Rob Clark wrote:
> > On Tue, Nov 17, 2020 at 9:28 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 17-11-20, 09:02, Rob Clark wrote:
> > > > With that on top of the previous patch,
> > >
> > > Don't you still have this? It fixed the lockdep in the remove path.
> > >
> > > https://lore.kernel.org/lkml/20201022080644.2ck4okrxygmkuatn@vireshk-i7/
> > >
> > > To make it clear, you need these patches to fix the OPP stuff:
> > >
> > > //From 5.10-rc3 (the one from the above link).
> > > commit e0df59de670b ("opp: Reduce the size of critical section in _opp_table_kref_release()")
> 
> This fixes debugfs stuff while the OPP table is removed.
> 
> > > //Below two from linux-next
> > > commit ef43f01ac069 ("opp: Always add entries in dev_list with opp_table->lock held")
> > > commit 27c09484dd3d ("opp: Allocate the OPP table outside of opp_table_lock")
> 
> This fixes debugfs stuff while the OPP table is added.
> 
> > > This matches the diff I gave you earlier.
> > >
> > 
> > no, I did not have all three, only "opp: Allocate the OPP table
> > outside of opp_table_lock" plus the fixup.  But with all three:
> 
> And looking at the lockdep you gave now, it looks like we have a
> problem with the OPP table's internal lock (opp_table->lock) as well,
> apart from the global opp_table_lock.
> 
> I wish there was a way for me to reproduce the lockdep :(
> 
> I know this is exhausting for both of us and I really want to be done
> with it as soon as possible. This really should be the last patch
> here; please try it along with the other two. It fixes the debugfs
> handling while the OPPs in the OPP table are removed (they are already
> added without a lock held around the debugfs calls).
> 
> AFAIU, there is no further debugfs work that happens from within the
> locks, so this really should be the last patch unless I missed
> something.

Rob, were you able to test this patch?

-- 
viresh

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
  2020-12-07  6:16                                       ` Viresh Kumar
@ 2020-12-16  5:22                                         ` Viresh Kumar
  0 siblings, 0 replies; 50+ messages in thread
From: Viresh Kumar @ 2020-12-16  5:22 UTC (permalink / raw)
  To: Rob Clark
  Cc: Daniel Vetter, dri-devel, Rob Clark, Sean Paul, David Airlie,
	open list:DRM DRIVER FOR MSM ADRENO GPU,
	open list:DRM DRIVER FOR MSM ADRENO GPU, open list, Menon,
	Nishanth

On 07-12-20, 11:46, Viresh Kumar wrote:
> On 19-11-20, 11:35, Viresh Kumar wrote:
> > On 18-11-20, 08:53, Rob Clark wrote:
> > > On Tue, Nov 17, 2020 at 9:28 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > >
> > > > On 17-11-20, 09:02, Rob Clark wrote:
> > > > > With that on top of the previous patch,
> > > >
> > > > Don't you still have this? It fixed the lockdep in the remove path.
> > > >
> > > > https://lore.kernel.org/lkml/20201022080644.2ck4okrxygmkuatn@vireshk-i7/
> > > >
> > > > To make it clear, you need these patches to fix the OPP stuff:
> > > >
> > > > //From 5.10-rc3 (the one from the above link).
> > > > commit e0df59de670b ("opp: Reduce the size of critical section in _opp_table_kref_release()")
> > 
> > This fixes debugfs stuff while the OPP table is removed.
> > 
> > > > //Below two from linux-next
> > > > commit ef43f01ac069 ("opp: Always add entries in dev_list with opp_table->lock held")
> > > > commit 27c09484dd3d ("opp: Allocate the OPP table outside of opp_table_lock")
> > 
> > This fixes debugfs stuff while the OPP table is added.
> > 
> > > > This matches the diff I gave you earlier.
> > > >
> > > 
> > > no, I did not have all three, only "opp: Allocate the OPP table
> > > outside of opp_table_lock" plus the fixup.  But with all three:
> > 
> > And looking at the lockdep you gave now, it looks like we have a
> > problem with the OPP table's internal lock (opp_table->lock) as well,
> > apart from the global opp_table_lock.
> > 
> > I wish there was a way for me to reproduce the lockdep :(
> > 
> > I know this is exhausting for both of us and I really want to be done
> > with it as soon as possible. This really should be the last patch
> > here; please try it along with the other two. It fixes the debugfs
> > handling while the OPPs in the OPP table are removed (they are already
> > added without a lock held around the debugfs calls).
> > 
> > AFAIU, there is no further debugfs work that happens from within the
> > locks, so this really should be the last patch unless I missed
> > something.
> 
> Rob, were you able to test this patch?

FWIW, this patch and everything else I had are now merged into Linus's
master. You can test 5.11-rc1 to see whether you still hit the lockdep
splat or not.

-- 
viresh

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread

Thread overview: 50+ messages
2020-10-12  2:09 [PATCH 00/14] drm/msm: de-struct_mutex-ification Rob Clark
2020-10-12  2:09 ` [PATCH v2 01/22] drm/msm/gem: Add obj->lock wrappers Rob Clark
2020-10-12  2:09 ` [PATCH v2 02/22] drm/msm/gem: Rename internal get_iova_locked helper Rob Clark
2020-10-12  2:09 ` [PATCH v2 03/22] drm/msm/gem: Move prototypes to msm_gem.h Rob Clark
2020-10-12  2:09 ` [PATCH v2 04/22] drm/msm/gem: Add some _locked() helpers Rob Clark
2020-10-12  2:09 ` [PATCH v2 05/22] drm/msm/gem: Move locking in shrinker path Rob Clark
2020-10-12  2:09 ` [PATCH v2 06/22] drm/msm/submit: Move copy_from_user ahead of locking bos Rob Clark
2020-10-12  2:09 ` [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path Rob Clark
2020-10-12 14:35   ` Daniel Vetter
2020-10-12 15:43     ` Rob Clark
2020-10-20  9:07       ` Viresh Kumar
2020-10-20 10:56         ` Daniel Vetter
2020-10-20 11:24           ` Viresh Kumar
2020-10-20 11:42             ` Daniel Vetter
2020-10-20 14:13             ` Rob Clark
2020-10-22  8:06               ` Viresh Kumar
2020-10-25 17:39                 ` Rob Clark
2020-10-27 11:35                   ` Viresh Kumar
2020-11-03  5:47                     ` Viresh Kumar
2020-11-03 16:50                       ` Rob Clark
2020-11-04  3:03                         ` Viresh Kumar
2020-11-05 19:24                           ` Rob Clark
2020-11-06  7:16                             ` Viresh Kumar
2020-11-17 10:03                               ` Viresh Kumar
2020-11-17 17:02                               ` Rob Clark
2020-11-18  5:28                                 ` Viresh Kumar
2020-11-18 16:53                                   ` Rob Clark
2020-11-19  6:05                                     ` Viresh Kumar
2020-12-07  6:16                                       ` Viresh Kumar
2020-12-16  5:22                                         ` Viresh Kumar
2020-10-12  2:09 ` [PATCH v2 08/22] drm/msm/gem: Switch over to obj->resv for locking Rob Clark
2020-10-12  2:09 ` [PATCH v2 09/22] drm/msm: Use correct drm_gem_object_put() in fail case Rob Clark
2020-10-12  2:09 ` [PATCH v2 10/22] drm/msm: Drop chatty trace Rob Clark
2020-10-12  2:09 ` [PATCH v2 11/22] drm/msm: Move update_fences() Rob Clark
2020-10-12  2:09 ` [PATCH v2 12/22] drm/msm: Add priv->mm_lock to protect active/inactive lists Rob Clark
2020-10-12  2:09 ` [PATCH v2 13/22] drm/msm: Document and rename preempt_lock Rob Clark
2020-10-12  2:09 ` [PATCH v2 14/22] drm/msm: Protect ring->submits with it's own lock Rob Clark
2020-10-12  2:09 ` [PATCH v2 15/22] drm/msm: Refcount submits Rob Clark
2020-10-12  2:09 ` [PATCH v2 16/22] drm/msm: Remove obj->gpu Rob Clark
2020-10-12  2:09 ` [PATCH v2 17/22] drm/msm: Drop struct_mutex from the retire path Rob Clark
2020-10-12  2:09 ` [PATCH v2 18/22] drm/msm: Drop struct_mutex in free_object() path Rob Clark
2020-10-12  2:09 ` [PATCH v2 19/22] drm/msm: remove msm_gem_free_work Rob Clark
2020-10-12  2:09 ` [PATCH v2 20/22] drm/msm: drop struct_mutex in madvise path Rob Clark
2020-10-12  2:09 ` [PATCH v2 21/22] drm/msm: Drop struct_mutex in shrinker path Rob Clark
2020-10-12  2:09 ` [PATCH v2 22/22] drm/msm: Don't implicit-sync if only a single ring Rob Clark
2020-10-12 14:40   ` Daniel Vetter
2020-10-12 15:07     ` Rob Clark
2020-10-13 11:08       ` Daniel Vetter
2020-10-13 16:15         ` [Freedreno] " Rob Clark
2020-10-15  8:22           ` Daniel Vetter
