All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] drm/amdgpu: Reduce the notifier_lock contention
@ 2021-10-07 13:26 Thomas Hellström
  2021-10-07 13:26 ` [RFC PATCH 1/2] drm/amdgpu: Move dma_resv waiting outside the notifier lock Thomas Hellström
  2021-10-07 13:26 ` [RFC PATCH 2/2] drm/amdgpu: Use an rwsem for " Thomas Hellström
  0 siblings, 2 replies; 5+ messages in thread
From: Thomas Hellström @ 2021-10-07 13:26 UTC (permalink / raw)
  To: dri-devel; +Cc: Thomas Hellström, Christian König

In the interest of sharing as much of possible of the userptr design,
two rfc patches that are aiming to reduce the userptr notifier_lock
contention.

We have these changes already in i915, but we wanted to lift them to dri-
devel in the amdgpu context for discussion to make sure we don't miss
anything important.

The first patch addresses the case where idling a single userptr bo
for even a simple invalidation, stalls command submission across the device.
Since the notifier callback is guaranteed to run to *completion* by the
notifier code without a racing successful command submission, moving the
fence wait out of the notifier lock should be safe.

The second patch probably has a smaller impact and might avoid some
contention particularly if traversing long userptr lists. The notifier sem
is also locked interruptible during command submission.

Any comments appreciated.

Cc: Christian König <christian.koenig@amd.com>

Thomas Hellström (2):
  drm/amdgpu: Move dma_resv waiting outside the notifier lock
  drm/amdgpu: Use an rwsem for the notifier lock

 drivers/gpu/drm/amd/amdgpu/amdgpu.h              |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c           | 12 ++++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c       |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c           | 11 ++++++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c          |  2 +-
 6 files changed, 18 insertions(+), 13 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFC PATCH 1/2] drm/amdgpu: Move dma_resv waiting outside the notifier lock
  2021-10-07 13:26 [RFC PATCH 0/2] drm/amdgpu: Reduce the notifier_lock contention Thomas Hellström
@ 2021-10-07 13:26 ` Thomas Hellström
  2021-10-07 14:57   ` Christian König
  2021-10-07 13:26 ` [RFC PATCH 2/2] drm/amdgpu: Use an rwsem for " Thomas Hellström
  1 sibling, 1 reply; 5+ messages in thread
From: Thomas Hellström @ 2021-10-07 13:26 UTC (permalink / raw)
  To: dri-devel; +Cc: Thomas Hellström, Christian König

While the range notifier is executing, we have the write-side mmu interval
seqlock, and mmu_interval_read_retry() is always returning true,
which means that if amdgpu_cs_submit grabs the notifier lock during the
fence wait, it will retry anyway when checking the userptr validity
and block when retrying in mmu_interval_read_begin().
(See the extensive comments in mmu_interval_read_begin())

Hence we can release the notifier lock before the fence wait and avoid
a device-wide command submission block during invalidation.

Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index 4b153daf283d..d3d340a6129c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -75,9 +75,10 @@ static bool amdgpu_mn_invalidate_gfx(struct mmu_interval_notifier *mni,
 
 	mmu_interval_set_seq(mni, cur_seq);
 
+	mutex_unlock(&adev->notifier_lock);
+
 	r = dma_resv_wait_timeout(bo->tbo.base.resv, true, false,
 				  MAX_SCHEDULE_TIMEOUT);
-	mutex_unlock(&adev->notifier_lock);
 	if (r <= 0)
 		DRM_ERROR("(%ld) failed to wait for user bo\n", r);
 	return true;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [RFC PATCH 2/2] drm/amdgpu: Use an rwsem for the notifier lock
  2021-10-07 13:26 [RFC PATCH 0/2] drm/amdgpu: Reduce the notifier_lock contention Thomas Hellström
  2021-10-07 13:26 ` [RFC PATCH 1/2] drm/amdgpu: Move dma_resv waiting outside the notifier lock Thomas Hellström
@ 2021-10-07 13:26 ` Thomas Hellström
  1 sibling, 0 replies; 5+ messages in thread
From: Thomas Hellström @ 2021-10-07 13:26 UTC (permalink / raw)
  To: dri-devel; +Cc: Thomas Hellström, Christian König

Use an rwsem as the notifier lock, and take it in read mode during
command submission.
This avoids the device-wide serialization of command submission in
the absence of userptr invalidations.

Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h              |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c           | 12 ++++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c       |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c           | 10 +++++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c          |  2 +-
 6 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index d356e329e6f8..04fec2299c02 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1051,7 +1051,7 @@ struct amdgpu_device {
 	struct rw_semaphore reset_sem;
 	struct amdgpu_doorbell_index doorbell_index;
 
-	struct mutex			notifier_lock;
+	struct rw_semaphore		notifier_sem;
 
 	int asic_reset_res;
 	struct work_struct		xgmi_reset_work;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 2d6b2d77b738..14be51692539 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2049,7 +2049,7 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
 
 		/*
 		 * FIXME: Cannot ignore the return code, must hold
-		 * notifier_lock
+		 * notifier_sem in read mode.
 		 */
 		amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 0311d799a010..26f447c49bdd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1224,11 +1224,13 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 
 	drm_sched_job_arm(&job->base);
 
-	/* No memory allocation is allowed while holding the notifier lock.
+	/* No memory allocation is allowed while holding the notifier sem.
 	 * The lock is held until amdgpu_cs_submit is finished and fence is
 	 * added to BOs.
 	 */
-	mutex_lock(&p->adev->notifier_lock);
+	r = down_read_interruptible(&p->adev->notifier_sem);
+	if (r)
+		goto error_notifier;
 
 	/* If userptr are invalidated after amdgpu_cs_parser_bos(), return
 	 * -EAGAIN, drmIoctl in libdrm will restart the amdgpu_cs_ioctl.
@@ -1288,13 +1290,15 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	}
 
 	ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
-	mutex_unlock(&p->adev->notifier_lock);
+	up_read(&p->adev->notifier_sem);
 
 	return 0;
 
 error_abort:
+	up_read(&p->adev->notifier_sem);
+
+error_notifier:
 	drm_sched_job_cleanup(&job->base);
-	mutex_unlock(&p->adev->notifier_lock);
 
 error_unlock:
 	amdgpu_job_free(job);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 48089dc0180b..aa27b462152d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3476,7 +3476,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	atomic_set(&adev->in_gpu_reset, 0);
 	init_rwsem(&adev->reset_sem);
 	mutex_init(&adev->psp.mutex);
-	mutex_init(&adev->notifier_lock);
+	init_rwsem(&adev->notifier_sem);
 
 	r = amdgpu_device_init_apu_flags(adev);
 	if (r)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index d3d340a6129c..e0b1b6e11bf5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -71,11 +71,11 @@ static bool amdgpu_mn_invalidate_gfx(struct mmu_interval_notifier *mni,
 	if (!mmu_notifier_range_blockable(range))
 		return false;
 
-	mutex_lock(&adev->notifier_lock);
+	down_write(&adev->notifier_sem);
 
 	mmu_interval_set_seq(mni, cur_seq);
 
-	mutex_unlock(&adev->notifier_lock);
+	up_write(&adev->notifier_sem);
 
 	r = dma_resv_wait_timeout(bo->tbo.base.resv, true, false,
 				  MAX_SCHEDULE_TIMEOUT);
@@ -108,12 +108,12 @@ static bool amdgpu_mn_invalidate_hsa(struct mmu_interval_notifier *mni,
 	if (!mmu_notifier_range_blockable(range))
 		return false;
 
-	mutex_lock(&adev->notifier_lock);
+	down_write(&adev->notifier_sem);
 
 	mmu_interval_set_seq(mni, cur_seq);
 
 	amdgpu_amdkfd_evict_userptr(bo->kfd_bo, bo->notifier.mm);
-	mutex_unlock(&adev->notifier_lock);
+	up_write(&adev->notifier_sem);
 
 	return true;
 }
@@ -215,7 +215,7 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
 	/*
 	 * Due to default_flags, all pages are HMM_PFN_VALID or
 	 * hmm_range_fault() fails. FIXME: The pages cannot be touched outside
-	 * the notifier_lock, and mmu_interval_read_retry() must be done first.
+	 * the notifier_sem, and mmu_interval_read_retry() must be done first.
 	 */
 	for (i = 0; pages && i < npages; i++)
 		pages[i] = hmm_pfn_to_page(pfns[i]);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 0cf94421665f..b2af53a05fb5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -723,7 +723,7 @@ bool amdgpu_ttm_tt_get_user_pages_done(struct ttm_tt *ttm)
 
 	if (gtt->range) {
 		/*
-		 * FIXME: Must always hold notifier_lock for this, and must
+		 * FIXME: Must always hold notifier_sem for this, and must
 		 * not ignore the return code.
 		 */
 		r = amdgpu_hmm_range_get_pages_done(gtt->range);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 1/2] drm/amdgpu: Move dma_resv waiting outside the notifier lock
  2021-10-07 13:26 ` [RFC PATCH 1/2] drm/amdgpu: Move dma_resv waiting outside the notifier lock Thomas Hellström
@ 2021-10-07 14:57   ` Christian König
  2021-10-08  6:43     ` Thomas Hellström
  0 siblings, 1 reply; 5+ messages in thread
From: Christian König @ 2021-10-07 14:57 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel

Am 07.10.21 um 15:26 schrieb Thomas Hellström:
> While the range notifier is executing, we have the write-side mmu interval
> seqlock, and mmu_interval_read_retry() is always returning true,
> which means that if amdgpu_cs_submit grabs the notifier lock during the
> fence wait, it will retry anyway when checking the userptr validity
> and block when retrying in mmu_interval_read_begin().
> (See the extensive comments in mmu_interval_read_begin())
>
> Hence we can release the notifier lock before the fence wait and avoid
> a device-wide command submission block during invalidation.

First of all I'm not convinced that this works and second blocking the 
CS while an MMU invalidation is underway is completely intentional.

In other words when for example fork() is called in parallel with a CS 
the CS should be blocked until the invalidation caused by the fork() 
operation is completed and *NOT* risk that the CS succeeds and adds 
another dependency to the MMU invalidation.

Christian.

>
> Cc: Christian König <christian.koenig@amd.com>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> index 4b153daf283d..d3d340a6129c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> @@ -75,9 +75,10 @@ static bool amdgpu_mn_invalidate_gfx(struct mmu_interval_notifier *mni,
>   
>   	mmu_interval_set_seq(mni, cur_seq);
>   
> +	mutex_unlock(&adev->notifier_lock);
> +
>   	r = dma_resv_wait_timeout(bo->tbo.base.resv, true, false,
>   				  MAX_SCHEDULE_TIMEOUT);
> -	mutex_unlock(&adev->notifier_lock);
>   	if (r <= 0)
>   		DRM_ERROR("(%ld) failed to wait for user bo\n", r);
>   	return true;


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 1/2] drm/amdgpu: Move dma_resv waiting outside the notifier lock
  2021-10-07 14:57   ` Christian König
@ 2021-10-08  6:43     ` Thomas Hellström
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Hellström @ 2021-10-08  6:43 UTC (permalink / raw)
  To: Christian König, dri-devel

Hi, Christian,

On Thu, 2021-10-07 at 16:57 +0200, Christian König wrote:
> Am 07.10.21 um 15:26 schrieb Thomas Hellström:
> > While the range notifier is executing, we have the write-side mmu
> > interval
> > seqlock, and mmu_interval_read_retry() is always returning true,
> > which means that if amdgpu_cs_submit grabs the notifier lock during
> > the
> > fence wait, it will retry anyway when checking the userptr validity
> > and block when retrying in mmu_interval_read_begin().
> > (See the extensive comments in mmu_interval_read_begin())
> > 
> > Hence we can release the notifier lock before the fence wait and
> > avoid
> > a device-wide command submission block during invalidation.
> 
> First of all I'm not convinced that this works and second blocking
> the 
> CS while an MMU invalidation is underway is completely intentional.
> 
> In other words when for example fork() is called in parallel with a
> CS 
> the CS should be blocked until the invalidation caused by the fork() 
> operation is completed and *NOT* risk that the CS succeeds and adds 
> another dependency to the MMU invalidation.

The point is that command submission can't succeed since while the
notifer runs, we have the write-side seqlock. It's released when the
notifier ends, and we can rely on that. The following will happen:

Thread1              		Thread 2

enter_cs();          		enter_invalidation();
		     		notifier_lock()
	             		update_seqno();
                     		notifier_unlock();
notifier_lock();     		fence_wait();
validate_userptr();
(invalid_seqno)
notifier_unlock();
(retry)
mmu_interval_read_begin();
(blocks)
				fence_signal();
				(notifier ends)
				mn_itree_inv_end();
(unblocks, new seqno);
enter_cs();
(succeeds)

So the difference is that you block only CS that affect that particular
userptr, not CS across the entire device.

So for the sake of discussion, is there a particular situation where
you see a race that might happen here?

Thanks,
Thomas


> 
> Christian.
> 
> > 
> > Cc: Christian König <christian.koenig@amd.com>
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> > index 4b153daf283d..d3d340a6129c 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> > @@ -75,9 +75,10 @@ static bool amdgpu_mn_invalidate_gfx(struct
> > mmu_interval_notifier *mni,
> >   
> >         mmu_interval_set_seq(mni, cur_seq);
> >   
> > +       mutex_unlock(&adev->notifier_lock);
> > +
> >         r = dma_resv_wait_timeout(bo->tbo.base.resv, true, false,
> >                                   MAX_SCHEDULE_TIMEOUT);
> > -       mutex_unlock(&adev->notifier_lock);
> >         if (r <= 0)
> >                 DRM_ERROR("(%ld) failed to wait for user bo\n", r);
> >         return true;
> 



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-10-08  6:43 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-07 13:26 [RFC PATCH 0/2] drm/amdgpu: Reduce the notifier_lock contention Thomas Hellström
2021-10-07 13:26 ` [RFC PATCH 1/2] drm/amdgpu: Move dma_resv waiting outside the notifier lock Thomas Hellström
2021-10-07 14:57   ` Christian König
2021-10-08  6:43     ` Thomas Hellström
2021-10-07 13:26 ` [RFC PATCH 2/2] drm/amdgpu: Use an rwsem for " Thomas Hellström

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.