linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Yang, Philip" <Philip.Yang@amd.com>
To: Jason Gunthorpe <jgg@mellanox.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	Jerome Glisse <jglisse@redhat.com>,
	Ralph Campbell <rcampbell@nvidia.com>,
	John Hubbard <jhubbard@nvidia.com>,
	"Kuehling, Felix" <Felix.Kuehling@amd.com>,
	Juergen Gross <jgross@suse.com>,
	"Zhou, David(ChunMing)" <David1.Zhou@amd.com>,
	Mike Marciniszyn <mike.marciniszyn@intel.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"nouveau@lists.freedesktop.org" <nouveau@lists.freedesktop.org>,
	Dennis Dalessandro <dennis.dalessandro@intel.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	Christoph Hellwig <hch@infradead.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"Deucher, Alexander" <Alexander.Deucher@amd.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Petr Cvek <petrcvekcz@gmail.com>,
	"Koenig, Christian" <Christian.Koenig@amd.com>,
	Ben Skeggs <bskeggs@redhat.com>
Subject: Re: [PATCH v2 14/15] drm/amdgpu: Use mmu_range_notifier instead of hmm_mirror
Date: Fri, 1 Nov 2019 14:44:51 +0000	[thread overview]
Message-ID: <30b2f569-bf7a-5166-c98d-4a4a13d1351f@amd.com> (raw)
In-Reply-To: <20191029192544.GU22766@mellanox.com>

[-- Attachment #1: Type: text/plain, Size: 911 bytes --]



On 2019-10-29 3:25 p.m., Jason Gunthorpe wrote:
> On Tue, Oct 29, 2019 at 07:22:37PM +0000, Yang, Philip wrote:
>> Hi Jason,
>>
>> I did quick test after merging amd-staging-drm-next with the
>> mmu_notifier branch, which includes this set changes. The test result
>> has different failures, app stuck intermittently, GUI no display etc. I
>> am understanding the changes and will try to figure out the cause.
> 
> Thanks! I'm not surprised by this given how difficult this patch was
> to make. Let me know if I can assist in any way
> 
> Please ensure to run with lockdep enabled.. Your symptops sounds sort
> of like deadlocking?
> 
Hi Jason,

Attached patch fix several issues in amdgpu driver, maybe you can squash 
this into patch 14. With this is done, patch 12, 13, 14 is Reviewed-by 
and Tested-by Philip Yang <philip.yang@amd.com>

Regards,
Philip

> Regards,
> Jason
> 

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-drm-amdgpu-issues-with-new-mmu_range_notifier-api.patch --]
[-- Type: text/x-patch; name="0001-drm-amdgpu-issues-with-new-mmu_range_notifier-api.patch", Size: 5274 bytes --]

From 5a0bd4d8cef8472fe2904550142d288feed8cd81 Mon Sep 17 00:00:00 2001
From: Philip Yang <Philip.Yang@amd.com>
Date: Thu, 31 Oct 2019 09:10:30 -0400
Subject: [PATCH] drm/amdgpu: issues with new mmu_range_notifier api

put mmu_range_set_seq under the same lock which is used to call
mmu_range_read_retry.

fix amdgpu_ttm_tt_get_user_pages_done return value, because
mmu_range_read_retry means !hmm_range_valid

retry if hmm_range_fault return -EBUSY

fix false WARN for missing get_user_page_done, we should check all
pages not just the first page, don't understand why this issue is
triggered by this change.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c  | 32 +++++++--------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 37 +++++++++++++++++--------
 2 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index cb718a064eb4..c8bbd06f1009 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -67,21 +67,15 @@ static bool amdgpu_mn_invalidate_gfx(struct mmu_range_notifier *mrn,
 	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
 	long r;
 
-	/*
-	 * FIXME: Must hold some lock shared with
-	 * amdgpu_ttm_tt_get_user_pages_done()
-	 */
-	mmu_range_set_seq(mrn, cur_seq);
+	mutex_lock(&adev->notifier_lock);
 
-	/* FIXME: Is this necessary? */
-	if (!amdgpu_ttm_tt_affect_userptr(bo->tbo.ttm, range->start,
-					  range->end))
-		return true;
+	mmu_range_set_seq(mrn, cur_seq);
 
-	if (!mmu_notifier_range_blockable(range))
+	if (!mmu_notifier_range_blockable(range)) {
+		mutex_unlock(&adev->notifier_lock);
 		return false;
+	}
 
-	mutex_lock(&adev->notifier_lock);
 	r = dma_resv_wait_timeout_rcu(bo->tbo.base.resv, true, false,
 				      MAX_SCHEDULE_TIMEOUT);
 	mutex_unlock(&adev->notifier_lock);
@@ -110,21 +104,15 @@ static bool amdgpu_mn_invalidate_hsa(struct mmu_range_notifier *mrn,
 	struct amdgpu_bo *bo = container_of(mrn, struct amdgpu_bo, notifier);
 	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
 
-	/*
-	 * FIXME: Must hold some lock shared with
-	 * amdgpu_ttm_tt_get_user_pages_done()
-	 */
-	mmu_range_set_seq(mrn, cur_seq);
+	mutex_lock(&adev->notifier_lock);
 
-	/* FIXME: Is this necessary? */
-	if (!amdgpu_ttm_tt_affect_userptr(bo->tbo.ttm, range->start,
-					  range->end))
-		return true;
+	mmu_range_set_seq(mrn, cur_seq);
 
-	if (!mmu_notifier_range_blockable(range))
+	if (!mmu_notifier_range_blockable(range)) {
+		mutex_unlock(&adev->notifier_lock);
 		return false;
+	}
 
-	mutex_lock(&adev->notifier_lock);
 	amdgpu_amdkfd_evict_userptr(bo->kfd_bo, bo->notifier.mm);
 	mutex_unlock(&adev->notifier_lock);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index a38437fd290a..56fde43d5efa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -799,10 +799,11 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
 {
 	struct ttm_tt *ttm = bo->tbo.ttm;
 	struct amdgpu_ttm_tt *gtt = (void *)ttm;
-	struct mm_struct *mm;
-	struct hmm_range *range;
 	unsigned long start = gtt->userptr;
 	struct vm_area_struct *vma;
+	struct hmm_range *range;
+	unsigned long timeout;
+	struct mm_struct *mm;
 	unsigned long i;
 	int r = 0;
 
@@ -841,8 +842,6 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
 		goto out_free_ranges;
 	}
 
-	range->notifier_seq = mmu_range_read_begin(&bo->notifier);
-
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, start);
 	if (unlikely(!vma || start < vma->vm_start)) {
@@ -854,12 +853,20 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
 		r = -EPERM;
 		goto out_unlock;
 	}
+	up_read(&mm->mmap_sem);
+	timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
+
+retry:
+	range->notifier_seq = mmu_range_read_begin(&bo->notifier);
 
+	down_read(&mm->mmap_sem);
 	r = hmm_range_fault(range, 0);
 	up_read(&mm->mmap_sem);
-
-	if (unlikely(r < 0))
+	if (unlikely(r <= 0)) {
+		if ((r == 0 || r == -EBUSY) && !time_after(jiffies, timeout))
+			goto retry;
 		goto out_free_pfns;
+	}
 
 	for (i = 0; i < ttm->num_pages; i++) {
 		pages[i] = hmm_device_entry_to_page(range, range->pfns[i]);
@@ -916,7 +923,7 @@ bool amdgpu_ttm_tt_get_user_pages_done(struct ttm_tt *ttm)
 		gtt->range = NULL;
 	}
 
-	return r;
+	return !r;
 }
 #endif
 
@@ -997,10 +1004,18 @@ static void amdgpu_ttm_tt_unpin_userptr(struct ttm_tt *ttm)
 	sg_free_table(ttm->sg);
 
 #if IS_ENABLED(CONFIG_DRM_AMDGPU_USERPTR)
-	if (gtt->range &&
-	    ttm->pages[0] == hmm_device_entry_to_page(gtt->range,
-						      gtt->range->pfns[0]))
-		WARN_ONCE(1, "Missing get_user_page_done\n");
+	if (gtt->range) {
+		unsigned long i;
+
+		for (i = 0; i < ttm->num_pages; i++) {
+			if (ttm->pages[i] !=
+				hmm_device_entry_to_page(gtt->range,
+					      gtt->range->pfns[i]))
+				break;
+		}
+
+		WARN((i == ttm->num_pages), "Missing get_user_page_done\n");
+	}
 #endif
 }
 
-- 
2.17.1


  reply	other threads:[~2019-11-01 14:44 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-28 20:10 [PATCH v2 00/15] Consolidate the mmu notifier interval_tree and locking Jason Gunthorpe
2019-10-28 20:10 ` [PATCH v2 01/15] mm/mmu_notifier: define the header pre-processor parts even if disabled Jason Gunthorpe
2019-11-05 21:23   ` John Hubbard
2019-11-06 13:36     ` Jason Gunthorpe
2019-10-28 20:10 ` [PATCH v2 02/15] mm/mmu_notifier: add an interval tree notifier Jason Gunthorpe
2019-10-29 22:04   ` Kuehling, Felix
2019-10-29 22:56     ` Jason Gunthorpe
2019-11-07  0:23   ` John Hubbard
2019-11-07  2:08     ` Jerome Glisse
2019-11-07 20:11       ` Jason Gunthorpe
2019-11-07 21:04         ` Jerome Glisse
2019-11-08  0:32           ` Jason Gunthorpe
2019-11-08  2:00             ` Jerome Glisse
2019-11-08 20:19               ` Jason Gunthorpe
2019-11-07 20:06     ` Jason Gunthorpe
2019-11-07 20:53       ` John Hubbard
2019-11-08 15:26         ` Jason Gunthorpe
2019-11-08  6:33       ` Christoph Hellwig
2019-11-08 13:43         ` Jerome Glisse
2019-10-28 20:10 ` [PATCH v2 03/15] mm/hmm: allow hmm_range to be used with a mmu_range_notifier or hmm_mirror Jason Gunthorpe
2019-10-28 20:10 ` [PATCH v2 04/15] mm/hmm: define the pre-processor related parts of hmm.h even if disabled Jason Gunthorpe
2019-10-28 20:10 ` [PATCH v2 05/15] RDMA/odp: Use mmu_range_notifier_insert() Jason Gunthorpe
2019-10-28 20:10 ` [PATCH v2 06/15] RDMA/hfi1: Use mmu_range_notifier_inset for user_exp_rcv Jason Gunthorpe
2019-10-29 12:19   ` Dennis Dalessandro
2019-10-29 12:51     ` Jason Gunthorpe
2019-10-28 20:10 ` [PATCH v2 07/15] drm/radeon: use mmu_range_notifier_insert Jason Gunthorpe
2019-10-29  7:48   ` Koenig, Christian
2019-10-28 20:10 ` [PATCH v2 08/15] xen/gntdev: Use select for DMA_SHARED_BUFFER Jason Gunthorpe
2019-11-01 18:26   ` Jason Gunthorpe
2019-11-05 14:44     ` Jürgen Groß
2019-11-07  9:39   ` Jürgen Groß
2019-10-28 20:10 ` [PATCH v2 09/15] xen/gntdev: use mmu_range_notifier_insert Jason Gunthorpe
2019-10-30 16:55   ` Boris Ostrovsky
2019-11-01 17:48     ` Jason Gunthorpe
2019-11-01 18:51       ` Boris Ostrovsky
2019-11-01 19:17         ` Jason Gunthorpe
2019-11-04 22:03   ` Boris Ostrovsky
2019-11-05  2:31     ` Jason Gunthorpe
2019-11-05 15:16       ` Boris Ostrovsky
2019-11-07 20:36         ` Jason Gunthorpe
2019-11-07 22:54           ` Boris Ostrovsky
2019-11-08 14:53             ` Jason Gunthorpe
2019-10-28 20:10 ` [PATCH v2 10/15] nouveau: use mmu_notifier directly for invalidate_range_start Jason Gunthorpe
2019-10-28 20:10 ` [PATCH v2 11/15] nouveau: use mmu_range_notifier instead of hmm_mirror Jason Gunthorpe
2019-10-28 20:10 ` [PATCH v2 13/15] drm/amdgpu: Use mmu_range_insert " Jason Gunthorpe
2019-10-29  7:51   ` Koenig, Christian
2019-10-29 13:59     ` Jason Gunthorpe
2019-10-29 22:14   ` Kuehling, Felix
2019-10-29 23:09     ` Jason Gunthorpe
2019-10-28 20:10 ` [PATCH v2 14/15] drm/amdgpu: Use mmu_range_notifier " Jason Gunthorpe
2019-10-29 19:22   ` Yang, Philip
2019-10-29 19:25     ` Jason Gunthorpe
2019-11-01 14:44       ` Yang, Philip [this message]
2019-11-01 15:12         ` Jason Gunthorpe
2019-11-01 15:59           ` Yang, Philip
2019-11-01 17:42             ` Jason Gunthorpe
2019-11-01 19:19               ` Jason Gunthorpe
2019-11-01 19:45               ` Yang, Philip
2019-11-01 19:50                 ` Yang, Philip
2019-11-01 19:51                 ` Jason Gunthorpe
2019-11-01 18:21         ` Jason Gunthorpe
2019-11-01 18:34         ` [PATCH v2a " Jason Gunthorpe
2019-10-28 20:10 ` [PATCH v2 15/15] mm/hmm: remove hmm_mirror and related Jason Gunthorpe
     [not found] ` <20191028201032.6352-13-jgg@ziepe.ca>
2019-10-29  7:49   ` [PATCH v2 12/15] drm/amdgpu: Call find_vma under mmap_sem Koenig, Christian
2019-10-29 16:28   ` Kuehling, Felix
2019-10-29 13:07     ` Christian König
2019-10-29 17:19     ` Jason Gunthorpe
2019-11-01 19:54 ` [PATCH v2 00/15] Consolidate the mmu notifier interval_tree and locking Jason Gunthorpe
2019-11-01 20:54 ` Ralph Campbell
2019-11-04 20:40   ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=30b2f569-bf7a-5166-c98d-4a4a13d1351f@amd.com \
    --to=philip.yang@amd.com \
    --cc=Alexander.Deucher@amd.com \
    --cc=Christian.Koenig@amd.com \
    --cc=David1.Zhou@amd.com \
    --cc=Felix.Kuehling@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bskeggs@redhat.com \
    --cc=dennis.dalessandro@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hch@infradead.org \
    --cc=jgg@mellanox.com \
    --cc=jglisse@redhat.com \
    --cc=jgross@suse.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-mm@kvack.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=mike.marciniszyn@intel.com \
    --cc=nouveau@lists.freedesktop.org \
    --cc=oleksandr_andrushchenko@epam.com \
    --cc=petrcvekcz@gmail.com \
    --cc=rcampbell@nvidia.com \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).