From: Ralph Campbell <rcampbell@nvidia.com>
To: John Hubbard <jhubbard@nvidia.com>,
	<nouveau@lists.freedesktop.org>, <linux-kernel@vger.kernel.org>
Cc: Jerome Glisse <jglisse@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	"Jason Gunthorpe" <jgg@mellanox.com>,
	Ben Skeggs <bskeggs@redhat.com>
Subject: Re: [RESEND PATCH 2/3] nouveau: fix mixed normal and device private page migration
Date: Mon, 22 Jun 2020 18:42:00 -0700
Message-ID: <730e85c9-33b5-9c57-7123-057b75cbbddf@nvidia.com>
In-Reply-To: <f2bf81df-8faa-0f51-3f74-cb3b31d96aad@nvidia.com>


On 6/22/20 5:30 PM, John Hubbard wrote:
> On 2020-06-22 16:38, Ralph Campbell wrote:
>> The OpenCL function clEnqueueSVMMigrateMem(), without any flags, will
>> migrate memory in the given address range to device private memory. The
>> source pages might already have been migrated to device private memory.
>> In that case, the source struct page is not checked to see if it is
>> a device private page, so the GPU's physical address of local memory
>> is computed incorrectly, leading to data corruption.
>> Fix this by checking the source struct page and computing the correct
>> physical address.
>>
>> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
>> ---
>>   drivers/gpu/drm/nouveau/nouveau_dmem.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> index cc9993837508..f6a806ba3caa 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> @@ -540,6 +540,12 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
>>       if (!(src & MIGRATE_PFN_MIGRATE))
>>           goto out;
>> +    if (spage && is_device_private_page(spage)) {
>> +        paddr = nouveau_dmem_page_addr(spage);
>> +        *dma_addr = DMA_MAPPING_ERROR;
>> +        goto done;
>> +    }
>> +
>>       dpage = nouveau_dmem_page_alloc_locked(drm);
>>       if (!dpage)
>>           goto out;
>> @@ -560,6 +566,7 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
>>               goto out_free_page;
>>       }
>> +done:
>>       *pfn = NVIF_VMM_PFNMAP_V0_V | NVIF_VMM_PFNMAP_V0_VRAM |
>>           ((paddr >> PAGE_SHIFT) << NVIF_VMM_PFNMAP_V0_ADDR_SHIFT);
>>       if (src & MIGRATE_PFN_WRITE)
>> @@ -615,6 +622,7 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
>>       struct migrate_vma args = {
>>           .vma        = vma,
>>           .start        = start,
>> +        .src_owner    = drm->dev,
> 
> Hi Ralph,
> 
> This .src_owner setting does look like a required fix, but it seems like
> a completely separate fix from what is listed in this patch's commit
> description, right? (It feels like a casualty of rearranging the patches.)
> 
> 
> thanks,

It's a bit more complex. There is a catch-22 between this patch and the
change to mm/migrate.c (see the sketches below):

- Without this patch and without the mm/migrate.c change, a second call
  to clEnqueueSVMMigrateMem() for the same address range invalidates the
  GPU mapping to device private memory created by the first call.
- With this patch but without the mm/migrate.c change, the first call to
  clEnqueueSVMMigrateMem() fails to migrate normal anonymous memory to
  device private memory.
- Without this patch but with the mm/migrate.c change, a second call to
  clEnqueueSVMMigrateMem() crashes the kernel because dma_map_page() is
  called with the device private PFN, which is not a valid CPU physical
  address.
- With both changes, a range of mixed normal anonymous and device private
  pages can be migrated to the GPU and the GPU page tables are updated
  properly.
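
For reference, the problem case is easy to trigger from user space. A
minimal, hypothetical sketch (assumes an OpenCL 2.x platform with SVM
support and an already set up queue and SVM allocation; error checking
omitted):

#include <CL/cl.h>

static void migrate_twice(cl_command_queue queue, void *svm_ptr, size_t size)
{
	const void *ptrs[1] = { svm_ptr };

	/* First call: normal anonymous pages migrate to device private memory. */
	clEnqueueSVMMigrateMem(queue, 1, ptrs, &size, 0, 0, NULL, NULL);

	/*
	 * Second call: the source pages are now device private pages, which
	 * is the case the is_device_private_page() check above handles.
	 */
	clEnqueueSVMMigrateMem(queue, 1, ptrs, &size, 0, 0, NULL, NULL);

	clFinish(queue);
}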
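
On the kernel side, the catch-22 comes from how migrate_vma_collect_pmd()
filters source pages when src_owner is set. A simplified sketch of that
logic as I read current mainline (not the verbatim source):

	if (!pte_present(pte)) {
		swp_entry_t entry = pte_to_swp_entry(pte);

		/* Only device private swap entries can be collected here. */
		if (!is_device_private_entry(entry))
			goto next;

		/* Device private pages are collected only for their owner. */
		page = device_private_entry_to_page(entry);
		if (page->pgmap->owner != migrate->src_owner)
			goto next;
	} else {
		/*
		 * Normal pages are skipped whenever src_owner is set, so
		 * setting .src_owner = drm->dev without the mm/migrate.c
		 * change stops anonymous memory from migrating at all.
		 */
		if (migrate->src_owner)
			goto next;
	}

The mm/migrate.c change relaxes that normal-page skip so a mixed range
collects both kinds of pages, which is what the last case above relies on.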

Thread overview: 14 messages

2020-06-22 23:38 [RESEND PATCH 0/3] nouveau: fixes for SVM Ralph Campbell
2020-06-22 23:38 ` [RESEND PATCH 1/3] nouveau: fix migrate page regression Ralph Campbell
2020-06-23  0:51   ` John Hubbard
2020-06-25  5:23     ` [Nouveau] " Ben Skeggs
2020-06-22 23:38 ` [RESEND PATCH 2/3] nouveau: fix mixed normal and device private page migration Ralph Campbell
2020-06-23  0:30   ` John Hubbard
2020-06-23  1:42     ` Ralph Campbell [this message]
2020-06-24  7:23   ` Christoph Hellwig
2020-06-24 18:10     ` Ralph Campbell
2020-06-25 17:25       ` Ralph Campbell
2020-06-25 17:31         ` Jason Gunthorpe
2020-06-25 17:42           ` Ralph Campbell
2020-06-22 23:38 ` [RESEND PATCH 3/3] nouveau: make nvkm_vmm_ctor() and nvkm_mmu_ptp_get() static Ralph Campbell
2020-06-23  0:57   ` John Hubbard
