From: Ralph Campbell <rcampbell@nvidia.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Christoph Hellwig <hch@lst.de>, <nouveau@lists.freedesktop.org>,
<linux-kernel@vger.kernel.org>,
Jerome Glisse <jglisse@redhat.com>,
"John Hubbard" <jhubbard@nvidia.com>,
Ben Skeggs <bskeggs@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>, <linux-mm@kvack.org>,
Bharata B Rao <bharata@linux.ibm.com>
Subject: Re: [RESEND PATCH 2/3] nouveau: fix mixed normal and device private page migration
Date: Thu, 25 Jun 2020 10:42:24 -0700 [thread overview]
Message-ID: <412b3a51-619a-4f94-da96-f4596e5eb510@nvidia.com> (raw)
In-Reply-To: <20200625173144.GT6578@ziepe.ca>
On 6/25/20 10:31 AM, Jason Gunthorpe wrote:
> On Thu, Jun 25, 2020 at 10:25:38AM -0700, Ralph Campbell wrote:
>> Making sure to include linux-mm and Bharata B Rao for IBM's
>> use of migrate_vma*().
>>
>> On 6/24/20 11:10 AM, Ralph Campbell wrote:
>>>
>>> On 6/24/20 12:23 AM, Christoph Hellwig wrote:
>>>> On Mon, Jun 22, 2020 at 04:38:53PM -0700, Ralph Campbell wrote:
>>>>> The OpenCL function clEnqueueSVMMigrateMem(), without any flags, will
>>>>> migrate memory in the given address range to device private memory. The
>>>>> source pages might already have been migrated to device private memory.
>>>>> In that case, the source struct page is not checked to see if it is
>>>>> a device private page and incorrectly computes the GPU's physical
>>>>> address of local memory leading to data corruption.
>>>>> Fix this by checking the source struct page and computing the correct
>>>>> physical address.
>>>>
>>>> I'm really worried about all this delicate code to fix the mixed
>>>> ranges. Can't we make it clear at the migrate_vma_* level if we want
>>>> to migrate from or two device private memory, and then skip all the work
>>>> for regions of memory that already are in the right place? This might be
>>>> a little more work initially, but I think it leads to a much better
>>>> API.
>>>>
>>>
>>> The current code does encode the direction with src_owner != NULL meaning
>>> device private to system memory and src_owner == NULL meaning system
>>> memory to device private memory. This patch would obviously defeat that
>>> so perhaps a flag could be added to the struct migrate_vma to indicate the
>>> direction but I'm unclear how that makes things less delicate.
>>> Can you expand on what you are worried about?
>>>
>>> The issue with invalidations might be better addressed by letting the device
>>> driver handle device private page TLB invalidations when migrating to
>>> system memory and changing migrate_vma_setup() to only invalidate CPU
>>> TLB entries for normal pages being migrated to device private memory.
>>> If a page isn't migrating, it seems inefficient to invalidate those TLB
>>> entries.
>>>
>>> Any other suggestions?
>>
>> After a night's sleep, I think this might work. What do others think?
>>
>> 1) Add a new MMU_NOTIFY_MIGRATE enum to mmu_notifier_event.
>>
>> 2) Change migrate_vma_collect() to use the new MMU_NOTIFY_MIGRATE event type.
>>
>> 3) Modify nouveau_svmm_invalidate_range_start() to simply return (no invalidations)
>> for MMU_NOTIFY_MIGRATE mmu notifier callbacks.
>
> Isn't it a bit of an assumption that migrate_vma_collect() is only
> used by nouveau itself?
>
> What if some other devices' device_private pages are being migrated?
>
> Jason
>
Good point. The driver needs a way of knowing the callback is due its call
to migrate_vma_setup() and not some other migration invalidation.
How about adding a void pointer to struct mmu_notifier_range
which migrate_vma_collect() can set to src_owner. If the event is
MMU_NOTIFY_MIGRATE and the src_owner matches the void pointer, then the
callback should be the one the driver initiated.
prev parent reply other threads:[~2020-06-25 17:42 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20200622233854.10889-1-rcampbell@nvidia.com>
[not found] ` <20200622233854.10889-3-rcampbell@nvidia.com>
[not found] ` <20200624072355.GB18609@lst.de>
[not found] ` <330f6a82-d01d-db97-1dec-69346f41e707@nvidia.com>
2020-06-25 17:25 ` [RESEND PATCH 2/3] nouveau: fix mixed normal and device private page migration Ralph Campbell
2020-06-25 17:31 ` Jason Gunthorpe
2020-06-25 17:42 ` Ralph Campbell [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=412b3a51-619a-4f94-da96-f4596e5eb510@nvidia.com \
--to=rcampbell@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=bharata@linux.ibm.com \
--cc=bskeggs@redhat.com \
--cc=hch@lst.de \
--cc=jgg@ziepe.ca \
--cc=jglisse@redhat.com \
--cc=jhubbard@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nouveau@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).