From: Alistair Popple <apopple@nvidia.com>
To: Felix Kuehling <felix.kuehling@amd.com>
Cc: David Hildenbrand <david@redhat.com>,
	Alex Sierra <alex.sierra@amd.com>,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	rcampbell@nvidia.com, linux-ext4@vger.kernel.org,
	linux-xfs@vger.kernel.org, amd-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, hch@lst.de, jgg@nvidia.com,
	jglisse@redhat.com, willy@infradead.org
Subject: Re: [PATCH v6 01/10] mm: add zone device coherent type memory support
Date: Mon, 14 Feb 2022 13:04:35 +1100
Message-ID: <875ypigp39.fsf@nvdebian.thelocal>
In-Reply-To: <3acdebb4-310e-1edb-d7b0-a79db348f6f2@amd.com>

Felix Kuehling <felix.kuehling@amd.com> writes:

> On 2022-02-11 at 11:15, David Hildenbrand wrote:
>> On 01.02.22 16:48, Alex Sierra wrote:
>>> Device memory that is cache coherent from device and CPU point of view.
>>> This is used on platforms that have an advanced system bus (like CAPI
>>> or CXL). Any page of a process can be migrated to such memory. However,
>>> no one should be allowed to pin such memory so that it can always be
>>> evicted.
>>>
>>> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
>>> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>> Reviewed-by: Alistair Popple <apopple@nvidia.com>
>> So, I’m currently messing with PageAnon() pages and CoW semantics …
>> all these PageAnon() ZONE_DEVICE variants don’t necessarily make my life
>> easier but I’m not sure yet if they make my life harder. I hope you can
>> help me understand some of that stuff.
>>
>> 1) What are expected CoW semantics for DEVICE_COHERENT?
>>
>> I assume we’ll share them read-only during fork(), just like other PageAnon()
>> pages, and the first sharer writing to them receives an “ordinary”
>> !ZONE_DEVICE copy.
>
> Yes.
>
>
>>
>> So this would be just like DEVICE_EXCLUSIVE CoW handling I assume, just
>> that we don’t have to go through the loop of restoring a device
>> exclusive entry?
>
> I’m not sure how DEVICE_EXCLUSIVE pages are handled under CoW. As I understand
> it, they’re not really in a special memory zone like DEVICE_COHERENT. Just a
> special way of mapping an ordinary page in order to allow device-exclusive
> access for some time. I suspect there may even be a possibility that a page can
> be both DEVICE_EXCLUSIVE and DEVICE_COHERENT.

Right - there aren’t really device exclusive pages; they are just special
non-present ptes, conceptually pretty similar to migration entries. The
difference is that on CPU fault (or fork) the original entry is restored
immediately after notifying the device that it no longer has exclusive access.

As device exclusive entries can be turned into normal entries whenever required
we handle CoW by restoring the original ptes if a device exclusive entry is
encountered. This reduces the chances of introducing any subtle CoW bugs as it
just gets handled the same as any normal page table entry (because the exclusive
entries will have been removed).
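
As a rough illustration of that ordering, here is a userspace toy model (not
kernel code). Every toy_* type and function is hypothetical and exists only for
this sketch; it shows the device being notified, the exclusive entry being
turned back into a normal present entry, and only then the usual fork-time
write-protect and share step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are kernel APIs. */
enum toy_pte_kind { TOY_PTE_NONE, TOY_PTE_PRESENT, TOY_PTE_DEVICE_EXCLUSIVE };

struct toy_page { int mapcount; };

struct toy_pte {
	enum toy_pte_kind kind;
	struct toy_page *page;	/* normal page the entry refers to */
	bool writable;
};

struct toy_device { int revoke_notifications; };

/* Notify the device, then restore the original (present) entry. */
static void toy_restore_exclusive(struct toy_pte *pte, struct toy_device *dev)
{
	if (pte->kind != TOY_PTE_DEVICE_EXCLUSIVE)
		return;
	dev->revoke_notifications++;	/* device loses exclusive access */
	pte->kind = TOY_PTE_PRESENT;
}

/*
 * Fork-time copy of one entry. Exclusive entries are restored first, so the
 * rest of the function only ever deals with ordinary entries.
 */
static void toy_fork_copy_pte(struct toy_pte *parent, struct toy_pte *child,
			      struct toy_device *dev)
{
	assert(parent != NULL && child != NULL && dev != NULL);

	toy_restore_exclusive(parent, dev);

	if (parent->kind == TOY_PTE_PRESENT) {
		parent->writable = false;	/* write-protect for CoW */
		parent->page->mapcount++;	/* child maps the same page */
	}
	*child = *parent;
}
```

The point of the sketch is that the CoW path has no special case for exclusive
entries: by the time the write-protect step runs, none are left.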

> That said, your statement sounds correct. There is no requirement to do anything
> with the new “ordinary” page after copying. What actually happens to
> DEVICE_COHERENT pages on CoW is a bit convoluted:
>
> When the page is marked as CoW, it is marked R/O in the CPU page table. This
> causes an MMU notifier that invalidates the device PTE. The next device access
> in the parent process causes a page fault. If that’s a write fault (usually is
> in our current driver), it will trigger CoW, which means the parent process now
> gets a new system memory copy of the page, while the child process keeps the
> DEVICE_COHERENT page. The driver could decide to migrate the page back to a new
> DEVICE_COHERENT allocation.
>
> In practice that means, “fork” basically causes all DEVICE_COHERENT memory in
> the parent process to be migrated to ordinary system memory, which is quite
> disruptive. What we have today results in correct behaviour, but the performance
> is far from ideal.
>
> We could probably mitigate it by making the driver better at mapping pages R/O
> in the device on read faults, at the potential cost of having to handle a second
> (write) fault later.
>
>
>>
>> 2) How are these pages freed to clear/invalidate PageAnon() ?
>>
>> I assume for PageAnon() ZONE_DEVICE pages we’ll always go via
>> free_devmap_managed_page(), correct?
>
> Yes. The driver depends on the page->pgmap->ops->page_free callback to free
> the device memory allocation backing the page.
>
>
>>
>>
>> 3) FOLL_PIN
>>
>> While you write “no one should be allowed to pin such memory”, patch #2
>> only blocks FOLL_LONGTERM. So I assume we allow ordinary FOLL_PIN and
>> you might want to be a bit more precise?
>
> I agree. I think the paragraph was written before we fully fleshed out the
> interaction with GUP, and then forgotten.
>
>
>>
>>
>> … I’m pretty sure we cannot FOLL_PIN DEVICE_PRIVATE pages,
>
> Right. Trying to GUP a DEVICE_PRIVATE page causes a page fault that migrates the
> page back to normal system memory (using the page->pgmap->ops->migrate_to_ram
> callback). Then you pin the system memory page.
>
>
>>   but can we
>> FOLL_PIN DEVICE_EXCLUSIVE pages? I strongly assume so?

In the case of device exclusive entries GUP/PUP will fault and restore the
original entry. It will then pin the original normal page pointed to by the
device exclusive entry.
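
A minimal userspace sketch of that sequence, again as a toy model rather than
kernel code (all toy_* names are hypothetical and do not correspond to real GUP
internals): the pin attempt hits the exclusive entry, the fault path notifies
the device and restores the entry, and the pin then lands on the normal page.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names are kernel APIs. */
enum toy_entry_kind { TOY_ENTRY_PRESENT, TOY_ENTRY_DEVICE_EXCLUSIVE };

struct toy_page { int pincount; };

struct toy_entry {
	enum toy_entry_kind kind;
	struct toy_page *page;	/* normal page the entry points to */
	int device_notified;	/* "exclusive access revoked" notifications */
};

/* Fault path: notify the device, then restore the original entry. */
static void toy_fault_restore(struct toy_entry *e)
{
	e->device_notified++;
	e->kind = TOY_ENTRY_PRESENT;
}

/* Pin path: fault on an exclusive entry first, then pin the normal page. */
static struct toy_page *toy_pin_user_page(struct toy_entry *e)
{
	assert(e != NULL && e->page != NULL);

	if (e->kind == TOY_ENTRY_DEVICE_EXCLUSIVE)
		toy_fault_restore(e);

	e->page->pincount++;
	return e->page;
}
```

In this model a second pin on the same entry takes the ordinary path, since the
exclusive entry is already gone after the first fault.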

 - Alistair

>
> I assume you mean DEVICE_COHERENT, not DEVICE_EXCLUSIVE? In that case the answer
> is “Yes”.
>
> Regards,
>   Felix
>
>
>>
>>
>> Thanks for any information.
>>
