* hmm_range_fault interaction between different drivers
From: Felix Kuehling @ 2022-07-21 23:00 UTC
To: linux-mm, Jason Gunthorpe, Alistair Popple, Yang, Philip
Hi all,
We're noticing some unexpected behaviour when the amdgpu and Mellanox
drivers interact on shared memory via hmm_range_fault. If the amdgpu
driver has migrated pages to DEVICE_PRIVATE memory, we would expect
hmm_range_fault called by the Mellanox driver to fault them back to
system memory. But that's not happening; instead, hmm_range_fault fails.
As an experiment, Philip hacked hmm_vma_handle_pte to treat
DEVICE_PRIVATE pages like device_exclusive pages, which gave us the
expected behaviour: it results in a dev_pagemap_ops.migrate_to_ram
callback into our driver, and hmm_range_fault then returns system
memory pages to the Mellanox driver.
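
For context, the path we seem to be hitting is the !pte_present() case
in hmm_vma_handle_pte() in mm/hmm.c. Paraphrasing from the current code
(a simplified excerpt, so details may be slightly off), it looks
roughly like this:

	if (!pte_present(pte)) {
		swp_entry_t entry = pte_to_swp_entry(pte);

		/*
		 * DEVICE_PRIVATE entries owned by the caller (i.e. by
		 * range->dev_private_owner) are reported as valid PFNs
		 * without faulting.
		 */
		if (hmm_is_device_private_entry(range, entry)) {
			cpu_flags = HMM_PFN_VALID;
			if (is_writable_device_private_entry(entry))
				cpu_flags |= HMM_PFN_WRITE;
			*hmm_pfn = swp_offset(entry) | cpu_flags;
			return 0;
		}

		required_fault =
			hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0);
		if (!required_fault) {
			*hmm_pfn = 0;
			return 0;
		}

		if (!non_swap_entry(entry))
			goto fault;

		if (is_device_exclusive_entry(entry))
			goto fault;

		if (is_migration_entry(entry)) {
			pte_unmap(ptep);
			hmm_vma_walk->last = addr;
			migration_entry_wait(walk->mm, pmdp, addr);
			return -EBUSY;
		}

		/*
		 * A DEVICE_PRIVATE entry owned by a different driver
		 * (amdgpu here, with the Mellanox driver being the one
		 * calling hmm_range_fault) falls through to here, and
		 * the walk fails instead of faulting the page back to
		 * system memory.
		 */
		pte_unmap(ptep);
		return -EFAULT;
	}

Philip's hack effectively made that last case take the same goto fault
path as device_exclusive entries, which is what leads to the
migrate_to_ram callback and the pages coming back to system memory.
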
So something is clearly wrong. It could be:
* our expectations are wrong,
* the implementation of hmm_range_fault is wrong, or
* our driver is missing something when migrating to DEVICE_PRIVATE memory.
Do you have any insights?
Thank you,
Felix
* Re: hmm_range_fault interaction between different drivers
From: Jason Gunthorpe @ 2022-07-22 15:34 UTC
To: Felix Kuehling; +Cc: linux-mm, Alistair Popple, Yang, Philip
On Thu, Jul 21, 2022 at 07:00:23PM -0400, Felix Kuehling wrote:
> Hi all,
>
> We're noticing some unexpected behaviour when the amdgpu and Mellanox
> drivers interact on shared memory via hmm_range_fault. If the amdgpu
> driver has migrated pages to DEVICE_PRIVATE memory, we would expect
> hmm_range_fault called by the Mellanox driver to fault them back to
> system memory. But that's not happening; instead, hmm_range_fault fails.
>
> As an experiment, Philip hacked hmm_vma_handle_pte to treat
> DEVICE_PRIVATE pages like device_exclusive pages, which gave us the
> expected behaviour: it results in a dev_pagemap_ops.migrate_to_ram
> callback into our driver, and hmm_range_fault then returns system
> memory pages to the Mellanox driver.
>
> So something is clearly wrong. It could be:
>
> * our expectations are wrong,
> * the implementation of hmm_range_fault is wrong, or
> * our driver is missing something when migrating to DEVICE_PRIVATE memory.
>
> Do you have any insights?
I think it is a bug.
Jason
* Re: hmm_range_fault interaction between different drivers
From: Ralph Campbell @ 2022-07-22 16:47 UTC
To: Jason Gunthorpe, Felix Kuehling; +Cc: linux-mm, Alistair Popple, Yang, Philip
On 7/22/22 08:34, Jason Gunthorpe wrote:
> On Thu, Jul 21, 2022 at 07:00:23PM -0400, Felix Kuehling wrote:
>> Hi all,
>>
>> We're noticing some unexpected behaviour when the amdgpu and Mellanox
>> drivers interact on shared memory via hmm_range_fault. If the amdgpu
>> driver has migrated pages to DEVICE_PRIVATE memory, we would expect
>> hmm_range_fault called by the Mellanox driver to fault them back to
>> system memory. But that's not happening; instead, hmm_range_fault fails.
>>
>> As an experiment, Philip hacked hmm_vma_handle_pte to treat
>> DEVICE_PRIVATE pages like device_exclusive pages, which gave us the
>> expected behaviour: it results in a dev_pagemap_ops.migrate_to_ram
>> callback into our driver, and hmm_range_fault then returns system
>> memory pages to the Mellanox driver.
>>
>> So something is clearly wrong. It could be:
>>
>> * our expectations are wrong,
>> * the implementation of hmm_range_fault is wrong, or
>> * our driver is missing something when migrating to DEVICE_PRIVATE memory.
>>
>> Do you have any insights?
> I think it is a bug.
>
> Jason
Yes, looks like a bug to me too. hmm_vma_handle_pte() calls
hmm_is_device_private_entry(), which correctly handles the case where
the device private entry is owned by the driver calling
hmm_range_fault(), but it then does nothing to fault in the page when
the entry is a device private entry owned by some other driver.
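
Roughly, I'd expect the fix to make that case take the fault path so
that the owning driver's migrate_to_ram() callback runs. Something
along these lines in the !pte_present() branch (untested sketch only;
the actual patch may well look different):

		/*
		 * We only get here if the entry is not a device private
		 * entry owned by range->dev_private_owner and
		 * hmm_pte_need_fault() said a fault is required.
		 */
		if (!non_swap_entry(entry))
			goto fault;

		/*
		 * Fault in device private entries owned by another
		 * driver so that its dev_pagemap_ops.migrate_to_ram()
		 * callback migrates the page back to system memory.
		 */
		if (is_device_private_entry(entry))
			goto fault;

		if (is_device_exclusive_entry(entry))
			goto fault;

The fault path ends up in handle_mm_fault(), which invokes the owner's
migrate_to_ram(), so the retry of hmm_range_fault() should then see
ordinary system memory pages.
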
I'll work with Alistair and one of us will post a fix.
Thanks for finding this!