* hmm_range_fault interaction between different drivers
@ 2022-07-21 23:00 Felix Kuehling
From: Felix Kuehling @ 2022-07-21 23:00 UTC (permalink / raw)
  To: linux-mm, Jason Gunthorpe, Alistair Popple, Yang, Philip

Hi all,

We're noticing some unexpected behaviour when the amdgpu and Mellanox 
drivers are interacting on shared memory with hmm_range_fault. If the 
amdgpu driver migrated pages to DEVICE_PRIVATE memory, we would expect 
hmm_range_fault called by the Mellanox driver to fault them back to 
system memory. But that's not happening. Instead hmm_range_fault fails.

For an experiment, Philip hacked hmm_vma_handle_pte to treat 
DEVICE_PRIVATE pages like device_exclusive pages, which gave us the 
expected behaviour. It would result in a dev_pagemap_ops.migrate_to_ram 
callback in our driver, and hmm_range_fault would return system memory 
pages to the Mellanox driver.
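
For completeness, our DEVICE_PRIVATE setup follows the standard
dev_pagemap pattern. A simplified sketch (names made up, error handling
trimmed); the relevant pieces are the migrate_to_ram() callback we expect
to be invoked, and pgmap->owner, which hmm_range_fault() compares against
range->dev_private_owner:

#include <linux/err.h>
#include <linux/memremap.h>
#include <linux/mm.h>

static vm_fault_t my_vram_migrate_to_ram(struct vm_fault *vmf)
{
        /*
         * Copy the data from device memory back to a freshly allocated
         * system memory page, using the migrate_vma_setup() /
         * migrate_vma_pages() / migrate_vma_finalize() helpers.
         */
        return 0;
}

static void my_vram_page_free(struct page *page)
{
        /* Return the backing device page to the driver's allocator. */
}

static const struct dev_pagemap_ops my_vram_pagemap_ops = {
        .page_free      = my_vram_page_free,
        .migrate_to_ram = my_vram_migrate_to_ram,
};

static int my_vram_pagemap_init(struct device *dev, struct dev_pagemap *pgmap,
                                u64 start, u64 size)
{
        void *addr;

        pgmap->type = MEMORY_DEVICE_PRIVATE;
        pgmap->range.start = start;
        pgmap->range.end = start + size - 1;
        pgmap->nr_range = 1;
        pgmap->ops = &my_vram_pagemap_ops;
        /* Compared against range->dev_private_owner by hmm_range_fault(). */
        pgmap->owner = dev;

        addr = devm_memremap_pages(dev, pgmap);
        return PTR_ERR_OR_ZERO(addr);
}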

So something is clearly wrong. It could be:

  * our expectations are wrong,
  * the implementation of hmm_range_fault is wrong, or
  * our driver is missing something when migrating to DEVICE_PRIVATE memory.

Do you have any insights?

Thank you,
   Felix





* Re: hmm_range_fault interaction between different drivers
@ 2022-07-22 15:34 ` Jason Gunthorpe
From: Jason Gunthorpe @ 2022-07-22 15:34 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: linux-mm, Alistair Popple, Yang, Philip

On Thu, Jul 21, 2022 at 07:00:23PM -0400, Felix Kuehling wrote:
> Hi all,
> 
> We're noticing some unexpected behaviour when the amdgpu and Mellanox
> drivers are interacting on shared memory with hmm_range_fault. If the amdgpu
> driver migrated pages to DEVICE_PRIVATE memory, we would expect
> hmm_range_fault called by the Mellanox driver to fault them back to system
> memory. But that's not happening. Instead hmm_range_fault fails.
> 
> For an experiment, Philip hacked hmm_vma_handle_pte to treat DEVICE_PRIVATE
> pages like device_exclusive pages, which gave us the expected behaviour. It
> would result in a dev_pagemap_ops.migrate_to_ram callback in our driver, and
> hmm_range_fault would return system memory pages to the Mellanox driver.
> 
> So something is clearly wrong. It could be:
> 
>  * our expectations are wrong,
>  * the implementation of hmm_range_fault is wrong, or
>  * our driver is missing something when migrating to DEVICE_PRIVATE memory.
> 
> Do you have any insights?

I think it is a bug

Jason



* Re: hmm_range_fault interaction between different drivers
@ 2022-07-22 16:47   ` Ralph Campbell
From: Ralph Campbell @ 2022-07-22 16:47 UTC (permalink / raw)
  To: Jason Gunthorpe, Felix Kuehling; +Cc: linux-mm, Alistair Popple, Yang, Philip


On 7/22/22 08:34, Jason Gunthorpe wrote:
> On Thu, Jul 21, 2022 at 07:00:23PM -0400, Felix Kuehling wrote:
>> Hi all,
>>
>> We're noticing some unexpected behaviour when the amdgpu and Mellanox
>> drivers are interacting on shared memory with hmm_range_fault. If the amdgpu
>> driver migrated pages to DEVICE_PRIVATE memory, we would expect
>> hmm_range_fault called by the Mellanox driver to fault them back to system
>> memory. But that's not happening. Instead hmm_range_fault fails.
>>
>> For an experiment, Philip hacked hmm_vma_handle_pte to treat DEVICE_PRIVATE
>> pages like device_exclusive pages, which gave us the expected behaviour. It
>> would result in a dev_pagemap_ops.migrate_to_ram callback in our driver, and
>> hmm_range_fault would return system memory pages to the Mellanox driver.
>>
>> So something is clearly wrong. It could be:
>>
>>   * our expectations are wrong,
>>   * the implementation of hmm_range_fault is wrong, or
>>   * our driver is missing something when migrating to DEVICE_PRIVATE memory.
>>
>> Do you have any insights?
> I think it is a bug
>
> Jason

Yes, this looks like a bug to me too. hmm_vma_handle_pte() calls
hmm_is_device_private_entry(), which correctly handles the case where
the device private entry is owned by the driver calling hmm_range_fault(),
but it then does nothing to fault the page back in when the entry is a
device private entry owned by some other driver.
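
Roughly, I'd expect the fix to let non-owner device private entries fall
through to the fault path in hmm_vma_handle_pte(), something along these
lines (untested sketch, just to show where the check is missing):

        if (!pte_present(pte)) {
                swp_entry_t entry = pte_to_swp_entry(pte);

                /*
                 * Device private entries owned by the caller: just report
                 * the device PFN, no migration needed.
                 */
                if (hmm_is_device_private_entry(range, entry)) {
                        cpu_flags = HMM_PFN_VALID;
                        if (is_writable_device_private_entry(entry))
                                cpu_flags |= HMM_PFN_WRITE;
                        *hmm_pfn = swp_offset(entry) | cpu_flags;
                        return 0;
                }

                required_fault =
                        hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0);
                if (!required_fault) {
                        *hmm_pfn = 0;
                        return 0;
                }

                if (!non_swap_entry(entry))
                        goto fault;

                /*
                 * New: device private entries owned by some other driver
                 * must be faulted so the owner's migrate_to_ram() callback
                 * replaces them with system memory pages we can map.
                 */
                if (is_device_private_entry(entry))
                        goto fault;

                if (is_device_exclusive_entry(entry))
                        goto fault;

                /* Migration entries and everything else handled as before. */
        }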

I'll work with Alistair and one of us will post a fix.
Thanks for finding this!


