From: Andrey Grodzovsky <>
To: Keith Busch <>
Cc: Bjorn Helgaas <>,
	"Deucher, Alexander" <>,
	"" <>,
	"" <>,
	"Koenig, Christian" <>,
	"Antonovitch, Anatoli" <>,
	Daniel Vetter <>
Subject: Re: Avoid MMIO write access after hot unplug [WAS - Re: Question about supporting AMD eGPU hot plug case]
Date: Fri, 5 Feb 2021 16:40:06 -0500
Message-ID: <>
In-Reply-To: <>

On 2/5/21 4:35 PM, Keith Busch wrote:
> On Fri, Feb 05, 2021 at 03:42:01PM -0500, Andrey Grodzovsky wrote:
>> On 2/5/21 2:45 PM, Bjorn Helgaas wrote:
>>> On Fri, Feb 05, 2021 at 11:08:45AM -0500, Andrey Grodzovsky wrote:
>>>> For user mappings, including MMIO mappings, we have a reliable
>>>> approach: we invalidate the device address space mappings for all
>>>> users at the first sign of device disconnect, and on every subsequent
>>>> page fault from users accessing those ranges we insert a dummy
>>>> zero page into their respective page tables. The problem is the
>>>> kernel driver, where page faulting cannot be used as it is for user
>>>> space: I don't see how to keep the driver from accessing ranges
>>>> that the PCI subsystem has already released and that can hence be
>>>> allocated to another hot-plugged device.
>>> That doesn't sound reliable to me, but maybe I don't understand what
>>> you mean by the "first sign of device disconnect."
>> See functions drm_dev_enter, drm_dev_exit and drm_dev_unplug in drm_drv.c
>>> At least from a PCI
>>> perspective, the first sign of a surprise hot unplug is likely to be
>>> an MMIO read that returns ~0.
>> We call drm_dev_unplug in amdgpu_pci_remove and base all later
>> drm_dev_enter/drm_dev_exit checks on this.
> It sounds like you are talking about an orderly notified unplug rather
> than a surprise hot unplug. If it's a surprise, the code doesn't get to
> fence off future MMIO access until well after the address range is
> already unreachable.

I am referring to a surprise unplug for which we get a notification from the PCI
subsystem, which ends up calling our pci_driver.remove callback. I understand
there is a window of time in which we have not yet been notified but all our MMIO
accesses already fail because the device is physically gone at that point.
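
To make the guard discussed above concrete, here is a minimal userspace
model of it, not kernel code. The fake_* names and the register array are
hypothetical stand-ins for drm_dev_enter, drm_dev_exit, drm_dev_unplug and
real MMIO; the actual kernel helpers use SRCU rather than a bare atomic
flag. The sketch only covers accesses made after the remove callback has
run, so it does not close the window before notification.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FAKE_NUM_REGS 4

/* Hypothetical stand-in for a device; not the real struct drm_device. */
struct fake_dev {
	atomic_bool unplugged;         /* set once by the remove path */
	uint32_t regs[FAKE_NUM_REGS];  /* pretend MMIO register file */
};

/* Models drm_dev_enter(): returns false once the device is marked unplugged. */
static bool fake_dev_enter(struct fake_dev *dev)
{
	return !atomic_load(&dev->unplugged);
}

/* Models drm_dev_exit(); a no-op in this simplified model. */
static void fake_dev_exit(struct fake_dev *dev)
{
	(void)dev;
}

/* Models drm_dev_unplug() as called from the pci_driver.remove callback. */
static void fake_dev_unplug(struct fake_dev *dev)
{
	atomic_store(&dev->unplugged, true);
}

/* Guarded register write: skipped once the device is marked unplugged. */
static bool fake_wreg(struct fake_dev *dev, unsigned int idx, uint32_t val)
{
	assert(idx < FAKE_NUM_REGS);

	if (!fake_dev_enter(dev))
		return false;   /* device gone: do not touch the range */

	dev->regs[idx] = val;
	fake_dev_exit(dev);
	return true;
}
```

A write issued before fake_dev_unplug() goes through and returns true; any
write issued afterwards is dropped and returns false, leaving the register
contents untouched.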



