linux-pci.vger.kernel.org archive mirror
* Re: Avoid MMIO write access after hot unplug [WAS - Re: Question about supporting AMD eGPU hot plug case]
       [not found] <31a7498d-dd68-f194-cbf5-1f73a53322ff@amd.com>
@ 2021-02-05 15:38 ` Bjorn Helgaas
  2021-02-05 16:08   ` Andrey Grodzovsky
  0 siblings, 1 reply; 11+ messages in thread
From: Bjorn Helgaas @ 2021-02-05 15:38 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: Deucher, Alexander, s.miroshnichenko, linux-pci, Koenig,
	Christian, Antonovitch, Anatoli

On Thu, Feb 04, 2021 at 11:03:10AM -0500, Andrey Grodzovsky wrote:
> + linux-pci mailing list and a bit of extra details below.
> 
> On 2/2/21 12:51 PM, Andrey Grodzovsky wrote:
> > Bjorn, Sergey, I would also like to use this opportunity to ask you a
> > question on a related topic - hot unplug.
> > I've been working for a while on this (see the latest patchset here
> > https://lists.freedesktop.org/archives/amd-gfx/2021-January/058595.html).
> > My question is specifically regarding this patch
> > https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html
> > - the idea here is to
> > prevent any accesses to MMIO address ranges still mapped in kernel page

I think you're talking about a PCI BAR that is mapped into a user
process.

It is impossible to reliably *prevent* MMIO accesses to a BAR on a
PCI device that has been unplugged.  There is always a window where
the CPU issues a load/store and the device is unplugged before the
load/store completes.

If a PCIe device is unplugged, an MMIO read to that BAR will complete
on PCIe with an uncorrectable fatal error.  When that happens there is
no valid data from the PCIe device, so the PCIe host bridge typically
fabricates ~0 data so the CPU load instruction can complete.

If you want to reliably recover from unplugs like this, I think you
have to check for that ~0 data at the important points, i.e., where
you make decisions based on the data.  Of course, ~0 may be valid data
in some cases.  You have to use your knowledge of the device
programming model to determine whether ~0 is possible, and if it is,
check in some other way, like another MMIO read, to tell whether the
read succeeded and returned ~0, or failed because of PCIe error and
returned ~0.
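
For illustration, a rough sketch of what such a check might look like
(untested; REG_STATUS and the surrounding names are made up, and the
assumption is that REG_STATUS always contains at least one zero bit on
real hardware):

        u32 val = readl(base + REG_STATUS);

        if (val == ~0) {
                /*
                 * This register can never legitimately read as all
                 * ones, so the read failed on PCIe, most likely due
                 * to a surprise unplug.
                 */
                if (!pci_device_is_present(pdev))
                        return -ENODEV; /* device really is gone */
        }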

> > table by the driver AFTER the device is gone physically and from the
> > PCI subsystem, post pci_driver.remove
> > callback execution. This happens because struct device (and struct
> > drm_device representing the graphics card) are still present, because some
> > user clients which are not aware
> > of the hot removal still hold the device file open and hence prevent the
> > device refcount from dropping to 0. The solution in this patch is brute
> > force, where we try to find any place we access MMIO
> > mapped to kernel address space and guard the write access with a
> > 'device-unplug' flag. This solution is obviously racy because a device
> > can be unplugged right after checking the flag.
> > I had an idea to instead not release but keep those ranges reserved
> > even after pci_driver.remove, until the DRM device is destroyed when its
> > refcount drops to 0, and by this to prevent new devices, plugged in
> > and allocated some of the same MMIO address range, from getting accidental
> > writes into their registers.
> > But, on dri-devel I was advised that this will upset the PCI subsystem
> > and so is best avoided, but I still would like another opinion from
> > PCI experts on whether this approach is indeed not possible?
> > 
> > Andrey


* Re: Avoid MMIO write access after hot unplug [WAS - Re: Question about supporting AMD eGPU hot plug case]
  2021-02-05 15:38 ` Avoid MMIO write access after hot unplug [WAS - Re: Question about supporting AMD eGPU hot plug case] Bjorn Helgaas
@ 2021-02-05 16:08   ` Andrey Grodzovsky
  2021-02-05 17:59     ` Daniel Vetter
  2021-02-05 19:45     ` Bjorn Helgaas
  0 siblings, 2 replies; 11+ messages in thread
From: Andrey Grodzovsky @ 2021-02-05 16:08 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Deucher, Alexander, s.miroshnichenko, linux-pci, Koenig,
	Christian, Antonovitch, Anatoli, Daniel Vetter

+ Daniel


On 2/5/21 10:38 AM, Bjorn Helgaas wrote:
> On Thu, Feb 04, 2021 at 11:03:10AM -0500, Andrey Grodzovsky wrote:
>> + linux-pci mailing list and a bit of extra details below.
>>
>> On 2/2/21 12:51 PM, Andrey Grodzovsky wrote:
>>> Bjorn, Sergey, I would also like to use this opportunity to ask you a
>>> question on a related topic - hot unplug.
>>> I've been working for a while on this (see the latest patchset here
>>> https://lists.freedesktop.org/archives/amd-gfx/2021-January/058595.html).
>>> My question is specifically regarding this patch
>>> https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html
>>> - the idea here is to
>>> prevent any accesses to MMIO address ranges still mapped in kernel page
> 
> I think you're talking about a PCI BAR that is mapped into a user
> process.

For user mappings, including MMIO mappings, we have a reliable approach
where we invalidate device address space mappings for all users on the
first sign of device disconnect, and then on all subsequent page faults
from users accessing those ranges we insert a dummy zero page into their
respective page tables. It's actually the kernel driver, where no page
faulting can be used as it can for user space, where I have issues with
how to protect against continued accesses to those ranges which were
already released by the PCI subsystem and hence can be allocated to
another hot-plugged device.
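
Roughly like this hypothetical fault handler (heavily simplified;
real_pfn() stands in for the driver's actual pfn lookup):

        static vm_fault_t bo_vm_fault(struct vm_fault *vmf)
        {
                struct drm_device *ddev = vmf->vma->vm_private_data;
                vm_fault_t ret;
                int idx;

                if (drm_dev_enter(ddev, &idx)) {
                        /* Device still present - map the real BAR page. */
                        ret = vmf_insert_pfn(vmf->vma, vmf->address,
                                             real_pfn(vmf));
                        drm_dev_exit(idx);
                } else {
                        /* Device gone - back the range with a zero page. */
                        ret = vmf_insert_pfn(vmf->vma, vmf->address,
                                             page_to_pfn(ZERO_PAGE(0)));
                }

                return ret;
        }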

> 
> It is impossible to reliably *prevent* MMIO accesses to a BAR on a
> PCI device that has been unplugged.  There is always a window where
> the CPU issues a load/store and the device is unplugged before the
> load/store completes.
> 
> If a PCIe device is unplugged, an MMIO read to that BAR will complete
> on PCIe with an uncorrectable fatal error.  When that happens there is
> no valid data from the PCIe device, so the PCIe host bridge typically
> fabricates ~0 data so the CPU load instruction can complete.
> 
> If you want to reliably recover from unplugs like this, I think you
> have to check for that ~0 data at the important points, i.e., where
> you make decisions based on the data.  Of course, ~0 may be valid data
> in some cases.  You have to use your knowledge of the device
> programming model to determine whether ~0 is possible, and if it is,
> check in some other way, like another MMIO read, to tell whether the
> read succeeded and returned ~0, or failed because of PCIe error and
> returned ~0.

Looks like there is a high performance price to pay for this approach if
we protect at every possible junction (e.g. register read/write
accessors). I tested this by doing 1M reads/writes while using
drm_dev_enter/drm_dev_exit, which is DRM's SRCU-based guard against
device unplug, and even then we hit a performance penalty of 40%. I
assume that an actual MMIO read (e.g. pci_device_is_present) will cause
a much larger performance penalty.

Andrey

> 
>>> table by the driver AFTER the device is gone physically and from the
>>> PCI  subsystem, post pci_driver.remove
>>> call back execution. This happens because struct device (and struct
>>> drm_device representing the graphic card) are still present because some
>>> user clients which  are not aware
>>> of hot removal still hold device file open and hence prevents device
>>> refcount to drop to 0. The solution in this patch is brute force where
>>> we try and find any place we access MMIO
>>> mapped to kernel address space and guard against the write access with a
>>> 'device-unplug' flag. This solution is obliviously racy because a device
>>> can be unplugged right after checking the falg.
>>> I had an idea to instead not release but to keep those ranges reserved
>>> even after pci_driver.remove, until DRM device is destroyed when it's
>>> refcount drops to 0 and by this to prevent new devices plugged in
>>> and allocated some of the same MMIO address  range to get accidental
>>> writes into their registers.
>>> But, on dri-devel I was advised that this will upset the PCI subsystem
>>> and so best to be avoided but I still would like another opinion from
>>> PCI experts on whether this approach is indeed not possible ?
>>>
>>> Andrey


* Re: Avoid MMIO write access after hot unplug [WAS - Re: Question about supporting AMD eGPU hot plug case]
  2021-02-05 16:08   ` Andrey Grodzovsky
@ 2021-02-05 17:59     ` Daniel Vetter
  2021-02-05 19:09       ` Andrey Grodzovsky
  2021-02-05 19:45     ` Bjorn Helgaas
  1 sibling, 1 reply; 11+ messages in thread
From: Daniel Vetter @ 2021-02-05 17:59 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: Bjorn Helgaas, Deucher, Alexander, s.miroshnichenko, linux-pci,
	Koenig, Christian, Antonovitch, Anatoli

On Fri, Feb 5, 2021 at 5:08 PM Andrey Grodzovsky
<Andrey.Grodzovsky@amd.com> wrote:
>
> + Daniel
>
>
> On 2/5/21 10:38 AM, Bjorn Helgaas wrote:
> > On Thu, Feb 04, 2021 at 11:03:10AM -0500, Andrey Grodzovsky wrote:
> >> + linux-pci mailing list and a bit of extra details below.
> >>
> >> On 2/2/21 12:51 PM, Andrey Grodzovsky wrote:
> >>> Bjorn, Sergey, I would also like to use this opportunity to ask you a
> >>> question on a related topic - hot unplug.
> >>> I've been working for a while on this (see the latest patchset here
> >>> https://lists.freedesktop.org/archives/amd-gfx/2021-January/058595.html).
> >>> My question is specifically regarding this patch
> >>> https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html
> >>> - the idea here is to
> >>> prevent any accesses to MMIO address ranges still mapped in kernel page
> >
> > I think you're talking about a PCI BAR that is mapped into a user
> > process.
>
> For user mappings, including MMIO mappings, we have a reliable approach
> where we invalidate device address space mappings for all users on the
> first sign of device disconnect, and then on all subsequent page faults
> from users accessing those ranges we insert a dummy zero page into their
> respective page tables. It's actually the kernel driver, where no page
> faulting can be used as it can for user space, where I have issues with
> how to protect against continued accesses to those ranges which were
> already released by the PCI subsystem and hence can be allocated to
> another hot-plugged device.

Hm what's the access here? At least on the drm side we should be able
to tear everything down when our ->remove callback is called.

Or I might also be missing too much of the thread context.

> > It is impossible to reliably *prevent* MMIO accesses to a BAR on a
> > PCI device that has been unplugged.  There is always a window where
> > the CPU issues a load/store and the device is unplugged before the
> > load/store completes.
> >
> > If a PCIe device is unplugged, an MMIO read to that BAR will complete
> > on PCIe with an uncorrectable fatal error.  When that happens there is
> > no valid data from the PCIe device, so the PCIe host bridge typically
> > fabricates ~0 data so the CPU load instruction can complete.
> >
> > If you want to reliably recover from unplugs like this, I think you
> > have to check for that ~0 data at the important points, i.e., where
> > you make decisions based on the data.  Of course, ~0 may be valid data
> > in some cases.  You have to use your knowledge of the device
> > programming model to determine whether ~0 is possible, and if it is,
> > check in some other way, like another MMIO read, to tell whether the
> > read succeeded and returned ~0, or failed because of PCIe error and
> > returned ~0.
>
> Looks like there is a high performance price to pay for this approach if
> we protect at every possible junction (e.g. register read/write
> accessors). I tested this by doing 1M reads/writes while using
> drm_dev_enter/drm_dev_exit, which is DRM's SRCU-based guard against
> device unplug, and even then we hit a performance penalty of 40%. I
> assume that an actual MMIO read (e.g. pci_device_is_present) will cause
> a much larger performance penalty.

Hm, that's surprisingly much (for the SRCU check), but then checking on
every read/write really is a bit of overkill I think.
-Daniel


> Andrey
>
> >
> >>> table by the driver AFTER the device is gone physically and from the
> >>> PCI subsystem, post pci_driver.remove
> >>> callback execution. This happens because struct device (and struct
> >>> drm_device representing the graphics card) are still present, because some
> >>> user clients which are not aware
> >>> of the hot removal still hold the device file open and hence prevent the
> >>> device refcount from dropping to 0. The solution in this patch is brute
> >>> force, where we try to find any place we access MMIO
> >>> mapped to kernel address space and guard the write access with a
> >>> 'device-unplug' flag. This solution is obviously racy because a device
> >>> can be unplugged right after checking the flag.
> >>> I had an idea to instead not release but keep those ranges reserved
> >>> even after pci_driver.remove, until the DRM device is destroyed when its
> >>> refcount drops to 0, and by this to prevent new devices, plugged in
> >>> and allocated some of the same MMIO address range, from getting accidental
> >>> writes into their registers.
> >>> But, on dri-devel I was advised that this will upset the PCI subsystem
> >>> and so is best avoided, but I still would like another opinion from
> >>> PCI experts on whether this approach is indeed not possible?
> >>>
> >>> Andrey



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: Avoid MMIO write access after hot unplug [WAS - Re: Question about supporting AMD eGPU hot plug case]
  2021-02-05 17:59     ` Daniel Vetter
@ 2021-02-05 19:09       ` Andrey Grodzovsky
  0 siblings, 0 replies; 11+ messages in thread
From: Andrey Grodzovsky @ 2021-02-05 19:09 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Bjorn Helgaas, Deucher, Alexander, s.miroshnichenko, linux-pci,
	Koenig, Christian, Antonovitch, Anatoli



On 2/5/21 12:59 PM, Daniel Vetter wrote:
> On Fri, Feb 5, 2021 at 5:08 PM Andrey Grodzovsky
> <Andrey.Grodzovsky@amd.com> wrote:
>>
>> + Daniel
>>
>>
>> On 2/5/21 10:38 AM, Bjorn Helgaas wrote:
>>> On Thu, Feb 04, 2021 at 11:03:10AM -0500, Andrey Grodzovsky wrote:
>>>> + linux-pci mailing list and a bit of extra details below.
>>>>
>>>> On 2/2/21 12:51 PM, Andrey Grodzovsky wrote:
>>>>> Bjorn, Sergey, I would also like to use this opportunity to ask you a
>>>>> question on a related topic - hot unplug.
>>>>> I've been working for a while on this (see the latest patchset here
>>>>> https://lists.freedesktop.org/archives/amd-gfx/2021-January/058595.html).
>>>>> My question is specifically regarding this patch
>>>>> https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html
>>>>> - the idea here is to
>>>>> prevent any accesses to MMIO address ranges still mapped in kernel page
>>>
>>> I think you're talking about a PCI BAR that is mapped into a user
>>> process.
>>
>> For user mappings, including MMIO mappings, we have a reliable approach
>> where we invalidate device address space mappings for all users on the
>> first sign of device disconnect, and then on all subsequent page faults
>> from users accessing those ranges we insert a dummy zero page into their
>> respective page tables. It's actually the kernel driver, where no page
>> faulting can be used as it can for user space, where I have issues with
>> how to protect against continued accesses to those ranges which were
>> already released by the PCI subsystem and hence can be allocated to
>> another hot-plugged device.
> 
> Hm what's the access here? At least on the drm side we should be able
> to tear everything down when our ->remove callback is called.
> 
> Or I might also be missing too much of the thread context.

I mean we could have a bunch of background threads doing things and periodic
timers that poll different stuff (like temperature sensor monitoring, etc.),
and some stuff just might be in the middle of HW programming just as the
device was extracted.
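
E.g. a hypothetical polling worker like this, where only the
drm_dev_enter() guard keeps it away from the HW once the device is gone
(struct my_dev and TEMP_SENSOR_REG are made up):

        static void temp_poll_work(struct work_struct *work)
        {
                struct my_dev *mdev = container_of(to_delayed_work(work),
                                                   struct my_dev, temp_work);
                int idx;

                /* Skip HW access entirely once the device is unplugged. */
                if (!drm_dev_enter(&mdev->ddev, &idx))
                        return;

                mdev->temp = readl(mdev->mmio + TEMP_SENSOR_REG);
                drm_dev_exit(idx);

                schedule_delayed_work(&mdev->temp_work, HZ);
        }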

> 
>>> It is impossible to reliably *prevent* MMIO accesses to a BAR on a
>>> PCI device that has been unplugged.  There is always a window where
>>> the CPU issues a load/store and the device is unplugged before the
>>> load/store completes.
>>>
>>> If a PCIe device is unplugged, an MMIO read to that BAR will complete
>>> on PCIe with an uncorrectable fatal error.  When that happens there is
>>> no valid data from the PCIe device, so the PCIe host bridge typically
>>> fabricates ~0 data so the CPU load instruction can complete.
>>>
>>> If you want to reliably recover from unplugs like this, I think you
>>> have to check for that ~0 data at the important points, i.e., where
>>> you make decisions based on the data.  Of course, ~0 may be valid data
>>> in some cases.  You have to use your knowledge of the device
>>> programming model to determine whether ~0 is possible, and if it is,
>>> check in some other way, like another MMIO read, to tell whether the
>>> read succeeded and returned ~0, or failed because of PCIe error and
>>> returned ~0.
>>
>> Looks like there is a high performance price to pay for this approach if
>> we protect at every possible junction (e.g. register read/write
>> accessors). I tested this by doing 1M reads/writes while using
>> drm_dev_enter/drm_dev_exit, which is DRM's SRCU-based guard against
>> device unplug, and even then we hit a performance penalty of 40%. I
>> assume that an actual MMIO read (e.g. pci_device_is_present) will cause
>> a much larger performance penalty.
> 
> Hm, that's surprisingly much (for the SRCU check), but then checking on
> every read/write really is a bit of overkill I think.
> -Daniel

So I guess there is no magic bullet for this after all, and we are back to
some kind of guards around the sensitive stuff, such as blocking all GPU SW
schedulers, refusing further direct IB submissions to HW rings, and
rejecting all IOCTLs.

Andrey

> 
> 
>> Andrey
>>
>>>
>>>>> table by the driver AFTER the device is gone physically and from the
>>>>> PCI subsystem, post pci_driver.remove
>>>>> callback execution. This happens because struct device (and struct
>>>>> drm_device representing the graphics card) are still present, because some
>>>>> user clients which are not aware
>>>>> of the hot removal still hold the device file open and hence prevent the
>>>>> device refcount from dropping to 0. The solution in this patch is brute
>>>>> force, where we try to find any place we access MMIO
>>>>> mapped to kernel address space and guard the write access with a
>>>>> 'device-unplug' flag. This solution is obviously racy because a device
>>>>> can be unplugged right after checking the flag.
>>>>> I had an idea to instead not release but keep those ranges reserved
>>>>> even after pci_driver.remove, until the DRM device is destroyed when its
>>>>> refcount drops to 0, and by this to prevent new devices, plugged in
>>>>> and allocated some of the same MMIO address range, from getting accidental
>>>>> writes into their registers.
>>>>> But, on dri-devel I was advised that this will upset the PCI subsystem
>>>>> and so is best avoided, but I still would like another opinion from
>>>>> PCI experts on whether this approach is indeed not possible?
>>>>>
>>>>> Andrey
> 
> 
> 


* Re: Avoid MMIO write access after hot unplug [WAS - Re: Question about supporting AMD eGPU hot plug case]
  2021-02-05 16:08   ` Andrey Grodzovsky
  2021-02-05 17:59     ` Daniel Vetter
@ 2021-02-05 19:45     ` Bjorn Helgaas
  2021-02-05 20:42       ` Andrey Grodzovsky
  1 sibling, 1 reply; 11+ messages in thread
From: Bjorn Helgaas @ 2021-02-05 19:45 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: Deucher, Alexander, s.miroshnichenko, linux-pci, Koenig,
	Christian, Antonovitch, Anatoli, Daniel Vetter

On Fri, Feb 05, 2021 at 11:08:45AM -0500, Andrey Grodzovsky wrote:
> On 2/5/21 10:38 AM, Bjorn Helgaas wrote:
> > On Thu, Feb 04, 2021 at 11:03:10AM -0500, Andrey Grodzovsky wrote:
> > > + linux-pci mailing list and a bit of extra details below.
> > > 
> > > On 2/2/21 12:51 PM, Andrey Grodzovsky wrote:
> > > > Bjorn, Sergey, I would also like to use this opportunity to ask you a
> > > > question on a related topic - hot unplug.
> > > > I've been working for a while on this (see the latest patchset here
> > > > https://lists.freedesktop.org/archives/amd-gfx/2021-January/058595.html).
> > > > My question is specifically regarding this patch
> > > > https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html
> > > > - the idea here is to
> > > > prevent any accesses to MMIO address ranges still mapped in kernel page
> > 
> > I think you're talking about a PCI BAR that is mapped into a user
> > process.
> 
> For user mappings, including MMIO mappings, we have a reliable
> approach where we invalidate device address space mappings for all
> users on the first sign of device disconnect, and then on all
> subsequent page faults from users accessing those ranges we insert a
> dummy zero page into their respective page tables. It's actually the
> kernel driver, where no page faulting can be used as it can for user
> space, where I have issues with how to protect against continued
> accesses to those ranges which were already released by the PCI
> subsystem and hence can be allocated to another hot-plugged device.

That doesn't sound reliable to me, but maybe I don't understand what
you mean by the "first sign of device disconnect." At least from a PCI
perspective, the first sign of a surprise hot unplug is likely to be
an MMIO read that returns ~0.

It's true that the hot unplug will likely cause an interrupt and we
*may* be able to unbind the driver before the driver or a user program
performs an MMIO access, but there's certainly no guarantee.  The
following sequence is always possible:

  - User unplugs PCIe device
  - Bridge raises hotplug interrupt
  - Driver or userspace issues MMIO read
  - Bridge responds with error because link to device is down
  - Host bridge receives error, fabricates ~0 data to CPU
  - Driver or userspace sees ~0 data from MMIO read
  - PCI core fields hotplug interrupt and unbinds driver

> > It is impossible to reliably *prevent* MMIO accesses to a BAR on a
> > PCI device that has been unplugged.  There is always a window
> > where the CPU issues a load/store and the device is unplugged
> > before the load/store completes.
> > 
> > If a PCIe device is unplugged, an MMIO read to that BAR will
> > complete on PCIe with an uncorrectable fatal error.  When that
> > happens there is no valid data from the PCIe device, so the PCIe
> > host bridge typically fabricates ~0 data so the CPU load
> > instruction can complete.
> > 
> > If you want to reliably recover from unplugs like this, I think
> > you have to check for that ~0 data at the important points, i.e.,
> > where you make decisions based on the data.  Of course, ~0 may be
> > valid data in some cases.  You have to use your knowledge of the
> > device programming model to determine whether ~0 is possible, and
> > if it is, check in some other way, like another MMIO read, to tell
> > whether the read succeeded and returned ~0, or failed because of
> > PCIe error and returned ~0.
> 
> Looks like there is a high performance price to pay for this approach
> if we protect at every possible junction (e.g. register read/write
> accessors). I tested this by doing 1M reads/writes while using
> drm_dev_enter/drm_dev_exit, which is DRM's SRCU-based guard against
> device unplug, and even then we hit a performance penalty of 40%. I
> assume that an actual MMIO read (e.g. pci_device_is_present) will
> cause a much larger performance penalty.

I guess you have to decide whether you want a fast 90% solution or a
somewhat slower 100% reliable solution :)

I don't think the checking should be as expensive as you're thinking.
You only have to check if:

  (1) you're doing an MMIO read (there's no response for MMIO writes,
      so you can't tell whether they succeed),
  (2) the MMIO read returns ~0,
  (3) ~0 might be a valid value for the register you're reading, and
  (4) the read value is important to your control flow.

For example, if you do a series of reads and act on the data after all
the reads complete, you only need to worry about whether the *last*
read failed.  If that last read is to a register that is known to
contain a zero bit, no additional MMIO read is required and the
checking is basically free.
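
A hypothetical sketch (REG_A/REG_B/REG_STATUS are invented, and the
assumption is that REG_STATUS always contains at least one zero bit):

        u32 a = readl(base + REG_A);
        u32 b = readl(base + REG_B);
        u32 status = readl(base + REG_STATUS);

        /*
         * Unplug is one-way: if the last read didn't fail, the device
         * was still present for the earlier reads, so 'a' and 'b' are
         * trustworthy.
         */
        if (status == ~0)
                return -ENODEV; /* device likely gone; discard a and b */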

> > > > table by the driver AFTER the device is gone physically and
> > > > from the PCI subsystem, post pci_driver.remove callback
> > > > execution. This happens because struct device (and struct
> > > > drm_device representing the graphics card) are still present,
> > > > because some user clients which are not aware of the hot
> > > > removal still hold the device file open and hence prevent the
> > > > device refcount from dropping to 0. The solution in this patch
> > > > is brute force, where we try to find any place we access MMIO
> > > > mapped to kernel address space and guard the write access with
> > > > a 'device-unplug' flag. This solution is obviously racy because
> > > > a device can be unplugged right after checking the flag. I had
> > > > an idea to instead not release but keep those ranges reserved
> > > > even after pci_driver.remove, until the DRM device is destroyed
> > > > when its refcount drops to 0, and by this to prevent new
> > > > devices, plugged in and allocated some of the same MMIO address
> > > > range, from getting accidental writes into their registers.
> > > > But, on dri-devel I was advised that this will upset the PCI
> > > > subsystem and so is best avoided, but I still would like
> > > > another opinion from PCI experts on whether this approach is
> > > > indeed not possible?


* Re: Avoid MMIO write access after hot unplug [WAS - Re: Question about supporting AMD eGPU hot plug case]
  2021-02-05 19:45     ` Bjorn Helgaas
@ 2021-02-05 20:42       ` Andrey Grodzovsky
  2021-02-05 20:49         ` Daniel Vetter
  2021-02-05 21:35         ` Keith Busch
  0 siblings, 2 replies; 11+ messages in thread
From: Andrey Grodzovsky @ 2021-02-05 20:42 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Deucher, Alexander, s.miroshnichenko, linux-pci, Koenig,
	Christian, Antonovitch, Anatoli, Daniel Vetter



On 2/5/21 2:45 PM, Bjorn Helgaas wrote:
> On Fri, Feb 05, 2021 at 11:08:45AM -0500, Andrey Grodzovsky wrote:
>> On 2/5/21 10:38 AM, Bjorn Helgaas wrote:
>>> On Thu, Feb 04, 2021 at 11:03:10AM -0500, Andrey Grodzovsky wrote:
>>>> + linux-pci mailing list and a bit of extra details below.
>>>>
>>>> On 2/2/21 12:51 PM, Andrey Grodzovsky wrote:
>>>>> Bjorn, Sergey, I would also like to use this opportunity to ask you a
>>>>> question on a related topic - hot unplug.
>>>>> I've been working for a while on this (see the latest patchset here
>>>>> https://lists.freedesktop.org/archives/amd-gfx/2021-January/058595.html).
>>>>> My question is specifically regarding this patch
>>>>> https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html
>>>>> - the idea here is to
>>>>> prevent any accesses to MMIO address ranges still mapped in kernel page
>>>
>>> I think you're talking about a PCI BAR that is mapped into a user
>>> process.
>>
>> For user mappings, including MMIO mappings, we have a reliable
>> approach where we invalidate device address space mappings for all
>> users on the first sign of device disconnect, and then on all
>> subsequent page faults from users accessing those ranges we insert a
>> dummy zero page into their respective page tables. It's actually the
>> kernel driver, where no page faulting can be used as it can for user
>> space, where I have issues with how to protect against continued
>> accesses to those ranges which were already released by the PCI
>> subsystem and hence can be allocated to another hot-plugged device.
> 
> That doesn't sound reliable to me, but maybe I don't understand what
> you mean by the "first sign of device disconnect." 

See functions drm_dev_enter, drm_dev_exit and drm_dev_unplug in drm_drv.c

> At least from a PCI
> perspective, the first sign of a surprise hot unplug is likely to be
> an MMIO read that returns ~0.

We set drm_dev_unplug in amdgpu_pci_remove and base all later checks
with drm_dev_enter/drm_dev_exit on this.
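
I.e. roughly this (hypothetical names, heavily simplified from what the
amdgpu patchset does):

        static void my_pci_remove(struct pci_dev *pdev)
        {
                struct drm_device *ddev = pci_get_drvdata(pdev);

                /*
                 * After this, drm_dev_enter() fails and all guarded
                 * code paths stop touching the HW.
                 */
                drm_dev_unplug(ddev);

                /* ... tear down interrupts, workers, schedulers ... */

                drm_dev_put(ddev); /* final free when last user closes */
        }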

> 
> It's true that the hot unplug will likely cause an interrupt and we
> *may* be able to unbind the driver before the driver or a user program
> performs an MMIO access, but there's certainly no guarantee.  The
> following sequence is always possible:
> 
>    - User unplugs PCIe device
>    - Bridge raises hotplug interrupt
>    - Driver or userspace issues MMIO read
>    - Bridge responds with error because link to device is down
>    - Host bridge receives error, fabricates ~0 data to CPU
>    - Driver or userspace sees ~0 data from MMIO read
>    - PCI core fields hotplug interrupt and unbinds driver
> 
>>> It is impossible to reliably *prevent* MMIO accesses to a BAR on a
>>> PCI device that has been unplugged.  There is always a window
>>> where the CPU issues a load/store and the device is unplugged
>>> before the load/store completes.
>>>
>>> If a PCIe device is unplugged, an MMIO read to that BAR will
>>> complete on PCIe with an uncorrectable fatal error.  When that
>>> happens there is no valid data from the PCIe device, so the PCIe
>>> host bridge typically fabricates ~0 data so the CPU load
>>> instruction can complete.
>>>
>>> If you want to reliably recover from unplugs like this, I think
>>> you have to check for that ~0 data at the important points, i.e.,
>>> where you make decisions based on the data.  Of course, ~0 may be
>>> valid data in some cases.  You have to use your knowledge of the
>>> device programming model to determine whether ~0 is possible, and
>>> if it is, check in some other way, like another MMIO read, to tell
>>> whether the read succeeded and returned ~0, or failed because of
>>> PCIe error and returned ~0.
>>
>> Looks like there is a high performance price to pay for this approach
>> if we protect at every possible junction (e.g. register read/write
>> accessors). I tested this by doing 1M reads/writes while using
>> drm_dev_enter/drm_dev_exit, which is DRM's SRCU-based guard against
>> device unplug, and even then we hit a performance penalty of 40%. I
>> assume that an actual MMIO read (e.g. pci_device_is_present) will
>> cause a much larger performance penalty.
> 
> I guess you have to decide whether you want a fast 90% solution or a
> somewhat slower 100% reliable solution :)
> 
> I don't think the checking should be as expensive as you're thinking.
> You only have to check if:
> 
>    (1) you're doing an MMIO read (there's no response for MMIO writes,
>        so you can't tell whether they succeed),
>    (2) the MMIO read returns ~0,
>    (3) ~0 might be a valid value for the register you're reading, and
>    (4) the read value is important to your control flow.
> 
> For example, if you do a series of reads and act on the data after all
> the reads complete, you only need to worry about whether the *last*
> read failed.  If that last read is to a register that is known to
> contain a zero bit, no additional MMIO read is required and the
> checking is basically free.

I am more worried about MMIO writes, to avoid writing to the BAR
of a newly plugged-in device that accidentally got allocated some
part of the MMIO address range that our 'ghost' device is still using.

Andrey

> 
>>>>> table by the driver AFTER the device is gone physically and
>>>>> from the PCI subsystem, post pci_driver.remove callback
>>>>> execution. This happens because struct device (and struct
>>>>> drm_device representing the graphics card) are still present,
>>>>> because some user clients which are not aware of the hot
>>>>> removal still hold the device file open and hence prevent the
>>>>> device refcount from dropping to 0. The solution in this patch
>>>>> is brute force, where we try to find any place we access MMIO
>>>>> mapped to kernel address space and guard the write access with
>>>>> a 'device-unplug' flag. This solution is obviously racy because
>>>>> a device can be unplugged right after checking the flag. I had
>>>>> an idea to instead not release but keep those ranges reserved
>>>>> even after pci_driver.remove, until the DRM device is destroyed
>>>>> when its refcount drops to 0, and by this to prevent new
>>>>> devices, plugged in and allocated some of the same MMIO address
>>>>> range, from getting accidental writes into their registers.
>>>>> But, on dri-devel I was advised that this will upset the PCI
>>>>> subsystem and so is best avoided, but I still would like
>>>>> another opinion from PCI experts on whether this approach is
>>>>> indeed not possible?


* Re: Avoid MMIO write access after hot unplug [WAS - Re: Question about supporting AMD eGPU hot plug case]
  2021-02-05 20:42       ` Andrey Grodzovsky
@ 2021-02-05 20:49         ` Daniel Vetter
  2021-02-05 21:24           ` Andrey Grodzovsky
  2021-02-05 21:35         ` Keith Busch
  1 sibling, 1 reply; 11+ messages in thread
From: Daniel Vetter @ 2021-02-05 20:49 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: Bjorn Helgaas, Deucher, Alexander, s.miroshnichenko, linux-pci,
	Koenig, Christian, Antonovitch, Anatoli

On Fri, Feb 5, 2021 at 9:42 PM Andrey Grodzovsky
<Andrey.Grodzovsky@amd.com> wrote:
>
>
>
> On 2/5/21 2:45 PM, Bjorn Helgaas wrote:
> > On Fri, Feb 05, 2021 at 11:08:45AM -0500, Andrey Grodzovsky wrote:
> >> On 2/5/21 10:38 AM, Bjorn Helgaas wrote:
> >>> On Thu, Feb 04, 2021 at 11:03:10AM -0500, Andrey Grodzovsky wrote:
> >>>> + linux-pci mailing list and a bit of extra details below.
> >>>>
> >>>> On 2/2/21 12:51 PM, Andrey Grodzovsky wrote:
> >>>>> Bjorn, Sergey, I would also like to use this opportunity to ask you a
> >>>>> question on a related topic - hot unplug.
> >>>>> I've been working for a while on this (see the latest patchset here
> >>>>> https://lists.freedesktop.org/archives/amd-gfx/2021-January/058595.html).
> >>>>> My question is specifically regarding this patch
> >>>>> https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html
> >>>>> - the idea here is to
> >>>>> prevent any accesses to MMIO address ranges still mapped in kernel page
> >>>
> >>> I think you're talking about a PCI BAR that is mapped into a user
> >>> process.
> >>
> >> For user mappings, including MMIO mappings, we have a reliable
> >> approach where we invalidate device address space mappings for all
> >> users on the first sign of device disconnect, and then on all
> >> subsequent page faults from users accessing those ranges we insert a
> >> dummy zero page into their respective page tables. It's actually the
> >> kernel driver, where no page faulting can be used as it can for user
> >> space, where I have issues with how to protect against continued
> >> accesses to those ranges which were already released by the PCI
> >> subsystem and hence can be allocated to another hot-plugged device.
> >
> > That doesn't sound reliable to me, but maybe I don't understand what
> > you mean by the "first sign of device disconnect."
>
> See functions drm_dev_enter, drm_dev_exit and drm_dev_unplug in drm_drv.c
>
> > At least from a PCI
> > perspective, the first sign of a surprise hot unplug is likely to be
> > an MMIO read that returns ~0.
>
> We set drm_dev_unplug in amdgpu_pci_remove and base all later checks
> with drm_dev_enter/drm_dev_exit on this.
>
> >
> > It's true that the hot unplug will likely cause an interrupt and we
> > *may* be able to unbind the driver before the driver or a user program
> > performs an MMIO access, but there's certainly no guarantee.  The
> > following sequence is always possible:
> >
> >    - User unplugs PCIe device
> >    - Bridge raises hotplug interrupt
> >    - Driver or userspace issues MMIO read
> >    - Bridge responds with error because link to device is down
> >    - Host bridge receives error, fabricates ~0 data to CPU
> >    - Driver or userspace sees ~0 data from MMIO read
> >    - PCI core fields hotplug interrupt and unbinds driver
> >
> >>> It is impossible to reliably *prevent* MMIO accesses to a BAR on a
> >>> PCI device that has been unplugged.  There is always a window
> >>> where the CPU issues a load/store and the device is unplugged
> >>> before the load/store completes.
> >>>
> >>> If a PCIe device is unplugged, an MMIO read to that BAR will
> >>> complete on PCIe with an uncorrectable fatal error.  When that
> >>> happens there is no valid data from the PCIe device, so the PCIe
> >>> host bridge typically fabricates ~0 data so the CPU load
> >>> instruction can complete.
> >>>
> >>> If you want to reliably recover from unplugs like this, I think
> >>> you have to check for that ~0 data at the important points, i.e.,
> >>> where you make decisions based on the data.  Of course, ~0 may be
> >>> valid data in some cases.  You have to use your knowledge of the
> >>> device programming model to determine whether ~0 is possible, and
> >>> if it is, check in some other way, like another MMIO read, to tell
> >>> whether the read succeeded and returned ~0, or failed because of
> >>> PCIe error and returned ~0.
> >>
> >> Looks like there is a high performance price to pay for this approach
> >> if we protect at every possible junction (e.g. register read/write
> >> accessors). I tested this by doing 1M reads/writes while using
> >> drm_dev_enter/drm_dev_exit, which is DRM's SRCU-based guard against
> >> device unplug, and even then we hit a performance penalty of 40%. I
> >> assume that an actual MMIO read (e.g. pci_device_is_present) will
> >> cause a much larger performance penalty.
> >
> > I guess you have to decide whether you want a fast 90% solution or a
> > somewhat slower 100% reliable solution :)
> >
> > I don't think the checking should be as expensive as you're thinking.
> > You only have to check if:
> >
> >    (1) you're doing an MMIO read (there's no response for MMIO writes,
> >        so you can't tell whether they succeed),
> >    (2) the MMIO read returns ~0,
> >    (3) ~0 might be a valid value for the register you're reading, and
> >    (4) the read value is important to your control flow.
> >
> > For example, if you do a series of reads and act on the data after all
> > the reads complete, you only need to worry about whether the *last*
> > read failed.  If that last read is to a register that is known to
> > contain a zero bit, no additional MMIO read is required and the
> > checking is basically free.
>
> I am more worried about MMIO writes, to avoid writing to the BAR
> of a newly plugged-in device that accidentally got allocated some
> part of the MMIO address range that our 'ghost' device is still using.

We have to shut all these down before the ->remove callback has
finished. At least that's my understanding; before that's all done the
pci subsystem won't reallocate anything.

The drm_dev_enter/exit is only meant to lock out entire code paths
when we know the device is gone, so that we don't spend an eternity
timing out every access and running lots of code for no point. The
other reason for drm_dev_exit is to guard the sw side from access of
data structures which we tear down (or have to tear down) as part of
the ->remove callback, like io remaps, maybe timers/workers driving hw
access and all these things. Only the uapi objects need to stay around
until userspace has closed all the fds/unmapped all the mappings so
that we don't oops all over the place.

Of course auditing a big driver like amdgpu for this is a massive pain,
so where/how you sprinkle drm_dev_enter/exit over it is a bit of an
engineering tradeoff.

Either way (if my understanding is correct), after ->remove is
finished, your driver _must_ guarantee that all access to mmio ranges
has stopped. How you do that is kinda up to you.

But also like Bjorn pointed out, _while_ the hot unplug is happening,
i.e. at any moment between when transactions start failing up to and
including your driver's ->remove callback having finished, you need to
be able to cope with the bogus all-0xff replies in your code. But
that's kind of an orthogonal requirement: if you've fixed the
use-after-free you might still crash on 0xff, and even if you've fixed
all the crashes due to 0xff reads you might still have lots of
use-after-frees around.

Cheers, Daniel

> Andrey
>
> >
> >>>>> table by the driver AFTER the device is gone physically and
> >>>>> from the PCI subsystem, post pci_driver.remove callback
> >>>>> execution. This happens because struct device (and struct
> >>>>> drm_device representing the graphics card) are still present,
> >>>>> because some user clients which are not aware of the hot
> >>>>> removal still hold the device file open and hence prevent the
> >>>>> device refcount from dropping to 0. The solution in this patch
> >>>>> is brute force, where we try to find any place we access MMIO
> >>>>> mapped to kernel address space and guard the write access with
> >>>>> a 'device-unplug' flag. This solution is obviously racy because
> >>>>> a device can be unplugged right after checking the flag. I had
> >>>>> an idea to instead not release but keep those ranges reserved
> >>>>> even after pci_driver.remove, until the DRM device is destroyed
> >>>>> when its refcount drops to 0, and by this to prevent new
> >>>>> devices, plugged in and allocated some of the same MMIO address
> >>>>> range, from getting accidental writes into their registers.
> >>>>> But, on dri-devel I was advised that this will upset the PCI
> >>>>> subsystem and so is best avoided, but I still would like
> >>>>> another opinion from PCI experts on whether this approach is
> >>>>> indeed not possible?



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: Avoid MMIO write access after hot unplug [WAS - Re: Question about supporting AMD eGPU hot plug case]
  2021-02-05 20:49         ` Daniel Vetter
@ 2021-02-05 21:24           ` Andrey Grodzovsky
  2021-02-05 22:15             ` Daniel Vetter
  0 siblings, 1 reply; 11+ messages in thread
From: Andrey Grodzovsky @ 2021-02-05 21:24 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Bjorn Helgaas, Deucher, Alexander, s.miroshnichenko, linux-pci,
	Koenig, Christian, Antonovitch, Anatoli



On 2/5/21 3:49 PM, Daniel Vetter wrote:
> On Fri, Feb 5, 2021 at 9:42 PM Andrey Grodzovsky
> <Andrey.Grodzovsky@amd.com> wrote:
>>
>>
>>
>> On 2/5/21 2:45 PM, Bjorn Helgaas wrote:
>>> On Fri, Feb 05, 2021 at 11:08:45AM -0500, Andrey Grodzovsky wrote:
>>>> On 2/5/21 10:38 AM, Bjorn Helgaas wrote:
>>>>> On Thu, Feb 04, 2021 at 11:03:10AM -0500, Andrey Grodzovsky wrote:
>>>>>> + linux-pci mailing list and a bit of extra details below.
>>>>>>
>>>>>> On 2/2/21 12:51 PM, Andrey Grodzovsky wrote:
>>>>>>> Bjorn, Sergey, I would also like to use this opportunity to ask you a
>>>>>>> question on a related topic - hot unplug.
>>>>>>> I've been working for a while on this (see the latest patchset here
>>>>>>> https://lists.freedesktop.org/archives/amd-gfx/2021-January/058595.html).
>>>>>>> My question is specifically regarding this patch
>>>>>>> https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html
>>>>>>> - the idea here is to
>>>>>>> prevent any accesses to MMIO address ranges still mapped in kernel page
>>>>>
>>>>> I think you're talking about a PCI BAR that is mapped into a user
>>>>> process.
>>>>
>>>> For user mappings, including MMIO mappings, we have a reliable
>>>> approach where we invalidate device address space mappings for all
>>>> users on the first sign of device disconnect, and then on all
>>>> subsequent page faults from users accessing those ranges we insert a
>>>> dummy zero page into their respective page tables. It's actually the
>>>> kernel driver, where no page faulting can be used as it can for user
>>>> space, where I have issues with how to protect against continued
>>>> accesses to those ranges which were already released by the PCI
>>>> subsystem and hence can be allocated to another hot-plugged device.
>>>
>>> That doesn't sound reliable to me, but maybe I don't understand what
>>> you mean by the "first sign of device disconnect."
>>
>> See functions drm_dev_enter, drm_dev_exit and drm_dev_unplug in drm_drv.c
>>
>>> At least from a PCI
>>> perspective, the first sign of a surprise hot unplug is likely to be
>>> an MMIO read that returns ~0.
>>
>> We set drm_dev_unplug in amdgpu_pci_remove and base all later checks
>> with drm_dev_enter/drm_dev_exit on this.
>>
>>>
>>> It's true that the hot unplug will likely cause an interrupt and we
>>> *may* be able to unbind the driver before the driver or a user program
>>> performs an MMIO access, but there's certainly no guarantee.  The
>>> following sequence is always possible:
>>>
>>>     - User unplugs PCIe device
>>>     - Bridge raises hotplug interrupt
>>>     - Driver or userspace issues MMIO read
>>>     - Bridge responds with error because link to device is down
>>>     - Host bridge receives error, fabricates ~0 data to CPU
>>>     - Driver or userspace sees ~0 data from MMIO read
>>>     - PCI core fields hotplug interrupt and unbinds driver
>>>
>>>>> It is impossible to reliably *prevent* MMIO accesses to a BAR on a
>>>>> PCI device that has been unplugged.  There is always a window
>>>>> where the CPU issues a load/store and the device is unplugged
>>>>> before the load/store completes.
>>>>>
>>>>> If a PCIe device is unplugged, an MMIO read to that BAR will
>>>>> complete on PCIe with an uncorrectable fatal error.  When that
>>>>> happens there is no valid data from the PCIe device, so the PCIe
>>>>> host bridge typically fabricates ~0 data so the CPU load
>>>>> instruction can complete.
>>>>>
>>>>> If you want to reliably recover from unplugs like this, I think
>>>>> you have to check for that ~0 data at the important points, i.e.,
>>>>> where you make decisions based on the data.  Of course, ~0 may be
>>>>> valid data in some cases.  You have to use your knowledge of the
>>>>> device programming model to determine whether ~0 is possible, and
>>>>> if it is, check in some other way, like another MMIO read, to tell
>>>>> whether the read succeeded and returned ~0, or failed because of
>>>>> PCIe error and returned ~0.
>>>>
>>>> Looks like there is a high performance price to pay for this approach
>>>> if we protect at every possible junction (e.g. register read/write
>>>> accessors). I tested this by doing 1M reads/writes while using
>>>> drm_dev_enter/drm_dev_exit, which is DRM's SRCU-based guard against
>>>> device unplug, and even then we hit a performance penalty of 40%. I
>>>> assume that an actual MMIO read (e.g. pci_device_is_present) will
>>>> cause a much larger performance penalty.
>>>
>>> I guess you have to decide whether you want a fast 90% solution or a
>>> somewhat slower 100% reliable solution :)
>>>
>>> I don't think the checking should be as expensive as you're thinking.
>>> You only have to check if:
>>>
>>>     (1) you're doing an MMIO read (there's no response for MMIO writes,
>>>         so you can't tell whether they succeed),
>>>     (2) the MMIO read returns ~0,
>>>     (3) ~0 might be a valid value for the register you're reading, and
>>>     (4) the read value is important to your control flow.
>>>
>>> For example, if you do a series of reads and act on the data after all
>>> the reads complete, you only need to worry about whether the *last*
>>> read failed.  If that last read is to a register that is known to
>>> contain a zero bit, no additional MMIO read is required and the
>>> checking is basically free.
>>
>> I am more worried about MMIO writes, to avoid writing to the BAR
>> of a newly plugged-in device that accidentally got allocated some
>> part of the MMIO address range that our 'ghost' device is still using.
> 
> We have to shut all these down before the ->remove callback has
> finished. At least that's my understanding; before that's all done the
> pci subsystem won't reallocate anything.
> 
> The drm_dev_enter/exit is only meant to lock out entire code paths
> when we know the device is gone, so that we don't spend an eternity
> timing out every access and running lots of code for no point. The
> other reason for drm_dev_exit is to guard the sw side from access of
> data structures which we tear down (or have to tear down) as part of
> the ->remove callback, like io remaps, maybe timers/workers driving hw
> access and all these things. Only the uapi objects need to stay around
> until userspace has closed all the fds/unmapped all the mappings so
> that we don't oops all over the place.
> 
> Of course auditing a big driver like amdgpu for this is a massive pain,
> so where/how you sprinkle drm_dev_enter/exit over it is a bit of an
> engineering tradeoff.
> 
> Either way (if my understanding is correct), after ->remove is
> finished, your driver _must_ guarantee that all access to mmio ranges
> has stopped. How you do that is kinda up to you.
> 
> But also like Bjorn pointed out, _while_ the hot unplug is happening,
> i.e. at any moment between when transactions start failing up to and
> including your driver's ->remove callback having finished, you need to
> be able to cope with the bogus all-0xff replies in your code. But
> that's kind of an orthogonal requirement: if you've fixed the
> use-after-free you might still crash on 0xff, and even if you've fixed
> all the crashes due to 0xff reads you might still have lots of
> use-after-frees around.
> 
> Cheers, Daniel

I see. So, to summarize and confirm I got this correct:
we drop the per-register access guards, as they are overkill and hurt
performance. Post-.remove callback, I verify that we guard any IOCTL
(already done), block GPU scheduler threads (already done), disable
interrupts (already done), guard against any direct IB submissions to HW
rings (not done yet), and cancel and flush any background running threads
(mostly timers, probably) which might trigger HW programming (not done yet).

Correct?

Andrey

> 
>> Andrey
>>
>>>
>>>>>>> table by the driver AFTER the device is gone physically and
>>>>>>> from the PCI subsystem, post pci_driver.remove callback
>>>>>>> execution. This happens because struct device (and struct
>>>>>>> drm_device representing the graphics card) are still present,
>>>>>>> because some user clients which are not aware of the hot
>>>>>>> removal still hold the device file open and hence prevent the
>>>>>>> device refcount from dropping to 0. The solution in this patch
>>>>>>> is brute force, where we try to find any place we access MMIO
>>>>>>> mapped to kernel address space and guard the write access with
>>>>>>> a 'device-unplug' flag. This solution is obviously racy because
>>>>>>> a device can be unplugged right after checking the flag. I had
>>>>>>> an idea to instead not release but keep those ranges reserved
>>>>>>> even after pci_driver.remove, until the DRM device is destroyed
>>>>>>> when its refcount drops to 0, and by this to prevent new
>>>>>>> devices, plugged in and allocated some of the same MMIO address
>>>>>>> range, from getting accidental writes into their registers.
>>>>>>> But, on dri-devel I was advised that this will upset the PCI
>>>>>>> subsystem and so is best avoided, but I still would like
>>>>>>> another opinion from PCI experts on whether this approach is
>>>>>>> indeed not possible?
> 
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Avoid MMIO write access after hot unplug [WAS - Re: Question about supporting AMD eGPU hot plug case]
  2021-02-05 20:42       ` Andrey Grodzovsky
  2021-02-05 20:49         ` Daniel Vetter
@ 2021-02-05 21:35         ` Keith Busch
  2021-02-05 21:40           ` Andrey Grodzovsky
  1 sibling, 1 reply; 11+ messages in thread
From: Keith Busch @ 2021-02-05 21:35 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: Bjorn Helgaas, Deucher, Alexander, s.miroshnichenko, linux-pci,
	Koenig, Christian, Antonovitch, Anatoli, Daniel Vetter

On Fri, Feb 05, 2021 at 03:42:01PM -0500, Andrey Grodzovsky wrote:
> On 2/5/21 2:45 PM, Bjorn Helgaas wrote:
> > On Fri, Feb 05, 2021 at 11:08:45AM -0500, Andrey Grodzovsky wrote:
> > > 
> > > For user mappings, including MMIO mappings, we have a reliable
> > > approach where we invalidate device address space mappings for all
> > > users on the first sign of device disconnect, and then on all
> > > subsequent page faults from users accessing those ranges we insert
> > > a dummy zero page into their respective page tables. It's actually
> > > in the kernel driver, where no page faulting can be used as it can
> > > for user space, that I have issues with how to protect against
> > > continued access to those ranges which have already been released
> > > by the PCI subsystem and hence can be allocated to another
> > > hot-plugged device.
> > 
> > That doesn't sound reliable to me, but maybe I don't understand what
> > you mean by the "first sign of device disconnect."
> 
> See functions drm_dev_enter, drm_dev_exit and drm_dev_unplug in drm_drv.c
> 
> > At least from a PCI
> > perspective, the first sign of a surprise hot unplug is likely to be
> > an MMIO read that returns ~0.
> 
> We call drm_dev_unplug() in amdgpu_pci_remove and base all later checks
> with drm_dev_enter/drm_dev_exit on this.

It sounds like you are talking about an orderly notified unplug rather
than a surprise hot unplug. If it's a surprise, the code doesn't get to
fence off future MMIO access until well after the address range is
already unreachable.
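
A minimal sketch of the resulting read-validation pattern discussed in
this thread; MY_REG_KNOWN_ZERO_BIT is an illustrative assumption, a
register whose value can never legitimately be all-ones:

    /* A ~0 read may be real data or a fabricated PCIe error reply;
     * when ~0 is a possible register value, confirm with a second read. */
    static int my_read_checked(void __iomem *regs, u32 reg, u32 *val)
    {
            *val = readl(regs + reg);
            if (*val != ~0U)
                    return 0;               /* definitely real data */

            if (readl(regs + MY_REG_KNOWN_ZERO_BIT) == ~0U)
                    return -ENODEV;         /* device is gone */

            return 0;                       /* ~0 was valid data */
    }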

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Avoid MMIO write access after hot unplug [WAS - Re: Question about supporting AMD eGPU hot plug case]
  2021-02-05 21:35         ` Keith Busch
@ 2021-02-05 21:40           ` Andrey Grodzovsky
  0 siblings, 0 replies; 11+ messages in thread
From: Andrey Grodzovsky @ 2021-02-05 21:40 UTC (permalink / raw)
  To: Keith Busch
  Cc: Bjorn Helgaas, Deucher, Alexander, s.miroshnichenko, linux-pci,
	Koenig, Christian, Antonovitch, Anatoli, Daniel Vetter



On 2/5/21 4:35 PM, Keith Busch wrote:
> On Fri, Feb 05, 2021 at 03:42:01PM -0500, Andrey Grodzovsky wrote:
>> On 2/5/21 2:45 PM, Bjorn Helgaas wrote:
>>> On Fri, Feb 05, 2021 at 11:08:45AM -0500, Andrey Grodzovsky wrote:
>>>>
>>>> For user mappings, including MMIO mappings, we have a reliable
>>>> approach where we invalidate device address space mappings for all
>>>> users on the first sign of device disconnect, and then on all
>>>> subsequent page faults from users accessing those ranges we insert
>>>> a dummy zero page into their respective page tables. It's actually
>>>> in the kernel driver, where no page faulting can be used as it can
>>>> for user space, that I have issues with how to protect against
>>>> continued access to those ranges which have already been released
>>>> by the PCI subsystem and hence can be allocated to another
>>>> hot-plugged device.
>>>
>>> That doesn't sound reliable to me, but maybe I don't understand what
>>> you mean by the "first sign of device disconnect."
>>
>> See functions drm_dev_enter, drm_dev_exit and drm_dev_unplug in drm_drv.c
>>
>>> At least from a PCI
>>> perspective, the first sign of a surprise hot unplug is likely to be
>>> an MMIO read that returns ~0.
>>
>> We call drm_dev_unplug() in amdgpu_pci_remove and base all later checks
>> with drm_dev_enter/drm_dev_exit on this.
> 
> It sounds like you are talking about an orderly notified unplug rather
> than a surprise hot unplug. If it's a surprise, the code doesn't get to
> fence off future MMIO access until well after the address range is
> already unreachable.

I am referring to a surprise unplug, for which we get a notification from the
PCI subsystem that ends up calling our pci_driver.remove callback. I understand
there is a window of time within which we are not yet notified, but all our
MMIO accesses will already fail because the device is physically gone at that
point.

Andrey
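
A minimal sketch of the user-mapping strategy described earlier in the
thread (zap the mappings on unplug, back later faults with a dummy zero
page); the my_* names and fields are illustrative assumptions, not the
actual amdgpu implementation:

    static vm_fault_t my_bar_vm_fault(struct vm_fault *vmf)
    {
            struct my_device *mdev = vmf->vma->vm_private_data;
            unsigned long pfn;

            if (mdev->unplugged)
                    /* Device gone: back the fault with a dummy page so
                     * userspace reads zeros, not a dead BAR. */
                    pfn = page_to_pfn(mdev->dummy_page);
            else
                    /* Normal path: map the real BAR pfn. */
                    pfn = mdev->bar_pfn + vmf->pgoff;

            return vmf_insert_pfn(vmf->vma, vmf->address, pfn);
    }

    /* On the first sign of disconnect, zap all existing user PTEs so
     * every later access goes through the fault handler above. */
    static void my_invalidate_user_mappings(struct my_device *mdev)
    {
            mdev->unplugged = true;
            unmap_mapping_range(mdev->mapping, 0, 0, 1);
    }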

> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Avoid MMIO write access after hot unplug [WAS - Re: Question about supporting AMD eGPU hot plug case]
  2021-02-05 21:24           ` Andrey Grodzovsky
@ 2021-02-05 22:15             ` Daniel Vetter
  0 siblings, 0 replies; 11+ messages in thread
From: Daniel Vetter @ 2021-02-05 22:15 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: Bjorn Helgaas, Deucher, Alexander, s.miroshnichenko, linux-pci,
	Koenig, Christian, Antonovitch, Anatoli

On Fri, Feb 5, 2021 at 10:24 PM Andrey Grodzovsky
<Andrey.Grodzovsky@amd.com> wrote:
>
>
>
> On 2/5/21 3:49 PM, Daniel Vetter wrote:
> > On Fri, Feb 5, 2021 at 9:42 PM Andrey Grodzovsky
> > <Andrey.Grodzovsky@amd.com> wrote:
> >>
> >>
> >>
> >> On 2/5/21 2:45 PM, Bjorn Helgaas wrote:
> >>> On Fri, Feb 05, 2021 at 11:08:45AM -0500, Andrey Grodzovsky wrote:
> >>>> On 2/5/21 10:38 AM, Bjorn Helgaas wrote:
> >>>>> On Thu, Feb 04, 2021 at 11:03:10AM -0500, Andrey Grodzovsky wrote:
> >>>>>> + linux-pci mailing list and a bit of extra detail below.
> >>>>>>
> >>>>>> On 2/2/21 12:51 PM, Andrey Grodzovsky wrote:
> >>>>>>> Bjorn, Sergey I would also like to use this opportunity to ask you a
> >>>>>>> question on a related topic - Hot unplug.
> >>>>>>> I've been working for a while on this (see the latest patchset here
> >>>>>>> https://lists.freedesktop.org/archives/amd-gfx/2021-January/058595.html).
> >>>>>>> My question is specifically regarding this patch
> >>>>>>> https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html
> >>>>>>> - the idea here is to
> >>>>>>> prevent any accesses to MMIO address ranges still mapped in kernel page
> >>>>>
> >>>>> I think you're talking about a PCI BAR that is mapped into a user
> >>>>> process.
> >>>>
> >>>> For user mappings, including MMIO mappings, we have a reliable
> >>>> approach where we invalidate device address space mappings for all
> >>>> users on the first sign of device disconnect, and then on all
> >>>> subsequent page faults from users accessing those ranges we insert
> >>>> a dummy zero page into their respective page tables. It's actually
> >>>> in the kernel driver, where no page faulting can be used as it can
> >>>> for user space, that I have issues with how to protect against
> >>>> continued access to those ranges which have already been released
> >>>> by the PCI subsystem and hence can be allocated to another
> >>>> hot-plugged device.
> >>>
> >>> That doesn't sound reliable to me, but maybe I don't understand what
> >>> you mean by the "first sign of device disconnect."
> >>
> >> See functions drm_dev_enter, drm_dev_exit and drm_dev_unplug in drm_drv.c
> >>
> >>> At least from a PCI
> >>> perspective, the first sign of a surprise hot unplug is likely to be
> >>> an MMIO read that returns ~0.
> >>
> >> We call drm_dev_unplug() in amdgpu_pci_remove and base all later checks
> >> with drm_dev_enter/drm_dev_exit on this.
> >>
> >>>
> >>> It's true that the hot unplug will likely cause an interrupt and we
> >>> *may* be able to unbind the driver before the driver or a user program
> >>> performs an MMIO access, but there's certainly no guarantee.  The
> >>> following sequence is always possible:
> >>>
> >>>     - User unplugs PCIe device
> >>>     - Bridge raises hotplug interrupt
> >>>     - Driver or userspace issues MMIO read
> >>>     - Bridge responds with error because link to device is down
> >>>     - Host bridge receives error, fabricates ~0 data to CPU
> >>>     - Driver or userspace sees ~0 data from MMIO read
> >>>     - PCI core fields hotplug interrupt and unbinds driver
> >>>
> >>>>> It is impossible to reliably *prevent* MMIO accesses to a BAR on a
> >>>>> PCI device that has been unplugged.  There is always a window
> >>>>> where the CPU issues a load/store and the device is unplugged
> >>>>> before the load/store completes.
> >>>>>
> >>>>> If a PCIe device is unplugged, an MMIO read to that BAR will
> >>>>> complete on PCIe with an uncorrectable fatal error.  When that
> >>>>> happens there is no valid data from the PCIe device, so the PCIe
> >>>>> host bridge typically fabricates ~0 data so the CPU load
> >>>>> instruction can complete.
> >>>>>
> >>>>> If you want to reliably recover from unplugs like this, I think
> >>>>> you have to check for that ~0 data at the important points, i.e.,
> >>>>> where you make decisions based on the data.  Of course, ~0 may be
> >>>>> valid data in some cases.  You have to use your knowledge of the
> >>>>> device programming model to determine whether ~0 is possible, and
> >>>>> if it is, check in some other way, like another MMIO read, to tell
> >>>>> whether the read succeeded and returned ~0, or failed because of
> >>>>> PCIe error and returned ~0.
> >>>>
> >>>> Looks like there is a high performance price to pay for this
> >>>> approach if we protect at every possible junction (e.g. register
> >>>> read/write accessors). I tested this by doing 1M reads/writes while
> >>>> using drm_dev_enter/drm_dev_exit, which is DRM's RCU-based guard
> >>>> against device unplug, and even then we hit a performance penalty
> >>>> of 40%. I assume that an actual MMIO read (e.g.
> >>>> pci_device_is_present) will cause a much larger performance
> >>>> penalty.
> >>>
> >>> I guess you have to decide whether you want a fast 90% solution or a
> >>> somewhat slower 100% reliable solution :)
> >>>
> >>> I don't think the checking should be as expensive as you're thinking.
> >>> You only have to check if:
> >>>
> >>>     (1) you're doing an MMIO read (there's no response for MMIO writes,
> >>>         so you can't tell whether they succeed),
> >>>     (2) the MMIO read returns ~0,
> >>>     (3) ~0 might be a valid value for the register you're reading, and
> >>>     (4) the read value is important to your control flow.
> >>>
> >>> For example, if you do a series of reads and act on the data after all
> >>> the reads complete, you only need to worry about whether the *last*
> >>> read failed.  If that last read is to a register that is known to
> >>> contain a zero bit, no additional MMIO read is required and the
> >>> checking is basically free.
> >>
> >> I am more worried about MMIO writes, to avoid writing to a BAR
> >> of a newly ('just') plugged-in device that got accidentally allocated some
> >> part of the MMIO address range that our 'ghost' device is still using.
> >
> > We have to shut all these down before the ->remove callback has
> > finished. At least that's my understanding; before that's all done the
> > pci subsystem won't reallocate anything.
> >
> > The drm_dev_enter/exit is only meant to lock out entire code paths
> > when we know the device is gone, so that we don't spend an eternity
> > timing out every access and running lots of code for no point. The
> > other reason for drm_dev_exit is to guard the sw side from access of
> > data structures which we tear down (or have to tear down) as part of
> > ->remove callback, like io remaps, maybe timers/workers driving hw
> > access and all these things. Only the uapi objects need to stay around
> > until userspace has closed all the fds/unmapped all the mappings so
> > that we don't oops all over the place.
> >
> > Of course auditing a big driver like amdgpu for this is a massive pain,
> > so where/how you sprinkle drm_dev_enter/exit over it is a bit of an
> > engineering tradeoff.
> >
> > Either way (if my understanding is correct), after ->remove is
> > finished, your driver _must_ guarantee that all access to mmio ranges
> > has stopped. How you do that is kinda up to you.
> >
> > But also like Bjorn pointed out, _while_ the hotunplug is happening,
> > i.e. at any moment between when transactions start failing up to and
> > including your driver's ->remove callback having finished, you need to
> > be able to cope with the bogus all-0xff replies in your code. But
> > that's kinda an orthogonal requirement: if you've fixed the
> > use-after-frees you might still crash on 0xff, and even if you fixed
> > all the crashes due to 0xff reads you might still have lots of
> > use-after-frees around.
> >
> > Cheers, Daniel
>
> I see. So, to summarize and confirm that I got this correct:
> We drop the per-register access guards, as this is overkill and hurts performance.
> Post .remove callback, I verify that we guard any IOCTL (already done), block GPU
> scheduler threads (already done), disable interrupts (already done), guard
> against any direct IB submissions to HW rings (not done yet), and cancel and
> flush any background threads (mostly timers, probably) which might trigger
> HW programming (not done yet).
>
> Correct?

Yeah. Well, for a clean design I think your ->remove callback should
guarantee that all background tasks your driver has (timers, workers,
whatever) are completely stopped, with drm_dev_enter/exit maybe only
guarding the re-arming of these (since if you stop the timer/kthread,
the data structure that contained them may also be gone entirely via
kfree()). But those are implementation details which might go one way
or the other.
-Daniel
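
A minimal sketch of the teardown-plus-re-arm-guard pattern described
here; the my_* names are illustrative assumptions, not the actual
amdgpu code:

    static void my_watchdog_work(struct work_struct *work)
    {
            struct delayed_work *dwork = to_delayed_work(work);
            struct my_device *mdev =
                    container_of(dwork, struct my_device, watchdog);
            int idx;

            /* Refuse to touch HW or re-arm once the device is gone. */
            if (!drm_dev_enter(&mdev->drm, &idx))
                    return;

            my_poke_hw(mdev);                           /* MMIO access */
            schedule_delayed_work(&mdev->watchdog, HZ); /* re-arm */

            drm_dev_exit(idx);
    }

    static void my_pci_remove(struct pci_dev *pdev)
    {
            struct my_device *mdev = pci_get_drvdata(pdev);

            drm_dev_unplug(&mdev->drm); /* new drm_dev_enter() now fails */
            cancel_delayed_work_sync(&mdev->watchdog);
            /* ... tear down iomaps, interrupts, etc. ... */
    }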

>
> Andrey
>
> >
> >> Andrey
> >>
> >>>
> >>>>>>> table by the driver AFTER the device is gone physically and
> >>>>>>> from the PCI subsystem, post pci_driver.remove callback
> >>>>>>> execution. This happens because struct device (and struct
> >>>>>>> drm_device representing the graphics card) are still present,
> >>>>>>> because some user clients which are not aware of hot removal
> >>>>>>> still hold the device file open and hence prevent the device
> >>>>>>> refcount from dropping to 0. The solution in this patch is
> >>>>>>> brute force, where we try to find any place we access MMIO
> >>>>>>> mapped to kernel address space and guard against the write
> >>>>>>> access with a 'device-unplug' flag. This solution is obviously
> >>>>>>> racy, because a device can be unplugged right after checking
> >>>>>>> the flag.  I had an idea to instead not release but keep those
> >>>>>>> ranges reserved even after pci_driver.remove, until the DRM
> >>>>>>> device is destroyed when its refcount drops to 0, and by this
> >>>>>>> to prevent new devices that are plugged in and allocated some
> >>>>>>> of the same MMIO address range from getting accidental writes
> >>>>>>> into their registers.  But on dri-devel I was advised that this
> >>>>>>> would upset the PCI subsystem and so is best avoided. I would
> >>>>>>> still like another opinion from PCI experts on whether this
> >>>>>>> approach is indeed not possible?
> >
> >
> >



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-02-06  2:55 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <31a7498d-dd68-f194-cbf5-1f73a53322ff@amd.com>
2021-02-05 15:38 ` Avoid MMIO write access after hot unplug [WAS - Re: Question about supporting AMD eGPU hot plug case] Bjorn Helgaas
2021-02-05 16:08   ` Andrey Grodzovsky
2021-02-05 17:59     ` Daniel Vetter
2021-02-05 19:09       ` Andrey Grodzovsky
2021-02-05 19:45     ` Bjorn Helgaas
2021-02-05 20:42       ` Andrey Grodzovsky
2021-02-05 20:49         ` Daniel Vetter
2021-02-05 21:24           ` Andrey Grodzovsky
2021-02-05 22:15             ` Daniel Vetter
2021-02-05 21:35         ` Keith Busch
2021-02-05 21:40           ` Andrey Grodzovsky
