From: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: "Deucher, Alexander" <Alexander.Deucher@amd.com>,
"s.miroshnichenko@yadro.com" <s.miroshnichenko@yadro.com>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"Koenig, Christian" <Christian.Koenig@amd.com>,
"Antonovitch, Anatoli" <Anatoli.Antonovitch@amd.com>,
Daniel Vetter <daniel@ffwll.ch>
Subject: Re: Avoid MMIO write access after hot unplug [WAS - Re: Question about supporting AMD eGPU hot plug case]
Date: Fri, 5 Feb 2021 15:42:01 -0500 [thread overview]
Message-ID: <c216efcc-6c81-8a7e-a823-1ddb62ebddb7@amd.com> (raw)
In-Reply-To: <20210205194541.GA191443@bjorn-Precision-5520>
On 2/5/21 2:45 PM, Bjorn Helgaas wrote:
> On Fri, Feb 05, 2021 at 11:08:45AM -0500, Andrey Grodzovsky wrote:
>> On 2/5/21 10:38 AM, Bjorn Helgaas wrote:
>>> On Thu, Feb 04, 2021 at 11:03:10AM -0500, Andrey Grodzovsky wrote:
>>>> + linux-pci mailing list and a bit of extra details bellow.
>>>>
>>>> On 2/2/21 12:51 PM, Andrey Grodzovsky wrote:
>>>>> Bjorn, Sergey I would also like to use this opportunity to ask you a
>>>>> question on a related topic - Hot unplug.
>>>>> I've been working for a while on this (see latest patchset set here
>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Farchives%2Famd-gfx%2F2021-January%2F058595.html&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C67eb867f5714488f604608d8ca0ea0c8%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637481511484863191%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=sPRG9cnJDUPR%2FJ1%2Bls0zM6Bidut6bbT%2BpCYuufnc24Q%3D&reserved=0).
>>>>> My question is specifically regarding this patch
>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Farchives%2Famd-gfx%2F2021-January%2F058606.html&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C67eb867f5714488f604608d8ca0ea0c8%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637481511484863191%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=YsGsXwyK%2FtiErUONC9BbcXcceGcljtbnOBWb131kl%2FI%3D&reserved=0
>>>>> - the idea here is to
>>>>> prevent any accesses to MMIO address ranges still mapped in kernel page
>>>
>>> I think you're talking about a PCI BAR that is mapped into a user
>>> process.
>>
>> For user mappings, including MMIO mappings, we have a reliable
>> approach where we invalidate device address space mappings for all
>> user on first sign of device disconnect and then on all subsequent
>> page faults from the users accessing those ranges we insert dummy
>> zero page into their respective page tables. It's actually the
>> kernel driver, where no page faulting can be used such as for user
>> space, I have issues on how to protect from keep accessing those
>> ranges which already are released by PCI subsystem and hence can be
>> allocated to another hot plugging device.
>
> That doesn't sound reliable to me, but maybe I don't understand what
> you mean by the "first sign of device disconnect."
See functions drm_dev_enter, drm_dev_exit and drm_dev_unplug in drm_derv.c
At least from a PCI
> perspective, the first sign of a surprise hot unplug is likely to be
> an MMIO read that returns ~0.
We set drm_dev_unplug in amdgpu_pci_remove and base all later checks
with drm_dev_enter/drm_dev_exit on this
>
> It's true that the hot unplug will likely cause an interrupt and we
> *may* be able to unbind the driver before the driver or a user program
> performs an MMIO access, but there's certainly no guarantee. The
> following sequence is always possible:
>
> - User unplugs PCIe device
> - Bridge raises hotplug interrupt
> - Driver or userspace issues MMIO read
> - Bridge responds with error because link to device is down
> - Host bridge receives error, fabricates ~0 data to CPU
> - Driver or userspace sees ~0 data from MMIO read
> - PCI core fields hotplug interrupt and unbinds driver
>
>>> It is impossible to reliably *prevent* MMIO accesses to a BAR on a
>>> PCI device that has been unplugged. There is always a window
>>> where the CPU issues a load/store and the device is unplugged
>>> before the load/store completes.
>>>
>>> If a PCIe device is unplugged, an MMIO read to that BAR will
>>> complete on PCIe with an uncorrectable fatal error. When that
>>> happens there is no valid data from the PCIe device, so the PCIe
>>> host bridge typically fabricates ~0 data so the CPU load
>>> instruction can complete.
>>>
>>> If you want to reliably recover from unplugs like this, I think
>>> you have to check for that ~0 data at the important points, i.e.,
>>> where you make decisions based on the data. Of course, ~0 may be
>>> valid data in some cases. You have to use your knowledge of the
>>> device programming model to determine whether ~0 is possible, and
>>> if it is, check in some other way, like another MMIO read, to tell
>>> whether the read succeeded and returned ~0, or failed because of
>>> PCIe error and returned ~0.
>>
>> Looks like there is a high performance price to pay for this
>> approach if we protect at every possible junction (e.g. register
>> read/write accessors ), I tested this by doing 1M read/writes while
>> using drm_dev_enter/drm_dev_exit which is DRM's RCU based guard
>> against device unplug and even then we hit performance penalty of
>> 40%. I assume that with actual MMIO read (e.g.
>> pci_device_is_present) will cause a much larger performance
>> penalty.
>
> I guess you have to decide whether you want a fast 90% solution or a
> somewhat slower 100% reliable solution :)
>
> I don't think the checking should be as expensive as you're thinking.
> You only have to check if:
>
> (1) you're doing an MMIO read (there's no response for MMIO writes,
> so you can't tell whether they succeed),
> (2) the MMIO read returns ~0,
> (3) ~0 might be a valid value for the register you're reading, and
> (4) the read value is important to your control flow.
>
> For example, if you do a series of reads and act on the data after all
> the reads complete, you only need to worry about whether the *last*
> read failed. If that last read is to a register that is known to
> contain a zero bit, no additional MMIO read is required and the
> checking is basically free.
I am more worried about MMIO writes to avoid writing to a BAR
of a newly 'just' plugged in device that got accidentally allocated some
part of MMIO addresses range that our 'ghost' device still using.
Andrey
>
>>>>> table by the driver AFTER the device is gone physically and
>>>>> from the PCI subsystem, post pci_driver.remove call back
>>>>> execution. This happens because struct device (and struct
>>>>> drm_device representing the graphic card) are still present
>>>>> because some user clients which are not aware of hot removal
>>>>> still hold device file open and hence prevents device refcount
>>>>> to drop to 0. The solution in this patch is brute force where
>>>>> we try and find any place we access MMIO mapped to kernel
>>>>> address space and guard against the write access with a
>>>>> 'device-unplug' flag. This solution is obliviously racy
>>>>> because a device can be unplugged right after checking the
>>>>> falg. I had an idea to instead not release but to keep those
>>>>> ranges reserved even after pci_driver.remove, until DRM device
>>>>> is destroyed when it's refcount drops to 0 and by this to
>>>>> prevent new devices plugged in and allocated some of the same
>>>>> MMIO address range to get accidental writes into their
>>>>> registers. But, on dri-devel I was advised that this will
>>>>> upset the PCI subsystem and so best to be avoided but I still
>>>>> would like another opinion from PCI experts on whether this
>>>>> approach is indeed not possible ?
next prev parent reply other threads:[~2021-02-05 21:19 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <31a7498d-dd68-f194-cbf5-1f73a53322ff@amd.com>
2021-02-05 15:38 ` Avoid MMIO write access after hot unplug [WAS - Re: Question about supporting AMD eGPU hot plug case] Bjorn Helgaas
2021-02-05 16:08 ` Andrey Grodzovsky
2021-02-05 17:59 ` Daniel Vetter
2021-02-05 19:09 ` Andrey Grodzovsky
2021-02-05 19:45 ` Bjorn Helgaas
2021-02-05 20:42 ` Andrey Grodzovsky [this message]
2021-02-05 20:49 ` Daniel Vetter
2021-02-05 21:24 ` Andrey Grodzovsky
2021-02-05 22:15 ` Daniel Vetter
2021-02-05 21:35 ` Keith Busch
2021-02-05 21:40 ` Andrey Grodzovsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c216efcc-6c81-8a7e-a823-1ddb62ebddb7@amd.com \
--to=andrey.grodzovsky@amd.com \
--cc=Alexander.Deucher@amd.com \
--cc=Anatoli.Antonovitch@amd.com \
--cc=Christian.Koenig@amd.com \
--cc=daniel@ffwll.ch \
--cc=helgaas@kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=s.miroshnichenko@yadro.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).