linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
To: Sergei Miroshnichenko <s.miroshnichenko@yadro.com>
Cc: "Alexander.Deucher@amd.com" <Alexander.Deucher@amd.com>,
	"Christian.Koenig@amd.com" <Christian.Koenig@amd.com>,
	"anatoli.antonovitch@amd.com" <anatoli.antonovitch@amd.com>,
	"helgaas@kernel.org" <helgaas@kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux@yadro.com" <linux@yadro.com>
Subject: Re: Question about supporting AMD eGPU hot plug case
Date: Thu, 11 Mar 2021 17:40:55 -0500	[thread overview]
Message-ID: <e3f3de55-8011-77d8-25ac-f16f8256beff@amd.com> (raw)
In-Reply-To: <146844cc-e2d9-aade-8223-db41b37853c5@amd.com>



On 2021-03-05 6:13 p.m., Andrey Grodzovsky wrote:
> 
> 
> On 2021-03-05 2:12 p.m., Sergei Miroshnichenko wrote:
>> On Fri, 2021-03-05 at 12:13 -0500, Andrey Grodzovsky wrote:
>>>
>>> On 2021-03-05 11:08 a.m., Sergei Miroshnichenko wrote:
>>>> On Thu, 2021-03-04 at 14:49 -0500, Andrey Grodzovsky wrote:
>>>>> + linux-pci
>>>>>
>>>>> On 2021-02-26 1:44 a.m., Sergei Miroshnichenko wrote:
>>>>>> On Thu, 2021-02-25 at 13:28 -0500, Andrey Grodzovsky wrote:
>>>>>>> On 2021-02-25 2:00 a.m., Sergei Miroshnichenko wrote:
>>>>>>>> On Wed, 2021-02-24 at 17:51 -0500, Andrey Grodzovsky wrote:
>>>>>>>>> On 2021-02-24 1:23 p.m., Sergei Miroshnichenko wrote:
>>>>>>>>>> ...
>>>>>>>>> Are you saying that even without hot-plugging, while both
>>>>>>>>> nvme
>>>>>>>>> and
>>>>>>>>> AMD
>>>>>>>>> card are present
>>>>>>>>> right from boot, you still get BARs moving and MMIO
>>>>>>>>> ranges
>>>>>>>>> reassigned
>>>>>>>>> for NVME BARs
>>>>>>>>> just because amdgpu driver will start resize of AMD card
>>>>>>>>> BARs
>>>>>>>>> and
>>>>>>>>> this
>>>>>>>>> will trigger NVMEs BARs move to
>>>>>>>>> allow AMD card BARs to cover full range of VIDEO RAM ?
>>>>>>>> Yes. Unconditionally, because it is unknown beforehand if
>>>>>>>> NVMe's
>>>>>>>> BAR
>>>>>>>> movement will help. In this particular case BAR movement is
>>>>>>>> not
>>>>>>>> needed,
>>>>>>>> but is done anyway.
>>>>>>>>
>>>>>>>> BARs are not moved one by one, but the kernel releases all
>>>>>>>> the
>>>>>>>> releasable ones, and then recalculates a new BAR layout to
>>>>>>>> fit
>>>>>>>> them
>>>>>>>> all. Kernel's algorithm is different from BIOS's, so NVME
>>>>>>>> has
>>>>>>>> appeared
>>>>>>>> at a new place.
>>>>>>>>
>>>>>>>> This is triggered by following:
>>>>>>>> - at boot, if BIOS had assigned not every BAR;
>>>>>>>> - during pci_resize_resource();
>>>>>>>> - during pci_rescan_bus() -- after a pciehp event or a
>>>>>>>> manual
>>>>>>>> via
>>>>>>>> sysfs.
>>>>>>>
>>>>>>> By manual via sysfs you mean something like this - 'echo 1 >
>>>>>>> /sys/bus/pci/drivers/amdgpu/0000\:0c\:00.0/remove && echo 1 >
>>>>>>> /sys/bus/pci/rescan ' ? I am looking into how most reliably
>>>>>>> trigger
>>>>>>> PCI
>>>>>>> code to call my callbacks even without having external PCI
>>>>>>> cage
>>>>>>> for
>>>>>>> GPU
>>>>>>> (will take me some time to get it).
>>>>>>
>>>>>> Yeah, this is our way to go when a device can't be physically
>>>>>> removed
>>>>>> or unpowered remotely. With just a bit shorter path:
>>>>>>
>>>>>>      sudo sh -c 'echo 1 >
>>>>>> /sys/bus/pci/devices/0000\:0c\:00.0/remove'
>>>>>>      sudo sh -c 'echo 1 > /sys/bus/pci/rescan'
>>>>>>
>>>>>> Or, just a second command (rescan) is enough: a BAR movement
>>>>>> attempt
>>>>>> will be triggered even if there were no changes in PCI
>>>>>> topology.
>>>>>>
>>>>>> Serge
>>>>>>
>>>>>
>>>>> Hi Segrei
>>>>>
>>>>> Here is a link to initial implementation on top of your tree
>>>>> (movable_bars_v9.1) -
>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Dyadro%2Fpcie_hotplug%2Fmovable_bars_v9.1%26id%3D05d6abceed650181bb7fe0a49884a26e378b908e&amp;data=04%7C01%7Candrey.grodzovsky%40amd.com%7C6658f0cc7c344791ce0f08d8e00a96bf%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637505683386334114%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=qEC3qIAM8h1vU4gGEgT6sThXsaCuatTI2UjM9Bb8KGI%3D&amp;reserved=0 
>>>>>
>>>>> I am able to pass one re-scan cycle and can use the card
>>>>> afterwards
>>>>> (see
>>>>> log1.log).
>>>>> But, according to your prints only BAR5 which is registers BAR
>>>>> was
>>>>> updated (amdgpu 0000:0b:00.0: BAR 5 updated: 0xfcc00000 ->
>>>>> 0xfc100000)
>>>>> while I am interested to test BAR0 (Graphic RAM) move since this
>>>>> is
>>>>> where most of the complexity is. Is there a way to hack your code
>>>>> to
>>>>> force this ?
>>>>
>>>> Hi Andrey,
>>>>
>>>> Regarding the amdgpu's BAR0 remaining on its place: it seems this
>>>> is
>>>> because of fixed BARs starting from fc600000. The kernel tends to
>>>> group
>>>> the BARs close to each other, making a bridge window as compact as
>>>> possible. So the BAR0 had occupied the closest "comfortable" slots
>>>> 0xe0000000-0xefffffff, with the resulting bridge window of bus 00
>>>> covering all the BARs:
>>>>
>>>>       pci_bus 0000:00: resource 10 [mem 0xe0000000-0xfec2ffff
>>>> window]
>>>>
>>>> I'll let you know if I get an idea how to rearrange that manually.
>>>>
>>>> Two GPUs can actually swap their places.
>>>
>>> What do you mean ?
>>
>> I was thinking: when the scenario of a PCI rescan with two GPUs (as was
>> described below) will start working, BAR0 of GPU0 can take place of
>> BAR0 of GPU1 after the first rescan.
>>
>>>> What also can make a BAR movable -- is rmmod'ing its driver. It
>>>> could
>>>> be some hack from within a tmux, like:
>>>>
>>>>     rmmod igb; \
>>>>     rmmod xhci_hcd; \
>>>>     rmmod ahci; \
>>>>     echo 1 > /sys/bus/pci/rescan; \
>>>>     modprobe igb; \
>>>>     modprobe xhci_hcd; \
>>>>     modprobe ahci
>>>
>>> But should I also rmmod amdgpu ? Or modprobing back the other
>>> drivers
>>> should cause (hopefully) BAR0 move in AMD graphic card ?
>>
>> You have already made the amdgpu movable, so no need to rmmod it --
>> just those with fixed BARs:
>>
>>      xhci_hcd 0000:0c:00.3: BAR 0: assigned fixed [mem 0xfc600000-
>> 0xfc6fffff 64bit]
>>      igb 0000:07:00.0: BAR 0: assigned fixed [mem 0xfc900000-0xfc91ffff]
>>      igb 0000:07:00.0: BAR 3: assigned fixed [mem 0xfc920000-0xfc923fff]
>>      ahci 0000:02:00.1: BAR 6: assigned fixed [mem 0xfcb00000-0xfcb7ffff
>> pref]
>>      ahci 0000:02:00.1: BAR 5: assigned fixed [mem 0xfcb80000-
>> 0xfcb9ffff]
>>      xhci_hcd 0000:02:00.0: BAR 0: assigned fixed [mem 0xfcba0000-
>> 0xfcba7fff 64bit]
>>      xhci_hcd 0000:05:00.0: BAR 0: assigned fixed [mem 0xfca00000-
>> 0xfca07fff 64bit]
>>      ahci 0000:0d:00.2: BAR 5: assigned fixed [mem 0xfce08000-
>> 0xfce08fff]
>>
>> The expected result is they all move closer to the start of PCI address
>> space.
>>
> 
> Ok, I updated as you described. Also I removed PCI conf command to stop
> address decoding and restart later as I noticed PCI core does it itself
> when needed.
> I tested now also with graphic desktop enabled while submitting
> 3d draw commands and seems like under this scenario everything still
> works. Again, this all needs to be tested with VRAM BAR move as then
> I believe I will see more issues like handling of MMIO mapped VRAM 
> objects (like GART table). In case you do have an AMD card you could 
> also maybe give it a try. In the meanwhile I will add support to 
> ioremapping of those VRAM objects.
> 
> Andrey

Just an update, added support for unmaping/remapping of all VRAM
objects, both user space mmaped and kernel ioremaped. Seems to work
ok but again, without forcing VRAM BAR to move I can't be sure.
Alex, Chsristian - take a look when you have some time to give me some
initial feedback on the amdgpu side.

The code is at 
https://cgit.freedesktop.org/~agrodzov/linux/log/?h=yadro%2Fpcie_hotplug%2Fmovable_bars_v9.1

Andrey

> 
>>>> I think pci_release_resource() should not be in
>>>> amdgpu_device_unmap_mmio() -- the patched kernel will do that
>>>> itself
>>>> for BARs the amdgpu_device_bar_fixed() returns false. Even more --
>>>> the
>>>> kernel will ensure that all BARs which were working before, are
>>>> reassigned properly, so it needs them to be assigned before the
>>>> procedure.
>>>> The same for pci_assign_unassigned_bus_resources() in
>>>> amdgpu_device_remap_mmio(): this callback is invoked from
>>>> pci_rescan_bus() after pci_assign_unassigned_root_bus_resources().
>>>
>>> This seems to me in contrast to your documentation (see
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FYADRO-KNS%2Flinux%2Fcommit%2F5bc12ba7c74f1c19c11db29b4807bd32acfac2c2&amp;data=04%7C01%7Candrey.grodzovsky%40amd.com%7C6658f0cc7c344791ce0f08d8e00a96bf%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637505683386334114%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=kO1OlRL8iHMTcijuV0jDpODCtXpCCTpJv6YIn%2FuypNQ%3D&amp;reserved=0 
>>>
>>> step 1) although step 2 seems also to contradict step 1 with regard
>>> to
>>> BARs release - so now I am a bit confused. Also looking at
>>> nvme_dev_unmap - it calls pci_release_mem_regions. Symmetrical
>>> acquisition happens in nvme_dev_unmap.
>>
>> Ah, there is a difference between pci_release_region() and
>> pci_release_resource(), so subtle that I had to refresh my memory. You
>> are right, this has to be explained in the documentation!
>>
>> $ sudo cat /proc/iomem
>> ...
>> f0000000-fcffffff : PCI Bus 0000:00     -- root bus resource
>> ...
>>    fcf00000-fcffffff : PCI Bus 0000:01   -- bridge window
>>      fcf00000-fcf03fff : 0000:01:00.0    -- pci resource (BAR)
>>        fcf00000-fcf03fff : nvme          -- pci region (reserved by
>>                                             a driver, has its name).
>>
>> So the nvme_dev_unmap() reflects with pci_release_region() that the BAR
>> is not used by the driver anymore -- this actually should be called in
>> every rescan_prepare().
>>
>> But the pci_release_resource() tells to the PCI subsystem that the BAR
>> is "released" from the device and has to be assigned to some address
>> before using again, and makes the pci_resource_start(pdev,
>> relased_barno) invalid.
>>
>> Why the quotes: pci_release_resource() doesn't turn off the BAR,
>> doesn't write the registers -- this happens later.
>>
>> I thouht at first that pci_release_resource() is not safe in a
>> rescan_prepare(), but then double-checked, and found it's fine, just
>> not needed, as the kernel will do it anyway. And the
>> pci_bus_check_bars_assigned() to compare the bitmasks of successfully
>> assigned BARs is called *before* the hook.
>>
>>>>> When testing with 2 graphic cards and triggering rescan, hard
>>>>> hang of
>>>>> the system happens during rescan_prepare of the second card  when
>>>>> stopping the HW (see log2.log) - I don't understand why this
>>>>> would
>>>>> happen as each of them passes fine when they are standalone
>>>>> tested
>>>>> and
>>>>> there should be no interdependence between them as far as i know.
>>>>> Do you have any idea ?
>>>>
>>>> What happens with two GPUs is unclear for me as well, nothing looks
>>>> suspicious.
>>>>
>>>> Serge
>>>>

  reply	other threads:[~2021-03-11 22:42 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <ddef2da4-4726-7321-40fe-5f90788cc836@amd.com>
     [not found] ` <MN2PR12MB44880052F5C4ECF3EF22B58EF7B69@MN2PR12MB4488.namprd12.prod.outlook.com>
     [not found]   ` <ffb816ae8336ff2373ec5fcf695e698900c3d557.camel@yadro.com>
     [not found]     ` <9c41221f-ecfa-5554-f2ea-6f72cfe7dc7e@amd.com>
     [not found]       ` <dae8dfd8-3a99-620d-f0aa-ceb39923b807@amd.com>
     [not found]         ` <7d9e947648ce47a4ba8d223cdec08197@yadro.com>
     [not found]           ` <c82919f3-5296-cd0a-0b8f-c33614ca3ea9@amd.com>
     [not found]             ` <340062dba82b813b311b2c78022d2d3d0e6f414d.camel@yadro.com>
     [not found]               ` <927d7fbe-756f-a4f1-44cd-c68aecd906d7@amd.com>
     [not found]                 ` <dc2de228b92c4e27819c7681f650e1e5@yadro.com>
     [not found]                   ` <a6af29ed-179a-7619-3dde-474542c180f4@amd.com>
     [not found]                     ` <8f53f1403f0c4121885398487bbfa241@yadro.com>
     [not found]                       ` <fc2ea091-8470-9501-242d-8d82714adecb@amd.com>
     [not found]                         ` <50afd1079dbabeba36fd35fdef84e6e15470ef45.camel@yadro.com>
     [not found]                           ` <c53c23b5-dd37-44f1-dffd-ff9699018a82@amd.com>
     [not found]                             ` <8d7e2d7b7982d8d78c0ecaa74b9d40ace4db8453.camel@yadro.com>
2021-03-04 19:49                               ` Question about supporting AMD eGPU hot plug case Andrey Grodzovsky
2021-03-05 16:08                                 ` Sergei Miroshnichenko
2021-03-05 17:13                                   ` Andrey Grodzovsky
2021-03-05 19:12                                     ` Sergei Miroshnichenko
2021-03-05 23:13                                       ` Andrey Grodzovsky
2021-03-11 22:40                                         ` Andrey Grodzovsky [this message]
2021-03-12  9:03                                           ` Christian König
2021-03-12 15:34                                             ` Andrey Grodzovsky
2021-03-15 16:00                                               ` Andrey Grodzovsky
2021-03-15 16:10                                               ` Christian König
2021-03-15 16:21                                                 ` Andrey Grodzovsky
2021-03-15 16:55                                                   ` Christian König
2021-03-15 17:11                                                     ` Andrey Grodzovsky
2021-03-16 16:17                                                       ` Christian König
2021-03-16 17:39                                                         ` Andrey Grodzovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e3f3de55-8011-77d8-25ac-f16f8256beff@amd.com \
    --to=andrey.grodzovsky@amd.com \
    --cc=Alexander.Deucher@amd.com \
    --cc=Christian.Koenig@amd.com \
    --cc=anatoli.antonovitch@amd.com \
    --cc=helgaas@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux@yadro.com \
    --cc=s.miroshnichenko@yadro.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).