+ linux-pci

On 2021-02-26 1:44 a.m., Sergei Miroshnichenko wrote:
> On Thu, 2021-02-25 at 13:28 -0500, Andrey Grodzovsky wrote:
>> On 2021-02-25 2:00 a.m., Sergei Miroshnichenko wrote:
>>> On Wed, 2021-02-24 at 17:51 -0500, Andrey Grodzovsky wrote:
>>>> On 2021-02-24 1:23 p.m., Sergei Miroshnichenko wrote:
>>>>> ...
>>>> Are you saying that even without hot-plugging, while both nvme and
>>>> AMD card are present right from boot, you still get BARs moving and
>>>> MMIO ranges reassigned for NVME BARs just because amdgpu driver
>>>> will start resize of AMD card BARs and this will trigger NVMEs BARs
>>>> move to allow AMD card BARs to cover full range of VIDEO RAM ?
>>> Yes. Unconditionally, because it is unknown beforehand if NVMe's BAR
>>> movement will help. In this particular case BAR movement is not
>>> needed, but is done anyway.
>>>
>>> BARs are not moved one by one, but the kernel releases all the
>>> releasable ones, and then recalculates a new BAR layout to fit them
>>> all. Kernel's algorithm is different from BIOS's, so NVME has
>>> appeared at a new place.
>>>
>>> This is triggered by following:
>>> - at boot, if BIOS had assigned not every BAR;
>>> - during pci_resize_resource();
>>> - during pci_rescan_bus() -- after a pciehp event or a manual via
>>>   sysfs.
>>
>> By manual via sysfs you mean something like this - 'echo 1 >
>> /sys/bus/pci/drivers/amdgpu/0000\:0c\:00.0/remove && echo 1 >
>> /sys/bus/pci/rescan ' ? I am looking into how most reliably trigger
>> PCI code to call my callbacks even without having external PCI cage
>> for GPU (will take me some time to get it).
>
> Yeah, this is our way to go when a device can't be physically removed
> or unpowered remotely.
> With just a bit shorter path:
>
>   sudo sh -c 'echo 1 > /sys/bus/pci/devices/0000\:0c\:00.0/remove'
>   sudo sh -c 'echo 1 > /sys/bus/pci/rescan'
>
> Or, just a second command (rescan) is enough: a BAR movement attempt
> will be triggered even if there were no changes in PCI topology.
>
> Serge
>

Hi Sergei,

Here is a link to the initial implementation on top of your tree
(movable_bars_v9.1):
https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=05d6abceed650181bb7fe0a49884a26e378b908e

I am able to pass one rescan cycle and can use the card afterwards (see
log1.log). But, according to your prints, only BAR5, which is the
register BAR, was updated (amdgpu 0000:0b:00.0: BAR 5 updated:
0xfcc00000 -> 0xfc100000), while I am interested in testing a BAR0
(graphics RAM) move, since this is where most of the complexity is. Is
there a way to hack your code to force this?

When testing with 2 graphics cards and triggering a rescan, a hard hang
of the system happens during rescan_prepare of the second card when
stopping the HW (see log2.log). I don't understand why this would
happen, as each of them passes fine when tested standalone, and there
should be no interdependence between them as far as I know. Do you have
any idea?

Andrey
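
To check which BARs actually moved after a rescan, the "BAR N updated" prints mentioned above can be filtered out of the kernel log. The following is only a minimal sketch: the function name bar_moves is hypothetical, and the pattern is inferred solely from the single log line quoted in this message, so it may need adjusting if the prints in the movable_bars tree differ.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tree): read kernel log text on
# stdin and print one line per reported BAR move, in the form
#   <pci address> BAR<n> <old start> <new start>
# The pattern assumes the message format quoted in this thread:
#   amdgpu 0000:0b:00.0: BAR 5 updated: 0xfcc00000 -> 0xfc100000
bar_moves() {
    sed -n -E 's/^.*([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): BAR ([0-9]+) updated: (0x[0-9a-fA-F]+) -> (0x[0-9a-fA-F]+).*$/\1 BAR\2 \3 \4/p'
}
```

Typical use would be `dmesg | bar_moves` right after `echo 1 > /sys/bus/pci/rescan`; if no line mentions BAR0, the graphics RAM BAR was not moved during that rescan.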