+ linux-pci

On 2021-02-26 1:44 a.m., Sergei Miroshnichenko wrote:
> On Thu, 2021-02-25 at 13:28 -0500, Andrey Grodzovsky wrote:
>> On 2021-02-25 2:00 a.m., Sergei Miroshnichenko wrote:
>>> On Wed, 2021-02-24 at 17:51 -0500, Andrey Grodzovsky wrote:
>>>> On 2021-02-24 1:23 p.m., Sergei Miroshnichenko wrote:
>>>>> ...
>>>> Are you saying that even without hot-plugging, while both the NVMe
>>>> and the AMD card are present right from boot, you still get BARs
>>>> moving and MMIO ranges reassigned for the NVMe BARs, just because the
>>>> amdgpu driver starts resizing the AMD card's BARs, and this triggers
>>>> the NVMe BARs to move so that the AMD card's BARs can cover the full
>>>> range of video RAM?
>>> Yes. Unconditionally, because it is unknown beforehand if NVMe's BAR
>>> movement will help. In this particular case BAR movement is not
>>> needed, but is done anyway.
>>>
>>> BARs are not moved one by one. Instead, the kernel releases all the
>>> releasable ones and then recalculates a new BAR layout that fits them
>>> all. The kernel's algorithm is different from the BIOS's, so the NVMe
>>> BAR has ended up at a new address.
>>>
>>> This is triggered by the following:
>>> - at boot, if the BIOS has not assigned every BAR;
>>> - during pci_resize_resource();
>>> - during pci_rescan_bus() -- after a pciehp event or a manual rescan
>>>   via sysfs.
>>
>> By "manual via sysfs", do you mean something like 'echo 1 >
>> /sys/bus/pci/drivers/amdgpu/0000\:0c\:00.0/remove && echo 1 >
>> /sys/bus/pci/rescan'? I am looking into how to most reliably trigger
>> the PCI code to call my callbacks even without an external PCI cage
>> for the GPU (it will take me some time to get one).
> 
> Yeah, this is our way to go when a device can't be physically removed
> or powered off remotely. With a slightly shorter path:
> 
>    sudo sh -c 'echo 1 > /sys/bus/pci/devices/0000\:0c\:00.0/remove'
>    sudo sh -c 'echo 1 > /sys/bus/pci/rescan'
> 
> Or just the second command (rescan) is enough: a BAR movement attempt
> will be triggered even if there were no changes in the PCI topology.
> 
> Serge
> 

Hi Sergei

Here is a link to the initial implementation on top of your tree
(movable_bars_v9.1):
https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=05d6abceed650181bb7fe0a49884a26e378b908e
I am able to get through one rescan cycle and can use the card
afterwards (see log1.log). However, according to your prints, only BAR5,
the register BAR, was updated (amdgpu 0000:0b:00.0: BAR 5 updated:
0xfcc00000 -> 0xfc100000), while I am interested in testing a move of
BAR0 (graphics RAM), since that is where most of the complexity is. Is
there a way to hack your code to force this?
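To tell which BARs actually moved across a rescan, here is a minimal sketch, assuming a POSIX shell with sed and awk. The names bar_snapshot and bar_moves are hypothetical helpers, not part of the movable_bars patch set. bar_snapshot records the start address of each assigned BAR from the sysfs "resource" file of every device, so the state before and after a rescan can be diffed. bar_moves filters kernel log lines in the "BAR N updated: old -> new" format, which is copied from the log line quoted above and may differ in other versions of the patch set.

```sh
#!/bin/sh
# Hypothetical helpers (NOT part of the movable_bars patch set) for checking
# which BARs moved across a PCI rescan.

# bar_snapshot [DEVDIR]: print "<BDF> BAR<n> <start>" for every assigned BAR
# of every device under DEVDIR (default: /sys/bus/pci/devices).
# Line n+1 of a device's "resource" file describes resource n as
# "start end flags"; the first 6 lines are the standard BARs.
# Entries whose start is 0 (unassigned BARs, upper halves of 64-bit BARs)
# are skipped.
bar_snapshot() {
    devdir=${1:-/sys/bus/pci/devices}
    for dev in "$devdir"/*; do
        [ -r "$dev/resource" ] || continue
        bdf=$(basename "$dev")
        head -n 6 "$dev/resource" |
            awk -v bdf="$bdf" '$1 !~ /^0x0+$/ { printf "%s BAR%d %s\n", bdf, NR - 1, $1 }'
    done
}

# bar_moves: turn log lines such as
#   "amdgpu 0000:0b:00.0: BAR 5 updated: 0xfcc00000 -> 0xfc100000"
# into "<BDF> BAR<n> <old> -> <new>" and drop every other line.
bar_moves() {
    sed -n 's/.* \([0-9a-f]\{4\}:[0-9a-f]\{2\}:[0-9a-f]\{2\}\.[0-7]\): BAR \([0-9][0-9]*\) updated: \(0x[0-9a-f]*\) -> \(0x[0-9a-f]*\).*/\1 BAR\2 \3 -> \4/p'
}
```

Usage, after sourcing the helpers (the rescan needs root, and so may dmesg if dmesg_restrict is set):

   bar_snapshot > before.txt
   sudo sh -c 'echo 1 > /sys/bus/pci/rescan'
   bar_snapshot > after.txt
   diff before.txt after.txt
   sudo dmesg | bar_moves | grep ' BAR0 '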
When testing with two graphics cards and triggering a rescan, a hard
hang of the system happens during rescan_prepare of the second card,
while stopping the HW (see log2.log). I don't understand why this would
happen, as each card passes fine when tested standalone, and as far as
I know there should be no interdependence between them.
Do you have any idea?

Andrey