* Re: Question about supporting AMD eGPU hot plug case [not found] ` <8d7e2d7b7982d8d78c0ecaa74b9d40ace4db8453.camel@yadro.com> @ 2021-03-04 19:49 ` Andrey Grodzovsky 2021-03-05 16:08 ` Sergei Miroshnichenko 0 siblings, 1 reply; 15+ messages in thread From: Andrey Grodzovsky @ 2021-03-04 19:49 UTC (permalink / raw) To: Sergei Miroshnichenko Cc: Alexander.Deucher, Christian.Koenig, anatoli.antonovitch, helgaas, linux-pci [-- Attachment #1: Type: text/plain, Size: 3049 bytes --] + linux-pci On 2021-02-26 1:44 a.m., Sergei Miroshnichenko wrote: > On Thu, 2021-02-25 at 13:28 -0500, Andrey Grodzovsky wrote: >> On 2021-02-25 2:00 a.m., Sergei Miroshnichenko wrote: >>> On Wed, 2021-02-24 at 17:51 -0500, Andrey Grodzovsky wrote: >>>> On 2021-02-24 1:23 p.m., Sergei Miroshnichenko wrote: >>>>> ... >>>> Are you saying that even without hot-plugging, while both nvme >>>> and >>>> AMD >>>> card are present >>>> right from boot, you still get BARs moving and MMIO ranges >>>> reassigned >>>> for NVME BARs >>>> just because amdgpu driver will start resize of AMD card BARs and >>>> this >>>> will trigger NVMEs BARs move to >>>> allow AMD card BARs to cover full range of VIDEO RAM ? >>> Yes. Unconditionally, because it is unknown beforehand if NVMe's >>> BAR >>> movement will help. In this particular case BAR movement is not >>> needed, >>> but is done anyway. >>> >>> BARs are not moved one by one, but the kernel releases all the >>> releasable ones, and then recalculates a new BAR layout to fit them >>> all. Kernel's algorithm is different from BIOS's, so NVME has >>> appeared >>> at a new place. >>> >>> This is triggered by following: >>> - at boot, if BIOS had assigned not every BAR; >>> - during pci_resize_resource(); >>> - during pci_rescan_bus() -- after a pciehp event or a manual via >>> sysfs. >> >> By manual via sysfs you mean something like this - 'echo 1 > >> /sys/bus/pci/drivers/amdgpu/0000\:0c\:00.0/remove && echo 1 > >> /sys/bus/pci/rescan ' ? 
I am looking into how to most reliably trigger >> PCI >> code to call my callbacks even without having an external PCI cage for >> GPU >> (will take me some time to get it). > > Yeah, this is our way to go when a device can't be physically removed > or unpowered remotely. With just a bit shorter path: > > sudo sh -c 'echo 1 > /sys/bus/pci/devices/0000\:0c\:00.0/remove' > sudo sh -c 'echo 1 > /sys/bus/pci/rescan' > > Or, just the second command (rescan) is enough: a BAR movement attempt > will be triggered even if there were no changes in PCI topology. > > Serge > Hi Sergei, Here is a link to the initial implementation on top of your tree (movable_bars_v9.1) - https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=05d6abceed650181bb7fe0a49884a26e378b908e I am able to pass one re-scan cycle and can use the card afterwards (see log1.log). But, according to your prints, only BAR 5, which is the registers BAR, was updated (amdgpu 0000:0b:00.0: BAR 5 updated: 0xfcc00000 -> 0xfc100000), while I am interested in testing a BAR 0 (graphics RAM) move, since that is where most of the complexity is. Is there a way to hack your code to force this? When testing with two graphics cards and triggering a rescan, a hard hang of the system happens during rescan_prepare of the second card when stopping the HW (see log2.log) - I don't understand why this would happen, as each of them passes fine when tested standalone, and there should be no interdependence between them as far as I know. Do you have any idea?
Andrey [-- Attachment #2: log2.log --] [-- Type: text/x-log, Size: 1253 bytes --] Dual Vega + Polaris 11 - hang [ 102.022609 < 12.522871>] amdgpu 0000:0b:00.0: amdgpu: amdgpu_device_rescan_prepare - start [ 102.301619 < 0.279010>] [drm] psp command (0x2) failed and response status is (0x117) [ 102.301702 < 0.000083>] [drm] free PSP TMR buffer [ 102.334903 < 0.033201>] amdgpu 0000:0b:00.0: BAR 5: releasing [mem 0xfcb00000-0xfcb7ffff] [ 102.334925 < 0.000022>] amdgpu 0000:0b:00.0: BAR 2: releasing [mem 0xf0000000-0xf01fffff 64bit pref] [ 102.334941 < 0.000016>] amdgpu 0000:0b:00.0: BAR 0: releasing [mem 0xe0000000-0xefffffff 64bit pref] [ 102.334955 < 0.000014>] amdgpu 0000:0b:00.0: amdgpu: amdgpu_device_rescan_prepare - end [ 102.354480 < 0.019525>] amdgpu 0000:0c:00.0: amdgpu: amdgpu_device_rescan_prepare - start [ 102.355059 < 0.000579>] amdgpu: last message was failed ret is 65535 [ 102.355073 < 0.000014>] amdgpu: failed to send message 261 ret is 65535 [ 102.355087 < 0.000014>] amdgpu: last message was failed ret is 65535 [ 102.355097 < 0.000010>] amdgpu: failed to send message 261 ret is 65535 [ 102.355110 < 0.000013>] amdgpu: [-- Attachment #3: log1.log --] [-- Type: text/x-log, Size: 17571 bytes --] (reverse-i-search)`cat': ^Ct /sys/kernel/debug/dri/0/amdgpu_gpu_recover root@andrey-test:~# echo 1 > /sys/bus/pci/rescan [ 126.923571 < 16.141524>] amdgpu 0000:0b:00.0: amdgpu: amdgpu_device_rescan_prepare - start [ 127.210952 < 0.287381>] [drm] psp command (0x2) failed and response status is (0x117) [ 127.211037 < 0.000085>] [drm] free PSP TMR buffer [ 127.244310 < 0.033273>] amdgpu 0000:0b:00.0: BAR 5: releasing [mem 0xfcc00000-0xfcc7ffff] [ 127.244333 < 0.000023>] amdgpu 0000:0b:00.0: BAR 2: releasing [mem 0xf0000000-0xf01fffff 64bit pref] [ 127.244348 < 0.000015>] amdgpu 0000:0b:00.0: BAR 0: releasing [mem 0xe0000000-0xefffffff 64bit pref] [ 127.244363 < 0.000015>] amdgpu 0000:0b:00.0: amdgpu: amdgpu_device_rescan_prepare - end [ 127.244932 < 0.000569>] 
pcieport 0000:03:07.0: resource 13 [io 0xe000-0xefff] released [ 127.244949 < 0.000017>] pcieport 0000:02:00.2: resource 13 [io 0xe000-0xefff] released [ 127.244964 < 0.000015>] pcieport 0000:00:01.3: resource 13 [io 0xe000-0xefff] released [ 127.244978 < 0.000014>] pcieport 0000:0a:00.0: resource 13 [io 0xd000-0xdfff] released [ 127.244992 < 0.000014>] pcieport 0000:09:00.0: resource 13 [io 0xd000-0xdfff] released [ 127.245005 < 0.000013>] pcieport 0000:00:03.1: resource 13 [io 0xd000-0xdfff] released [ 127.245019 < 0.000014>] pcieport 0000:00:01.1: resource 14 [mem 0xfcf00000-0xfcffffff] released [ 127.245034 < 0.000015>] pcieport 0000:00:01.3: resource 14 [mem 0xfc900000-0xfcbfffff] released [ 127.245047 < 0.000013>] pcieport 0000:00:03.1: resource 14 [mem 0xfcc00000-0xfcdfffff] released [ 127.245062 < 0.000015>] pcieport 0000:00:07.1: resource 14 [mem 0xfc600000-0xfc8fffff] released [ 127.245076 < 0.000014>] pcieport 0000:00:08.1: resource 14 [mem 0xfce00000-0xfcefffff] released [ 127.245091 < 0.000015>] pcieport 0000:00:03.1: resource 15 [mem 0xe0000000-0xf01fffff 64bit pref] released [ 127.245311 < 0.000220>] pcieport 0000:00:01.3: BAR 13: assigned [io 0xe000-0xefff] [ 127.245335 < 0.000024>] pcieport 0000:00:07.1: BAR 14: assigned [mem 0xfc400000-0xfc6fffff] [ 127.245359 < 0.000024>] pcieport 0000:00:01.3: BAR 14: assigned [mem 0xfc900000-0xfcbfffff] [ 127.245383 < 0.000024>] pcieport 0000:00:08.1: BAR 14: assigned [mem 0xfce00000-0xfcefffff] [ 127.245470 < 0.000087>] pcieport 0000:00:03.1: BAR 15: assigned [mem 0xe0000000-0xf7ffffff 64bit pref] [ 127.245492 < 0.000022>] pcieport 0000:00:01.1: BAR 14: assigned [mem 0xfc000000-0xfc0fffff] [ 127.245512 < 0.000020>] pcieport 0000:00:03.1: BAR 14: assigned [mem 0xfc100000-0xfc2fffff] [ 127.245534 < 0.000022>] pcieport 0000:00:03.1: BAR 13: assigned [io 0x1000-0x1fff] [ 127.245613 < 0.000079>] nvme 0000:01:00.0: BAR 0: assigned [mem 0xfc000000-0xfc003fff 64bit] [ 127.245639 < 0.000026>] nvme 0000:01:00.0: BAR 0 
updated: 0xfcf00000 -> 0xfc000000 [ 127.245755 < 0.000116>] pcieport 0000:02:00.2: BAR 13: assigned [io 0xe000-0xefff] [ 127.245771 < 0.000016>] pcieport 0000:02:00.2: BAR 14: assigned [mem 0xfc900000-0xfcafffff] [ 127.245785 < 0.000014>] ahci 0000:02:00.1: BAR 6: assigned fixed [mem 0xfcb00000-0xfcb7ffff pref] [ 127.245798 < 0.000013>] ahci 0000:02:00.1: BAR 5: assigned fixed [mem 0xfcb80000-0xfcb9ffff] [ 127.245812 < 0.000014>] xhci_hcd 0000:02:00.0: BAR 0: assigned fixed [mem 0xfcba0000-0xfcba7fff 64bit] [ 127.245939 < 0.000127>] pcieport 0000:03:07.0: BAR 13: assigned [io 0xe000-0xefff] [ 127.245955 < 0.000016>] pcieport 0000:03:07.0: BAR 14: assigned [mem 0xfc900000-0xfc9fffff] [ 127.245971 < 0.000016>] pcieport 0000:03:04.0: BAR 14: assigned [mem 0xfca00000-0xfcafffff] [ 127.246040 < 0.000069>] xhci_hcd 0000:05:00.0: BAR 0: assigned fixed [mem 0xfca00000-0xfca07fff 64bit] [ 127.246126 < 0.000086>] igb 0000:07:00.0: BAR 2: assigned fixed [io 0xe000-0xe01f] [ 127.246138 < 0.000012>] igb 0000:07:00.0: BAR 0: assigned fixed [mem 0xfc900000-0xfc91ffff] [ 127.246151 < 0.000013>] igb 0000:07:00.0: BAR 3: assigned fixed [mem 0xfc920000-0xfc923fff] [ 127.246278 < 0.000127>] pcieport 0000:09:00.0: BAR 15: assigned [mem 0xe0000000-0xf7ffffff 64bit pref] [ 127.246295 < 0.000017>] pcieport 0000:09:00.0: BAR 14: assigned [mem 0xfc100000-0xfc1fffff] [ 127.246311 < 0.000016>] pcieport 0000:09:00.0: BAR 0: assigned [mem 0xfc200000-0xfc203fff] [ 127.246327 < 0.000016>] pcieport 0000:09:00.0: BAR 0 updated: 0xfcd00000 -> 0xfc200000 [ 127.246340 < 0.000013>] pcieport 0000:09:00.0: BAR 13: assigned [io 0x1000-0x1fff] [ 127.246583 < 0.000243>] pcieport 0000:0a:00.0: BAR 15: assigned [mem 0xe0000000-0xf7ffffff 64bit pref] [ 127.246619 < 0.000036>] pcieport 0000:0a:00.0: BAR 14: assigned [mem 0xfc100000-0xfc1fffff] [ 127.246648 < 0.000029>] pcieport 0000:0a:00.0: BAR 13: assigned [io 0x1000-0x1fff] [ 127.246951 < 0.000303>] amdgpu 0000:0b:00.0: BAR 0: assigned [mem 
0xe0000000-0xefffffff 64bit pref] [ 127.247000 < 0.000049>] amdgpu 0000:0b:00.0: BAR 2: assigned [mem 0xf0000000-0xf01fffff 64bit pref] [ 127.247040 < 0.000040>] amdgpu 0000:0b:00.0: BAR 5: assigned [mem 0xfc100000-0xfc17ffff] [ 127.247066 < 0.000026>] amdgpu 0000:0b:00.0: BAR 5 updated: 0xfcc00000 -> 0xfc100000 [ 127.247099 < 0.000033>] pci 0000:0b:00.1: BAR 0: assigned [mem 0xfc180000-0xfc183fff] [ 127.247124 < 0.000025>] pci 0000:0b:00.1: BAR 0 updated: 0xfcca0000 -> 0xfc180000 [ 127.247145 < 0.000021>] amdgpu 0000:0b:00.0: BAR 4: assigned [io 0x1000-0x10ff] [ 127.247169 < 0.000024>] amdgpu 0000:0b:00.0: BAR 4 updated: 0xd000 -> 0x1000 [ 127.247365 < 0.000196>] xhci_hcd 0000:0c:00.3: BAR 0: assigned fixed [mem 0xfc600000-0xfc6fffff 64bit] [ 127.247410 < 0.000045>] pci 0000:0c:00.2: BAR 2: assigned [mem 0xfc400000-0xfc4fffff] [ 127.247434 < 0.000024>] pci 0000:0c:00.2: BAR 2 updated: 0xfc700000 -> 0xfc400000 [ 127.247457 < 0.000023>] pci 0000:0c:00.2: BAR 5: assigned [mem 0xfc500000-0xfc501fff] [ 127.247480 < 0.000023>] pci 0000:0c:00.2: BAR 5 updated: 0xfc800000 -> 0xfc500000 [ 127.247590 < 0.000110>] ahci 0000:0d:00.2: BAR 5: assigned fixed [mem 0xfce08000-0xfce08fff] [ 127.247632 < 0.000042>] pci 0000:0d:00.3: BAR 0: assigned [mem 0xfce00000-0xfce07fff] [ 127.247676 < 0.000044>] pci_bus 0000:00: resource 4 [io 0x0000-0x03af window] [ 127.247695 < 0.000019>] pci_bus 0000:00: resource 5 [io 0x03e0-0x0cf7 window] [ 127.247714 < 0.000019>] pci_bus 0000:00: resource 6 [io 0x03b0-0x03df window] [ 127.247742 < 0.000028>] pci_bus 0000:00: resource 7 [io 0x0d00-0xefff window] [ 127.247759 < 0.000017>] pci_bus 0000:00: resource 8 [mem 0x000a0000-0x000bffff window] [ 127.247779 < 0.000020>] pci_bus 0000:00: resource 9 [mem 0x000c0000-0x000dffff window] [ 127.247808 < 0.000029>] pci_bus 0000:00: resource 10 [mem 0xe0000000-0xfec2ffff window] [ 127.247827 < 0.000019>] pci_bus 0000:00: resource 11 [mem 0xfee00000-0xffffffff window] [ 127.247848 < 0.000021>] pci_bus 0000:01: 
resource 1 [mem 0xfc000000-0xfc0fffff] [ 127.247877 < 0.000029>] pci_bus 0000:02: resource 0 [io 0xe000-0xefff] [ 127.247893 < 0.000016>] pci_bus 0000:02: resource 1 [mem 0xfc900000-0xfcbfffff] [ 127.247913 < 0.000020>] pci_bus 0000:03: resource 0 [io 0xe000-0xefff] [ 127.247940 < 0.000027>] pci_bus 0000:03: resource 1 [mem 0xfc900000-0xfcafffff] [ 127.247958 < 0.000018>] pci_bus 0000:05: resource 1 [mem 0xfca00000-0xfcafffff] [ 127.247977 < 0.000019>] pci_bus 0000:07: resource 0 [io 0xe000-0xefff] [ 127.247995 < 0.000018>] pci_bus 0000:07: resource 1 [mem 0xfc900000-0xfc9fffff] [ 127.248015 < 0.000020>] pci_bus 0000:09: resource 0 [io 0x1000-0x1fff] [ 127.248031 < 0.000016>] pci_bus 0000:09: resource 1 [mem 0xfc100000-0xfc2fffff] [ 127.248051 < 0.000020>] pci_bus 0000:09: resource 2 [mem 0xe0000000-0xf7ffffff 64bit pref] [ 127.248078 < 0.000027>] pci_bus 0000:0a: resource 0 [io 0x1000-0x1fff] [ 127.248095 < 0.000017>] pci_bus 0000:0a: resource 1 [mem 0xfc100000-0xfc1fffff] [ 127.248114 < 0.000019>] pci_bus 0000:0a: resource 2 [mem 0xe0000000-0xf7ffffff 64bit pref] [ 127.248143 < 0.000029>] pci_bus 0000:0b: resource 0 [io 0x1000-0x1fff] [ 127.248160 < 0.000017>] pci_bus 0000:0b: resource 1 [mem 0xfc100000-0xfc1fffff] [ 127.248179 < 0.000019>] pci_bus 0000:0b: resource 2 [mem 0xe0000000-0xf7ffffff 64bit pref] [ 127.248209 < 0.000030>] pci_bus 0000:0c: resource 1 [mem 0xfc400000-0xfc6fffff] [ 127.248227 < 0.000018>] pci_bus 0000:0d: resource 1 [mem 0xfce00000-0xfcefffff] [ 127.248259 < 0.000032>] pcieport 0000:00:01.1: PCI bridge to [bus 01] [ 127.248282 < 0.000023>] pcieport 0000:00:01.1: bridge window [mem 0xfc000000-0xfc0fffff] [ 127.248308 < 0.000026>] pcieport 0000:03:00.0: PCI bridge to [bus 04] [ 127.248339 < 0.000031>] pcieport 0000:03:04.0: PCI bridge to [bus 05] [ 127.248360 < 0.000021>] pcieport 0000:03:04.0: bridge window [mem 0xfca00000-0xfcafffff] [ 127.248388 < 0.000028>] pcieport 0000:03:06.0: PCI bridge to [bus 06] [ 127.248418 < 0.000030>] pcieport 
0000:03:07.0: PCI bridge to [bus 07] [ 127.248436 < 0.000018>] pcieport 0000:03:07.0: bridge window [io 0xe000-0xefff] [ 127.248464 < 0.000028>] pcieport 0000:03:07.0: bridge window [mem 0xfc900000-0xfc9fffff] [ 127.248492 < 0.000028>] pcieport 0000:03:09.0: PCI bridge to [bus 08] [ 127.248523 < 0.000031>] pcieport 0000:02:00.2: PCI bridge to [bus 03-08] [ 127.248540 < 0.000017>] pcieport 0000:02:00.2: bridge window [io 0xe000-0xefff] [ 127.248562 < 0.000022>] pcieport 0000:02:00.2: bridge window [mem 0xfc900000-0xfcafffff] [ 127.248595 < 0.000033>] pcieport 0000:00:01.3: PCI bridge to [bus 02-08] [ 127.248612 < 0.000017>] pcieport 0000:00:01.3: bridge window [io 0xe000-0xefff] [ 127.248633 < 0.000021>] pcieport 0000:00:01.3: bridge window [mem 0xfc900000-0xfcbfffff] [ 127.248660 < 0.000027>] pcieport 0000:0a:00.0: PCI bridge to [bus 0b] [ 127.248677 < 0.000017>] pcieport 0000:0a:00.0: bridge window [io 0x1000-0x1fff] [ 127.248699 < 0.000022>] pcieport 0000:0a:00.0: bridge window [mem 0xfc100000-0xfc1fffff] [ 127.248720 < 0.000021>] pcieport 0000:0a:00.0: bridge window [mem 0xe0000000-0xf7ffffff 64bit pref] [ 127.248745 < 0.000025>] pcieport 0000:09:00.0: PCI bridge to [bus 0a-0b] [ 127.248763 < 0.000018>] pcieport 0000:09:00.0: bridge window [io 0x1000-0x1fff] [ 127.248785 < 0.000022>] pcieport 0000:09:00.0: bridge window [mem 0xfc100000-0xfc1fffff] [ 127.248806 < 0.000021>] pcieport 0000:09:00.0: bridge window [mem 0xe0000000-0xf7ffffff 64bit pref] [ 127.248832 < 0.000026>] pcieport 0000:00:03.1: PCI bridge to [bus 09-0b] [ 127.248849 < 0.000017>] pcieport 0000:00:03.1: bridge window [io 0x1000-0x1fff] [ 127.248869 < 0.000020>] pcieport 0000:00:03.1: bridge window [mem 0xfc100000-0xfc2fffff] [ 127.248889 < 0.000020>] pcieport 0000:00:03.1: bridge window [mem 0xe0000000-0xf7ffffff 64bit pref] [ 127.248917 < 0.000028>] pcieport 0000:00:07.1: PCI bridge to [bus 0c] [ 127.248936 < 0.000019>] pcieport 0000:00:07.1: bridge window [mem 0xfc400000-0xfc6fffff] [ 
127.248962 < 0.000026>] pcieport 0000:00:08.1: PCI bridge to [bus 0d] [ 127.248981 < 0.000019>] pcieport 0000:00:08.1: bridge window [mem 0xfce00000-0xfcefffff] [ 127.249893 < 0.000912>] amdgpu 0000:0b:00.0: amdgpu: amdgpu_device_rescan_done - start [ 127.250694 < 0.000801>] amdgpu 0000:0b:00.0: amdgpu: GPU reset succeeded, trying to resume [ 127.250916 < 0.000222>] [drm] PCIE GART of 512M enabled (table at 0x000000F400900000). [ 127.251063 < 0.000147>] [drm] VRAM is lost due to GPU reset! [ 127.251478 < 0.000415>] [drm] PSP is resuming... [ 127.311254 < 0.059776>] [drm] reserve 0x400000 from 0xf7fec00000 for PSP TMR [ 127.342993 < 0.031739>] nvme nvme0: Shutdown timeout set to 8 seconds [ 127.380310 < 0.037317>] nvme nvme0: 32/0/0 default/read/poll queues [ 127.762031 < 0.381721>] [drm] kiq ring mec 2 pipe 1 q 0 [ 127.844977 < 0.082946>] [drm] UVD and UVD ENC initialized successfully. [ 127.944787 < 0.099810>] [drm] VCE initialized successfully. [ 127.944812 < 0.000025>] amdgpu 0000:0b:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0 [ 127.944826 < 0.000014>] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0 [ 127.944838 < 0.000012>] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0 [ 127.944849 < 0.000011>] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0 [ 127.944860 < 0.000011>] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0 [ 127.944871 < 0.000011>] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0 [ 127.944882 < 0.000011>] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0 [ 127.944893 < 0.000011>] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0 [ 127.944904 < 0.000011>] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0 [ 127.944915 < 0.000011>] amdgpu 0000:0b:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0 [ 127.944926 < 0.000011>] amdgpu 0000:0b:00.0: amdgpu: ring 
sdma0 uses VM inv eng 0 on hub 1 [ 127.944937 < 0.000011>] amdgpu 0000:0b:00.0: amdgpu: ring page0 uses VM inv eng 1 on hub 1 [ 127.944947 < 0.000010>] amdgpu 0000:0b:00.0: amdgpu: ring sdma1 uses VM inv eng 4 on hub 1 [ 127.944958 < 0.000011>] amdgpu 0000:0b:00.0: amdgpu: ring page1 uses VM inv eng 5 on hub 1 [ 127.944969 < 0.000011>] amdgpu 0000:0b:00.0: amdgpu: ring uvd_0 uses VM inv eng 6 on hub 1 [ 127.944979 < 0.000010>] amdgpu 0000:0b:00.0: amdgpu: ring uvd_enc_0.0 uses VM inv eng 7 on hub 1 [ 127.944990 < 0.000011>] amdgpu 0000:0b:00.0: amdgpu: ring uvd_enc_0.1 uses VM inv eng 8 on hub 1 [ 127.945000 < 0.000010>] amdgpu 0000:0b:00.0: amdgpu: ring vce0 uses VM inv eng 9 on hub 1 [ 127.945011 < 0.000011>] amdgpu 0000:0b:00.0: amdgpu: ring vce1 uses VM inv eng 10 on hub 1 [ 127.945021 < 0.000010>] amdgpu 0000:0b:00.0: amdgpu: ring vce2 uses VM inv eng 11 on hub 1 root@andrey-test:~# [ 127.951047 < 0.006026>] amdgpu 0000:0b:00.0: amdgpu: recover vram bo from shadow start [ 127.951304 < 0.000257>] amdgpu 0000:0b:00.0: amdgpu: recover vram bo from shadow done [ 127.951403 < 0.000099>] amdgpu 0000:0b:00.0: amdgpu: amdgpu_device_rescan_done - end [ 127.951801 < 0.000398>] pcieport 0000:00:01.1: Max Payload Size set to 256/ 512 (was 256), Max Read Rq 512 [ 127.951824 < 0.000023>] nvme 0000:01:00.0: Max Payload Size set to 256/ 256 (was 256), Max Read Rq 512 [ 127.951847 < 0.000023>] pcieport 0000:00:01.3: Max Payload Size set to 512/ 512 (was 512), Max Read Rq 512 [ 127.951868 < 0.000021>] xhci_hcd 0000:02:00.0: Max Payload Size set to 512/ 512 (was 512), Max Read Rq 512 [ 127.951888 < 0.000020>] ahci 0000:02:00.1: Max Payload Size set to 512/ 512 (was 512), Max Read Rq 512 [ 127.951908 < 0.000020>] pcieport 0000:02:00.2: Max Payload Size set to 512/ 512 (was 512), Max Read Rq 512 [ 127.951928 < 0.000020>] pcieport 0000:03:00.0: Max Payload Size set to 512/ 512 (was 512), Max Read Rq 512 [ 127.951948 < 0.000020>] pcieport 0000:03:04.0: Max Payload Size set to 512/ 
512 (was 512), Max Read Rq 512 [ 127.951972 < 0.000024>] xhci_hcd 0000:05:00.0: Max Payload Size set to 512/ 512 (was 512), Max Read Rq 512 [ 127.951991 < 0.000019>] pcieport 0000:03:06.0: Max Payload Size set to 512/ 512 (was 512), Max Read Rq 512 [ 127.952011 < 0.000020>] pcieport 0000:03:07.0: Max Payload Size set to 512/ 512 (was 512), Max Read Rq 512 [ 127.952038 < 0.000027>] igb 0000:07:00.0: Max Payload Size set to 512/ 512 (was 512), Max Read Rq 512 [ 127.952058 < 0.000020>] pcieport 0000:03:09.0: Max Payload Size set to 512/ 512 (was 512), Max Read Rq 512 [ 127.952077 < 0.000019>] pcieport 0000:00:03.1: Max Payload Size set to 256/ 512 (was 256), Max Read Rq 512 [ 127.952097 < 0.000020>] pcieport 0000:09:00.0: Max Payload Size set to 256/ 512 (was 256), Max Read Rq 512 [ 127.952116 < 0.000019>] pcieport 0000:0a:00.0: Max Payload Size set to 256/ 512 (was 256), Max Read Rq 512 [ 127.952135 < 0.000019>] amdgpu 0000:0b:00.0: Max Payload Size set to 256/ 256 (was 256), Max Read Rq 512 [ 127.952154 < 0.000019>] pci 0000:0b:00.1: Max Payload Size set to 256/ 256 (was 256), Max Read Rq 512 [ 127.952172 < 0.000018>] pcieport 0000:00:07.1: Max Payload Size set to 256/ 512 (was 256), Max Read Rq 512 [ 127.952190 < 0.000018>] pci 0000:0c:00.0: Max Payload Size set to 256/ 256 (was 256), Max Read Rq 512 [ 127.952208 < 0.000018>] pci 0000:0c:00.2: Max Payload Size set to 256/ 256 (was 256), Max Read Rq 512 [ 127.952225 < 0.000017>] xhci_hcd 0000:0c:00.3: Max Payload Size set to 256/ 256 (was 256), Max Read Rq 512 [ 127.952244 < 0.000019>] pcieport 0000:00:08.1: Max Payload Size set to 256/ 512 (was 256), Max Read Rq 512 [ 127.952262 < 0.000018>] pci 0000:0d:00.0: Max Payload Size set to 256/ 256 (was 256), Max Read Rq 512 [ 127.952280 < 0.000018>] ahci 0000:0d:00.2: Max Payload Size set to 256/ 256 (was 256), Max Read Rq 512 [ 127.952298 < 0.000018>] pci 0000:0d:00.3: Max Payload Size set to 256/ 256 (was 256), Max Read Rq 512 ^ permalink raw reply [flat|nested] 15+ 
messages in thread
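[Editorial note: the interesting lines in captures like log1.log above are the "BAR n updated: old -> new" messages printed by the movable-BARs kernel. A small helper to pull just those out of a saved dmesg capture is sketched below; it is illustrative only, not part of the patchset, and the sample lines are copied from the log above.]

```python
import re

# Matches the "BAR n updated" lines emitted during a rescan, e.g.:
#   amdgpu 0000:0b:00.0: BAR 5 updated: 0xfcc00000 -> 0xfc100000
BAR_UPDATE = re.compile(
    r"(?P<dev>\w+ [0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"BAR (?P<bar>\d+) updated: (?P<old>0x[0-9a-f]+) -> (?P<new>0x[0-9a-f]+)"
)

def bar_moves(dmesg_text):
    """Return (driver+address, BAR index, old base, new base) tuples."""
    return [
        (m["dev"], int(m["bar"]), int(m["old"], 16), int(m["new"], 16))
        for m in map(BAR_UPDATE.search, dmesg_text.splitlines())
        if m
    ]

sample = """\
[  127.245639] nvme 0000:01:00.0: BAR 0 updated: 0xfcf00000 -> 0xfc000000
[  127.247066] amdgpu 0000:0b:00.0: BAR 5 updated: 0xfcc00000 -> 0xfc100000
"""
for dev, bar, old, new in bar_moves(sample):
    print(f"{dev}: BAR {bar} moved {old:#x} -> {new:#x}")
```

Run against a full log1.log, this reduces the rescan noise to the handful of BARs that actually changed address.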
* Re: Question about supporting AMD eGPU hot plug case 2021-03-04 19:49 ` Question about supporting AMD eGPU hot plug case Andrey Grodzovsky @ 2021-03-05 16:08 ` Sergei Miroshnichenko 2021-03-05 17:13 ` Andrey Grodzovsky 0 siblings, 1 reply; 15+ messages in thread From: Sergei Miroshnichenko @ 2021-03-05 16:08 UTC (permalink / raw) To: andrey.grodzovsky Cc: Alexander.Deucher, Christian.Koenig, anatoli.antonovitch, helgaas, linux-pci, linux On Thu, 2021-03-04 at 14:49 -0500, Andrey Grodzovsky wrote: > + linux-pci > > On 2021-02-26 1:44 a.m., Sergei Miroshnichenko wrote: > > On Thu, 2021-02-25 at 13:28 -0500, Andrey Grodzovsky wrote: > > > On 2021-02-25 2:00 a.m., Sergei Miroshnichenko wrote: > > > > On Wed, 2021-02-24 at 17:51 -0500, Andrey Grodzovsky wrote: > > > > > On 2021-02-24 1:23 p.m., Sergei Miroshnichenko wrote: > > > > > > ... > > > > > Are you saying that even without hot-plugging, while both > > > > > nvme > > > > > and > > > > > AMD > > > > > card are present > > > > > right from boot, you still get BARs moving and MMIO ranges > > > > > reassigned > > > > > for NVME BARs > > > > > just because amdgpu driver will start resize of AMD card BARs > > > > > and > > > > > this > > > > > will trigger NVMEs BARs move to > > > > > allow AMD card BARs to cover full range of VIDEO RAM ? > > > > Yes. Unconditionally, because it is unknown beforehand if > > > > NVMe's > > > > BAR > > > > movement will help. In this particular case BAR movement is not > > > > needed, > > > > but is done anyway. > > > > > > > > BARs are not moved one by one, but the kernel releases all the > > > > releasable ones, and then recalculates a new BAR layout to fit > > > > them > > > > all. Kernel's algorithm is different from BIOS's, so NVME has > > > > appeared > > > > at a new place. 
> > > > > > > > This is triggered by following: > > > > - at boot, if BIOS had assigned not every BAR; > > > > - during pci_resize_resource(); > > > > - during pci_rescan_bus() -- after a pciehp event or a manual > > > > via > > > > sysfs. > > > > > > By manual via sysfs you mean something like this - 'echo 1 > > > > /sys/bus/pci/drivers/amdgpu/0000\:0c\:00.0/remove && echo 1 > > > > /sys/bus/pci/rescan ' ? I am looking into how most reliably > > > trigger > > > PCI > > > code to call my callbacks even without having external PCI cage > > > for > > > GPU > > > (will take me some time to get it). > > > > Yeah, this is our way to go when a device can't be physically > > removed > > or unpowered remotely. With just a bit shorter path: > > > > sudo sh -c 'echo 1 > /sys/bus/pci/devices/0000\:0c\:00.0/remove' > > sudo sh -c 'echo 1 > /sys/bus/pci/rescan' > > > > Or, just a second command (rescan) is enough: a BAR movement > > attempt > > will be triggered even if there were no changes in PCI topology. > > > > Serge > > > > Hi Sergei > > Here is a link to initial implementation on top of your tree > (movable_bars_v9.1) - > https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=05d6abceed650181bb7fe0a49884a26e378b908e > I am able to pass one re-scan cycle and can use the card afterwards > (see > log1.log). > But, according to your prints only BAR5 which is registers BAR was > updated (amdgpu 0000:0b:00.0: BAR 5 updated: 0xfcc00000 -> > 0xfc100000) > while I am interested to test BAR0 (Graphic RAM) move since this is > where most of the complexity is. Is there a way to hack your code to > force this ? Hi Andrey, Regarding the amdgpu's BAR0 remaining in its place: it seems this is because of fixed BARs starting from fc600000.
So the BAR0 had occupied the closest "comfortable" slots 0xe0000000-0xefffffff, with the resulting bridge window of bus 00 covering all the BARs: pci_bus 0000:00: resource 10 [mem 0xe0000000-0xfec2ffff window] I'll let you know if I get an idea of how to rearrange that manually. Two GPUs can actually swap their places. What can also make a BAR movable is rmmod'ing its driver. It could be some hack from within a tmux, like: rmmod igb; \ rmmod xhci_hcd; \ rmmod ahci; \ echo 1 > /sys/bus/pci/rescan; \ modprobe igb; \ modprobe xhci_hcd; \ modprobe ahci I think pci_release_resource() should not be in amdgpu_device_unmap_mmio() -- the patched kernel will do that itself for BARs for which amdgpu_device_bar_fixed() returns false. Even more -- the kernel will ensure that all BARs which were working before are reassigned properly, so it needs them to be assigned before the procedure. The same goes for pci_assign_unassigned_bus_resources() in amdgpu_device_remap_mmio(): this callback is invoked from pci_rescan_bus() after pci_assign_unassigned_root_bus_resources(). > When testing with 2 graphic cards and triggering rescan, hard hang of > the system happens during rescan_prepare of the second card when > stopping the HW (see log2.log) - I don't understand why this would > happen as each of them passes fine when they are standalone tested > and > there should be no interdependence between them as far as i know. > Do you have any idea ? What happens with two GPUs is unclear to me as well, nothing looks suspicious. Serge
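[Editorial note: another way to see whether a rescan actually moved anything, without relying on the patchset's prints, is to snapshot each device's sysfs `resource` file before and after and diff the two. That file holds one `start end flags` hex triple per resource slot (e.g. /sys/bus/pci/devices/0000:0b:00.0/resource). The sketch below parses such snapshots; the sample contents are shortened, three-slot files modeled on the amdgpu BAR 5 move from log1.log, not real captures.]

```python
def parse_resources(text):
    """Parse a sysfs PCI 'resource' file: one 'start end flags' line per slot."""
    bars = {}
    for idx, line in enumerate(text.splitlines()):
        start, end, flags = (int(tok, 16) for tok in line.split())
        if start or end:                 # all-zero rows are unassigned slots
            bars[idx] = (start, end)
    return bars

def diff_bars(before, after):
    """Map slot index -> (old base, new base) for every BAR whose base changed."""
    return {
        idx: (before[idx][0], after[idx][0])
        for idx in before
        if idx in after and before[idx][0] != after[idx][0]
    }

# Shortened sample snapshots modeled on:
#   amdgpu 0000:0b:00.0: BAR 5 updated: 0xfcc00000 -> 0xfc100000
before = parse_resources(
    "0x00000000e0000000 0x00000000efffffff 0x000000000014220c\n"
    "0x0000000000000000 0x0000000000000000 0x0000000000000000\n"
    "0x00000000fcc00000 0x00000000fcc7ffff 0x0000000000040200\n"
)
after = parse_resources(
    "0x00000000e0000000 0x00000000efffffff 0x000000000014220c\n"
    "0x0000000000000000 0x0000000000000000 0x0000000000000000\n"
    "0x00000000fc100000 0x00000000fc17ffff 0x0000000000040200\n"
)
for idx, (old, new) in diff_bars(before, after).items():
    print(f"BAR {idx}: {old:#x} -> {new:#x}")
```

Reading the files once before `echo 1 > /sys/bus/pci/rescan` and once after gives the same information as the kernel's "BAR n updated" prints, per device.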
* Re: Question about supporting AMD eGPU hot plug case 2021-03-05 16:08 ` Sergei Miroshnichenko @ 2021-03-05 17:13 ` Andrey Grodzovsky 2021-03-05 19:12 ` Sergei Miroshnichenko 0 siblings, 1 reply; 15+ messages in thread From: Andrey Grodzovsky @ 2021-03-05 17:13 UTC (permalink / raw) To: Sergei Miroshnichenko Cc: Alexander.Deucher, Christian.Koenig, anatoli.antonovitch, helgaas, linux-pci, linux On 2021-03-05 11:08 a.m., Sergei Miroshnichenko wrote: > On Thu, 2021-03-04 at 14:49 -0500, Andrey Grodzovsky wrote: >> + linux-pci >> >> On 2021-02-26 1:44 a.m., Sergei Miroshnichenko wrote: >>> On Thu, 2021-02-25 at 13:28 -0500, Andrey Grodzovsky wrote: >>>> On 2021-02-25 2:00 a.m., Sergei Miroshnichenko wrote: >>>>> On Wed, 2021-02-24 at 17:51 -0500, Andrey Grodzovsky wrote: >>>>>> On 2021-02-24 1:23 p.m., Sergei Miroshnichenko wrote: >>>>>>> ... >>>>>> Are you saying that even without hot-plugging, while both >>>>>> nvme >>>>>> and >>>>>> AMD >>>>>> card are present >>>>>> right from boot, you still get BARs moving and MMIO ranges >>>>>> reassigned >>>>>> for NVME BARs >>>>>> just because amdgpu driver will start resize of AMD card BARs >>>>>> and >>>>>> this >>>>>> will trigger NVMEs BARs move to >>>>>> allow AMD card BARs to cover full range of VIDEO RAM ? >>>>> Yes. Unconditionally, because it is unknown beforehand if >>>>> NVMe's >>>>> BAR >>>>> movement will help. In this particular case BAR movement is not >>>>> needed, >>>>> but is done anyway. >>>>> >>>>> BARs are not moved one by one, but the kernel releases all the >>>>> releasable ones, and then recalculates a new BAR layout to fit >>>>> them >>>>> all. Kernel's algorithm is different from BIOS's, so NVME has >>>>> appeared >>>>> at a new place. >>>>> >>>>> This is triggered by following: >>>>> - at boot, if BIOS had assigned not every BAR; >>>>> - during pci_resize_resource(); >>>>> - during pci_rescan_bus() -- after a pciehp event or a manual >>>>> via >>>>> sysfs. 
>>>> By manual via sysfs you mean something like this - 'echo 1 > >>>> /sys/bus/pci/drivers/amdgpu/0000\:0c\:00.0/remove && echo 1 > >>>> /sys/bus/pci/rescan ' ? I am looking into how most reliably >>>> trigger >>>> PCI >>>> code to call my callbacks even without having external PCI cage >>>> for >>>> GPU >>>> (will take me some time to get it). >>> >>> Yeah, this is our way to go when a device can't be physically >>> removed >>> or unpowered remotely. With just a bit shorter path: >>> >>> sudo sh -c 'echo 1 > /sys/bus/pci/devices/0000\:0c\:00.0/remove' >>> sudo sh -c 'echo 1 > /sys/bus/pci/rescan' >>> >>> Or, just a second command (rescan) is enough: a BAR movement >>> attempt >>> will be triggered even if there were no changes in PCI topology. >>> >>> Serge >>> >> >> Hi Sergei >> >> Here is a link to initial implementation on top of your tree >> (movable_bars_v9.1) - >> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=05d6abceed650181bb7fe0a49884a26e378b908e >> I am able to pass one re-scan cycle and can use the card afterwards >> (see >> log1.log). >> But, according to your prints only BAR5 which is registers BAR was >> updated (amdgpu 0000:0b:00.0: BAR 5 updated: 0xfcc00000 -> >> 0xfc100000) >> while I am interested to test BAR0 (Graphic RAM) move since this is >> where most of the complexity is. Is there a way to hack your code to >> force this ? > > Hi Andrey, > > Regarding the amdgpu's BAR0 remaining on its place: it seems this is > because of fixed BARs starting from fc600000.
The kernel tends to group > the BARs close to each other, making a bridge window as compact as > possible. So the BAR0 had occupied the closest "comfortable" slots > 0xe0000000-0xefffffff, with the resulting bridge window of bus 00 > covering all the BARs: > > pci_bus 0000:00: resource 10 [mem 0xe0000000-0xfec2ffff window] > > I'll let you know if I get an idea how to rearrange that manually. > > Two GPUs can actually swap their places. What do you mean? > > What also can make a BAR movable -- is rmmod'ing its driver. It could > be some hack from within a tmux, like: > > rmmod igb; \ > rmmod xhci_hcd; \ > rmmod ahci; \ > echo 1 > /sys/bus/pci/rescan; \ > modprobe igb; \ > modprobe xhci_hcd; \ > modprobe ahci But should I also rmmod amdgpu? Or should modprobing the other drivers back cause (hopefully) a BAR 0 move in the AMD graphics card? > > I think pci_release_resource() should not be in > amdgpu_device_unmap_mmio() -- the patched kernel will do that itself > for BARs the amdgpu_device_bar_fixed() returns false. Even more -- the > kernel will ensure that all BARs which were working before, are > reassigned properly, so it needs them to be assigned before the > procedure. > The same for pci_assign_unassigned_bus_resources() in > amdgpu_device_remap_mmio(): this callback is invoked from > pci_rescan_bus() after pci_assign_unassigned_root_bus_resources(). This seems to me to contradict your documentation (see https://github.com/YADRO-KNS/linux/commit/5bc12ba7c74f1c19c11db29b4807bd32acfac2c2 step 1), although step 2 also seems to contradict step 1 with regard to BAR release - so now I am a bit confused. Also, looking at nvme_dev_unmap - it calls pci_release_mem_regions. The symmetrical acquisition happens in nvme_dev_map.
Andrey > >> When testing with 2 graphic cards and triggering rescan, hard hang of >> the system happens during rescan_prepare of the second card when >> stopping the HW (see log2.log) - I don't understand why this would >> happen as each of them passes fine when they are standalone tested >> and >> there should be no interdependence between them as far as I know. >> Do you have any idea ? > > What happens with two GPUs is unclear to me as well, nothing looks > suspicious. > > Serge > ^ permalink raw reply [flat|nested] 15+ messages in thread
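[Editor's note] The sysfs remove-then-rescan trigger discussed above can be wrapped in a small helper. This is only a sketch of the commands quoted in the thread; the function name `pci_rescan` and the `SYSFS` override are hypothetical additions, the latter existing purely so the write sequence can be dry-run against a fake tree without root or real hardware.

```shell
# Sketch of the sysfs remove + rescan trigger from the thread.
# Passing a device BDF is optional: per Sergei, a bare rescan is
# already enough to trigger a BAR movement attempt. On a real
# system SYSFS defaults to /sys and the writes need root.
pci_rescan() {
    sysfs="${SYSFS:-/sys}"
    dev="$1"
    if [ -n "$dev" ]; then
        echo 1 > "$sysfs/bus/pci/devices/$dev/remove"
    fi
    echo 1 > "$sysfs/bus/pci/rescan"
}
```

On real hardware this reduces to the two `sudo sh -c 'echo 1 > ...'` commands shown in the quoted mail.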
* Re: Question about supporting AMD eGPU hot plug case 2021-03-05 17:13 ` Andrey Grodzovsky @ 2021-03-05 19:12 ` Sergei Miroshnichenko 2021-03-05 23:13 ` Andrey Grodzovsky 0 siblings, 1 reply; 15+ messages in thread From: Sergei Miroshnichenko @ 2021-03-05 19:12 UTC (permalink / raw) To: andrey.grodzovsky Cc: Alexander.Deucher, Christian.Koenig, anatoli.antonovitch, helgaas, linux-pci, linux On Fri, 2021-03-05 at 12:13 -0500, Andrey Grodzovsky wrote: > > On 2021-03-05 11:08 a.m., Sergei Miroshnichenko wrote: > > On Thu, 2021-03-04 at 14:49 -0500, Andrey Grodzovsky wrote: > > > + linux-pci > > > > > > On 2021-02-26 1:44 a.m., Sergei Miroshnichenko wrote: > > > > On Thu, 2021-02-25 at 13:28 -0500, Andrey Grodzovsky wrote: > > > > > On 2021-02-25 2:00 a.m., Sergei Miroshnichenko wrote: > > > > > > On Wed, 2021-02-24 at 17:51 -0500, Andrey Grodzovsky wrote: > > > > > > > On 2021-02-24 1:23 p.m., Sergei Miroshnichenko wrote: > > > > > > > > ... > > > > > > > Are you saying that even without hot-plugging, while both > > > > > > > nvme > > > > > > > and > > > > > > > AMD > > > > > > > card are present > > > > > > > right from boot, you still get BARs moving and MMIO > > > > > > > ranges > > > > > > > reassigned > > > > > > > for NVME BARs > > > > > > > just because amdgpu driver will start resize of AMD card > > > > > > > BARs > > > > > > > and > > > > > > > this > > > > > > > will trigger NVMEs BARs move to > > > > > > > allow AMD card BARs to cover full range of VIDEO RAM ? > > > > > > Yes. Unconditionally, because it is unknown beforehand if > > > > > > NVMe's > > > > > > BAR > > > > > > movement will help. In this particular case BAR movement is > > > > > > not > > > > > > needed, > > > > > > but is done anyway. > > > > > > > > > > > > BARs are not moved one by one, but the kernel releases all > > > > > > the > > > > > > releasable ones, and then recalculates a new BAR layout to > > > > > > fit > > > > > > them > > > > > > all. 
Kernel's algorithm is different from BIOS's, so NVME > > > > > > has > > > > > > appeared > > > > > > at a new place. > > > > > > > > > > > > This is triggered by following: > > > > > > - at boot, if BIOS had assigned not every BAR; > > > > > > - during pci_resize_resource(); > > > > > > - during pci_rescan_bus() -- after a pciehp event or a > > > > > > manual > > > > > > via > > > > > > sysfs. > > > > > > > > > > By manual via sysfs you mean something like this - 'echo 1 > > > > > > /sys/bus/pci/drivers/amdgpu/0000\:0c\:00.0/remove && echo 1 > > > > > > /sys/bus/pci/rescan ' ? I am looking into how most reliably > > > > > trigger > > > > > PCI > > > > > code to call my callbacks even without having external PCI > > > > > cage > > > > > for > > > > > GPU > > > > > (will take me some time to get it). > > > > > > > > Yeah, this is our way to go when a device can't be physically > > > > removed > > > > or unpowered remotely. With just a bit shorter path: > > > > > > > > sudo sh -c 'echo 1 > > > > > /sys/bus/pci/devices/0000\:0c\:00.0/remove' > > > > sudo sh -c 'echo 1 > /sys/bus/pci/rescan' > > > > > > > > Or, just a second command (rescan) is enough: a BAR movement > > > > attempt > > > > will be triggered even if there were no changes in PCI > > > > topology. 
> > > > > > > > Serge > > > > > > > > > > Hi Segrei > > > > > > Here is a link to initial implementation on top of your tree > > > (movable_bars_v9.1) - > > > https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Dyadro%2Fpcie_hotplug%2Fmovable_bars_v9.1%26id%3D05d6abceed650181bb7fe0a49884a26e378b908e&data=04%7C01%7Candrey.grodzovsky%40amd.com%7C637bf3aab3634ab69dc608d8dff0e0b1%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637505572958565332%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=%2F2urbXqCLMzZBKe0lTyAfhgqDqDxiOZN04R9LTVxvI0%3D&reserved=0 > > > I am able to pass one re-scan cycle and can use the card > > > afterwards > > > (see > > > log1.log). > > > But, according to your prints only BAR5 which is registers BAR > > > was > > > updated (amdgpu 0000:0b:00.0: BAR 5 updated: 0xfcc00000 -> > > > 0xfc100000) > > > while I am interested to test BAR0 (Graphic RAM) move since this > > > is > > > where most of the complexity is. Is there a way to hack your code > > > to > > > force this ? > > > > Hi Andrey, > > > > Regarding the amdgpu's BAR0 remaining on its place: it seems this > > is > > because of fixed BARs starting from fc600000. The kernel tends to > > group > > the BARs close to each other, making a bridge window as compact as > > possible. So the BAR0 had occupied the closest "comfortable" slots > > 0xe0000000-0xefffffff, with the resulting bridge window of bus 00 > > covering all the BARs: > > > > pci_bus 0000:00: resource 10 [mem 0xe0000000-0xfec2ffff > > window] > > > > I'll let you know if I get an idea how to rearrange that manually. > > > > Two GPUs can actually swap their places. > > What do you mean ? I was thinking: when the scenario of a PCI rescan with two GPUs (as was described below) will start working, BAR0 of GPU0 can take place of BAR0 of GPU1 after the first rescan. 
> > What also can make a BAR movable -- is rmmod'ing its driver. It > > could > > be some hack from within a tmux, like: > > > > rmmod igb; \ > > rmmod xhci_hcd; \ > > rmmod ahci; \ > > echo 1 > /sys/bus/pci/rescan; \ > > modprobe igb; \ > > modprobe xhci_hcd; \ > > modprobe ahci > > But should I also rmmod amdgpu ? Or modprobing back the other > drivers > should cause (hopefully) BAR0 move in AMD graphic card ? You have already made the amdgpu movable, so no need to rmmod it -- just those with fixed BARs: xhci_hcd 0000:0c:00.3: BAR 0: assigned fixed [mem 0xfc600000- 0xfc6fffff 64bit] igb 0000:07:00.0: BAR 0: assigned fixed [mem 0xfc900000-0xfc91ffff] igb 0000:07:00.0: BAR 3: assigned fixed [mem 0xfc920000-0xfc923fff] ahci 0000:02:00.1: BAR 6: assigned fixed [mem 0xfcb00000-0xfcb7ffff pref] ahci 0000:02:00.1: BAR 5: assigned fixed [mem 0xfcb80000- 0xfcb9ffff] xhci_hcd 0000:02:00.0: BAR 0: assigned fixed [mem 0xfcba0000- 0xfcba7fff 64bit] xhci_hcd 0000:05:00.0: BAR 0: assigned fixed [mem 0xfca00000- 0xfca07fff 64bit] ahci 0000:0d:00.2: BAR 5: assigned fixed [mem 0xfce08000- 0xfce08fff] The expected result is they all move closer to the start of PCI address space. > > I think pci_release_resource() should not be in > > amdgpu_device_unmap_mmio() -- the patched kernel will do that > > itself > > for BARs the amdgpu_device_bar_fixed() returns false. Even more -- > > the > > kernel will ensure that all BARs which were working before, are > > reassigned properly, so it needs them to be assigned before the > > procedure. > > The same for pci_assign_unassigned_bus_resources() in > > amdgpu_device_remap_mmio(): this callback is invoked from > > pci_rescan_bus() after pci_assign_unassigned_root_bus_resources(). > > This seems to me in contrast to your documentation (see > https://github.com/YADRO-KNS/linux/commit/5bc12ba7c74f1c19c11db29b4807bd32acfac2c2 > step 1) although step 2 seems also to contradict step 1 with regard > to > BARs release - so now I am a bit confused. 
Also looking at > nvme_dev_unmap - it calls pci_release_mem_regions. Symmetrical > acquisition happens in nvme_dev_map. Ah, there is a difference between pci_release_region() and pci_release_resource(), so subtle that I had to refresh my memory. You are right, this has to be explained in the documentation! $ sudo cat /proc/iomem ... f0000000-fcffffff : PCI Bus 0000:00 -- root bus resource ... fcf00000-fcffffff : PCI Bus 0000:01 -- bridge window fcf00000-fcf03fff : 0000:01:00.0 -- pci resource (BAR) fcf00000-fcf03fff : nvme -- pci region (reserved by a driver, has its name). So nvme_dev_unmap() uses pci_release_region() to mark that the BAR is not used by the driver anymore -- this actually should be called in every rescan_prepare(). But pci_release_resource() tells the PCI subsystem that the BAR is "released" from the device and has to be assigned to some address before it is used again, and makes pci_resource_start(pdev, released_barno) invalid. Why the quotes: pci_release_resource() doesn't turn off the BAR, doesn't write the registers -- this happens later. I thought at first that pci_release_resource() is not safe in a rescan_prepare(), but then double-checked, and found it's fine, just not needed, as the kernel will do it anyway. And pci_bus_check_bars_assigned(), which compares the bitmasks of successfully assigned BARs, is called *before* the hook. > > > When testing with 2 graphic cards and triggering rescan, hard > > > hang of > > > the system happens during rescan_prepare of the second card when > > > stopping the HW (see log2.log) - I don't understand why this > > > would > > > happen as each of them passes fine when they are standalone > > > tested > > > and > > > there should be no interdependence between them as far as i know. > > > Do you have any idea ? > > > > What happens with two GPUs is unclear for me as well, nothing looks > > suspicious. > > > > Serge > > ^ permalink raw reply [flat|nested] 15+ messages in thread
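[Editor's note] The /proc/iomem listing Sergei walks through encodes the ownership hierarchy (root bus resource, bridge window, the BAR itself, the driver's named region) purely by indentation: two leading spaces per level. A tiny hypothetical helper, shown only to make that nesting explicit for the quoted listing:

```shell
# Nesting depth of a /proc/iomem line: two leading spaces per level.
# Depth 0 = root bus resource, 1 = bridge window, 2 = the BAR
# (pci resource), 3 = the region a driver has requested by name.
iomem_depth() {
    prefix="${1%%[0-9a-f]*}"   # strip everything from the first hex digit on
    echo $(( ${#prefix} / 2 ))
}
```

This is why pci_release_region() (drop the innermost, driver-named entry) and pci_release_resource() (invalidate the BAR entry one level up) are different operations, as explained above.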
* Re: Question about supporting AMD eGPU hot plug case 2021-03-05 19:12 ` Sergei Miroshnichenko @ 2021-03-05 23:13 ` Andrey Grodzovsky 2021-03-11 22:40 ` Andrey Grodzovsky 0 siblings, 1 reply; 15+ messages in thread From: Andrey Grodzovsky @ 2021-03-05 23:13 UTC (permalink / raw) To: Sergei Miroshnichenko Cc: Alexander.Deucher, Christian.Koenig, anatoli.antonovitch, helgaas, linux-pci, linux On 2021-03-05 2:12 p.m., Sergei Miroshnichenko wrote: > On Fri, 2021-03-05 at 12:13 -0500, Andrey Grodzovsky wrote: >> >> On 2021-03-05 11:08 a.m., Sergei Miroshnichenko wrote: >>> On Thu, 2021-03-04 at 14:49 -0500, Andrey Grodzovsky wrote: >>>> + linux-pci >>>> >>>> On 2021-02-26 1:44 a.m., Sergei Miroshnichenko wrote: >>>>> On Thu, 2021-02-25 at 13:28 -0500, Andrey Grodzovsky wrote: >>>>>> On 2021-02-25 2:00 a.m., Sergei Miroshnichenko wrote: >>>>>>> On Wed, 2021-02-24 at 17:51 -0500, Andrey Grodzovsky wrote: >>>>>>>> On 2021-02-24 1:23 p.m., Sergei Miroshnichenko wrote: >>>>>>>>> ... >>>>>>>> Are you saying that even without hot-plugging, while both >>>>>>>> nvme >>>>>>>> and >>>>>>>> AMD >>>>>>>> card are present >>>>>>>> right from boot, you still get BARs moving and MMIO >>>>>>>> ranges >>>>>>>> reassigned >>>>>>>> for NVME BARs >>>>>>>> just because amdgpu driver will start resize of AMD card >>>>>>>> BARs >>>>>>>> and >>>>>>>> this >>>>>>>> will trigger NVMEs BARs move to >>>>>>>> allow AMD card BARs to cover full range of VIDEO RAM ? >>>>>>> Yes. Unconditionally, because it is unknown beforehand if >>>>>>> NVMe's >>>>>>> BAR >>>>>>> movement will help. In this particular case BAR movement is >>>>>>> not >>>>>>> needed, >>>>>>> but is done anyway. >>>>>>> >>>>>>> BARs are not moved one by one, but the kernel releases all >>>>>>> the >>>>>>> releasable ones, and then recalculates a new BAR layout to >>>>>>> fit >>>>>>> them >>>>>>> all. Kernel's algorithm is different from BIOS's, so NVME >>>>>>> has >>>>>>> appeared >>>>>>> at a new place. 
>>>>>>> >>>>>>> This is triggered by following: >>>>>>> - at boot, if BIOS had assigned not every BAR; >>>>>>> - during pci_resize_resource(); >>>>>>> - during pci_rescan_bus() -- after a pciehp event or a >>>>>>> manual >>>>>>> via >>>>>>> sysfs. >>>>>> >>>>>> By manual via sysfs you mean something like this - 'echo 1 > >>>>>> /sys/bus/pci/drivers/amdgpu/0000\:0c\:00.0/remove && echo 1 > >>>>>> /sys/bus/pci/rescan ' ? I am looking into how most reliably >>>>>> trigger >>>>>> PCI >>>>>> code to call my callbacks even without having external PCI >>>>>> cage >>>>>> for >>>>>> GPU >>>>>> (will take me some time to get it). >>>>> >>>>> Yeah, this is our way to go when a device can't be physically >>>>> removed >>>>> or unpowered remotely. With just a bit shorter path: >>>>> >>>>> sudo sh -c 'echo 1 > >>>>> /sys/bus/pci/devices/0000\:0c\:00.0/remove' >>>>> sudo sh -c 'echo 1 > /sys/bus/pci/rescan' >>>>> >>>>> Or, just a second command (rescan) is enough: a BAR movement >>>>> attempt >>>>> will be triggered even if there were no changes in PCI >>>>> topology. >>>>> >>>>> Serge >>>>> >>>> >>>> Hi Segrei >>>> >>>> Here is a link to initial implementation on top of your tree >>>> (movable_bars_v9.1) - >>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Dyadro%2Fpcie_hotplug%2Fmovable_bars_v9.1%26id%3D05d6abceed650181bb7fe0a49884a26e378b908e&data=04%7C01%7Candrey.grodzovsky%40amd.com%7C6658f0cc7c344791ce0f08d8e00a96bf%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637505683386334114%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qEC3qIAM8h1vU4gGEgT6sThXsaCuatTI2UjM9Bb8KGI%3D&reserved=0 >>>> I am able to pass one re-scan cycle and can use the card >>>> afterwards >>>> (see >>>> log1.log). 
>>>> But, according to your prints only BAR5 which is registers BAR >>>> was >>>> updated (amdgpu 0000:0b:00.0: BAR 5 updated: 0xfcc00000 -> >>>> 0xfc100000) >>>> while I am interested to test BAR0 (Graphic RAM) move since this >>>> is >>>> where most of the complexity is. Is there a way to hack your code >>>> to >>>> force this ? >>> >>> Hi Andrey, >>> >>> Regarding the amdgpu's BAR0 remaining on its place: it seems this >>> is >>> because of fixed BARs starting from fc600000. The kernel tends to >>> group >>> the BARs close to each other, making a bridge window as compact as >>> possible. So the BAR0 had occupied the closest "comfortable" slots >>> 0xe0000000-0xefffffff, with the resulting bridge window of bus 00 >>> covering all the BARs: >>> >>> pci_bus 0000:00: resource 10 [mem 0xe0000000-0xfec2ffff >>> window] >>> >>> I'll let you know if I get an idea how to rearrange that manually. >>> >>> Two GPUs can actually swap their places. >> >> What do you mean ? > > I was thinking: when the scenario of a PCI rescan with two GPUs (as was > described below) will start working, BAR0 of GPU0 can take place of > BAR0 of GPU1 after the first rescan. > >>> What also can make a BAR movable -- is rmmod'ing its driver. It >>> could >>> be some hack from within a tmux, like: >>> >>> rmmod igb; \ >>> rmmod xhci_hcd; \ >>> rmmod ahci; \ >>> echo 1 > /sys/bus/pci/rescan; \ >>> modprobe igb; \ >>> modprobe xhci_hcd; \ >>> modprobe ahci >> >> But should I also rmmod amdgpu ? Or modprobing back the other >> drivers >> should cause (hopefully) BAR0 move in AMD graphic card ? 
> > You have already made the amdgpu movable, so no need to rmmod it -- > just those with fixed BARs: > > xhci_hcd 0000:0c:00.3: BAR 0: assigned fixed [mem 0xfc600000- > 0xfc6fffff 64bit] > igb 0000:07:00.0: BAR 0: assigned fixed [mem 0xfc900000-0xfc91ffff] > igb 0000:07:00.0: BAR 3: assigned fixed [mem 0xfc920000-0xfc923fff] > ahci 0000:02:00.1: BAR 6: assigned fixed [mem 0xfcb00000-0xfcb7ffff > pref] > ahci 0000:02:00.1: BAR 5: assigned fixed [mem 0xfcb80000- > 0xfcb9ffff] > xhci_hcd 0000:02:00.0: BAR 0: assigned fixed [mem 0xfcba0000- > 0xfcba7fff 64bit] > xhci_hcd 0000:05:00.0: BAR 0: assigned fixed [mem 0xfca00000- > 0xfca07fff 64bit] > ahci 0000:0d:00.2: BAR 5: assigned fixed [mem 0xfce08000- > 0xfce08fff] > > The expected result is they all move closer to the start of PCI address > space. > Ok, I updated as you described. Also I removed PCI conf command to stop address decoding and restart later as I noticed PCI core does it itself when needed. I tested now also with graphic desktop enabled while submitting 3d draw commands and seems like under this scenario everything still works. Again, this all needs to be tested with VRAM BAR move as then I believe I will see more issues like handling of MMIO mapped VRAM objects (like GART table). In case you do have an AMD card you could also maybe give it a try. In the meanwhile I will add support to ioremapping of those VRAM objects. Andrey >>> I think pci_release_resource() should not be in >>> amdgpu_device_unmap_mmio() -- the patched kernel will do that >>> itself >>> for BARs the amdgpu_device_bar_fixed() returns false. Even more -- >>> the >>> kernel will ensure that all BARs which were working before, are >>> reassigned properly, so it needs them to be assigned before the >>> procedure. >>> The same for pci_assign_unassigned_bus_resources() in >>> amdgpu_device_remap_mmio(): this callback is invoked from >>> pci_rescan_bus() after pci_assign_unassigned_root_bus_resources(). 
>> >> This seems to me in contrast to your documentation (see >> https://github.com/YADRO-KNS/linux/commit/5bc12ba7c74f1c19c11db29b4807bd32acfac2c2 >> step 1) although step 2 seems also to contradict step 1 with regard >> to >> BARs release - so now I am a bit confused. Also looking at >> nvme_dev_unmap - it calls pci_release_mem_regions. Symmetrical >> acquisition happens in nvme_dev_map. > > Ah, there is a difference between pci_release_region() and > pci_release_resource(), so subtle that I had to refresh my memory. You > are right, this has to be explained in the documentation! > > $ sudo cat /proc/iomem > ... > f0000000-fcffffff : PCI Bus 0000:00 -- root bus resource > ... > fcf00000-fcffffff : PCI Bus 0000:01 -- bridge window > fcf00000-fcf03fff : 0000:01:00.0 -- pci resource (BAR) > fcf00000-fcf03fff : nvme -- pci region (reserved by > a driver, has its name). > > So the nvme_dev_unmap() reflects with pci_release_region() that the BAR > is not used by the driver anymore -- this actually should be called in > every rescan_prepare(). > > But the pci_release_resource() tells the PCI subsystem that the BAR > is "released" from the device and has to be assigned to some address > before using again, and makes the pci_resource_start(pdev, > released_barno) invalid. > > Why the quotes: pci_release_resource() doesn't turn off the BAR, > doesn't write the registers -- this happens later. > > I thought at first that pci_release_resource() is not safe in a > rescan_prepare(), but then double-checked, and found it's fine, just > not needed, as the kernel will do it anyway.
And the > pci_bus_check_bars_assigned() to compare the bitmasks of successfully > assigned BARs is called *before* the hook. > >>>> When testing with 2 graphic cards and triggering rescan, hard >>>> hang of >>>> the system happens during rescan_prepare of the second card when >>>> stopping the HW (see log2.log) - I don't understand why this >>>> would >>>> happen as each of them passes fine when they are standalone >>>> tested >>>> and >>>> there should be no interdependence between them as far as i know. >>>> Do you have any idea ? >>> >>> What happens with two GPUs is unclear for me as well, nothing looks >>> suspicious. >>> >>> Serge >>> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about supporting AMD eGPU hot plug case 2021-03-05 23:13 ` Andrey Grodzovsky @ 2021-03-11 22:40 ` Andrey Grodzovsky 2021-03-12 9:03 ` Christian König 0 siblings, 1 reply; 15+ messages in thread From: Andrey Grodzovsky @ 2021-03-11 22:40 UTC (permalink / raw) To: Sergei Miroshnichenko Cc: Alexander.Deucher, Christian.Koenig, anatoli.antonovitch, helgaas, linux-pci, linux On 2021-03-05 6:13 p.m., Andrey Grodzovsky wrote: > > > On 2021-03-05 2:12 p.m., Sergei Miroshnichenko wrote: >> On Fri, 2021-03-05 at 12:13 -0500, Andrey Grodzovsky wrote: >>> >>> On 2021-03-05 11:08 a.m., Sergei Miroshnichenko wrote: >>>> On Thu, 2021-03-04 at 14:49 -0500, Andrey Grodzovsky wrote: >>>>> + linux-pci >>>>> >>>>> On 2021-02-26 1:44 a.m., Sergei Miroshnichenko wrote: >>>>>> On Thu, 2021-02-25 at 13:28 -0500, Andrey Grodzovsky wrote: >>>>>>> On 2021-02-25 2:00 a.m., Sergei Miroshnichenko wrote: >>>>>>>> On Wed, 2021-02-24 at 17:51 -0500, Andrey Grodzovsky wrote: >>>>>>>>> On 2021-02-24 1:23 p.m., Sergei Miroshnichenko wrote: >>>>>>>>>> ... >>>>>>>>> Are you saying that even without hot-plugging, while both >>>>>>>>> nvme >>>>>>>>> and >>>>>>>>> AMD >>>>>>>>> card are present >>>>>>>>> right from boot, you still get BARs moving and MMIO >>>>>>>>> ranges >>>>>>>>> reassigned >>>>>>>>> for NVME BARs >>>>>>>>> just because amdgpu driver will start resize of AMD card >>>>>>>>> BARs >>>>>>>>> and >>>>>>>>> this >>>>>>>>> will trigger NVMEs BARs move to >>>>>>>>> allow AMD card BARs to cover full range of VIDEO RAM ? >>>>>>>> Yes. Unconditionally, because it is unknown beforehand if >>>>>>>> NVMe's >>>>>>>> BAR >>>>>>>> movement will help. In this particular case BAR movement is >>>>>>>> not >>>>>>>> needed, >>>>>>>> but is done anyway. >>>>>>>> >>>>>>>> BARs are not moved one by one, but the kernel releases all >>>>>>>> the >>>>>>>> releasable ones, and then recalculates a new BAR layout to >>>>>>>> fit >>>>>>>> them >>>>>>>> all. 
Kernel's algorithm is different from BIOS's, so NVME >>>>>>>> has >>>>>>>> appeared >>>>>>>> at a new place. >>>>>>>> >>>>>>>> This is triggered by following: >>>>>>>> - at boot, if BIOS had assigned not every BAR; >>>>>>>> - during pci_resize_resource(); >>>>>>>> - during pci_rescan_bus() -- after a pciehp event or a >>>>>>>> manual >>>>>>>> via >>>>>>>> sysfs. >>>>>>> >>>>>>> By manual via sysfs you mean something like this - 'echo 1 > >>>>>>> /sys/bus/pci/drivers/amdgpu/0000\:0c\:00.0/remove && echo 1 > >>>>>>> /sys/bus/pci/rescan ' ? I am looking into how most reliably >>>>>>> trigger >>>>>>> PCI >>>>>>> code to call my callbacks even without having external PCI >>>>>>> cage >>>>>>> for >>>>>>> GPU >>>>>>> (will take me some time to get it). >>>>>> >>>>>> Yeah, this is our way to go when a device can't be physically >>>>>> removed >>>>>> or unpowered remotely. With just a bit shorter path: >>>>>> >>>>>> sudo sh -c 'echo 1 > >>>>>> /sys/bus/pci/devices/0000\:0c\:00.0/remove' >>>>>> sudo sh -c 'echo 1 > /sys/bus/pci/rescan' >>>>>> >>>>>> Or, just a second command (rescan) is enough: a BAR movement >>>>>> attempt >>>>>> will be triggered even if there were no changes in PCI >>>>>> topology. >>>>>> >>>>>> Serge >>>>>> >>>>> >>>>> Hi Segrei >>>>> >>>>> Here is a link to initial implementation on top of your tree >>>>> (movable_bars_v9.1) - >>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Dyadro%2Fpcie_hotplug%2Fmovable_bars_v9.1%26id%3D05d6abceed650181bb7fe0a49884a26e378b908e&data=04%7C01%7Candrey.grodzovsky%40amd.com%7C6658f0cc7c344791ce0f08d8e00a96bf%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637505683386334114%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=qEC3qIAM8h1vU4gGEgT6sThXsaCuatTI2UjM9Bb8KGI%3D&reserved=0 >>>>> >>>>> I am able to pass one re-scan cycle and can use the card >>>>> afterwards >>>>> (see >>>>> log1.log). 
>>>>> But, according to your prints only BAR5 which is registers BAR >>>>> was >>>>> updated (amdgpu 0000:0b:00.0: BAR 5 updated: 0xfcc00000 -> >>>>> 0xfc100000) >>>>> while I am interested to test BAR0 (Graphic RAM) move since this >>>>> is >>>>> where most of the complexity is. Is there a way to hack your code >>>>> to >>>>> force this ? >>>> >>>> Hi Andrey, >>>> >>>> Regarding the amdgpu's BAR0 remaining on its place: it seems this >>>> is >>>> because of fixed BARs starting from fc600000. The kernel tends to >>>> group >>>> the BARs close to each other, making a bridge window as compact as >>>> possible. So the BAR0 had occupied the closest "comfortable" slots >>>> 0xe0000000-0xefffffff, with the resulting bridge window of bus 00 >>>> covering all the BARs: >>>> >>>> pci_bus 0000:00: resource 10 [mem 0xe0000000-0xfec2ffff >>>> window] >>>> >>>> I'll let you know if I get an idea how to rearrange that manually. >>>> >>>> Two GPUs can actually swap their places. >>> >>> What do you mean ? >> >> I was thinking: when the scenario of a PCI rescan with two GPUs (as was >> described below) will start working, BAR0 of GPU0 can take place of >> BAR0 of GPU1 after the first rescan. >> >>>> What also can make a BAR movable -- is rmmod'ing its driver. It >>>> could >>>> be some hack from within a tmux, like: >>>> >>>> rmmod igb; \ >>>> rmmod xhci_hcd; \ >>>> rmmod ahci; \ >>>> echo 1 > /sys/bus/pci/rescan; \ >>>> modprobe igb; \ >>>> modprobe xhci_hcd; \ >>>> modprobe ahci >>> >>> But should I also rmmod amdgpu ? Or modprobing back the other >>> drivers >>> should cause (hopefully) BAR0 move in AMD graphic card ? 
>> >> You have already made the amdgpu movable, so no need to rmmod it -- >> just those with fixed BARs: >> >> xhci_hcd 0000:0c:00.3: BAR 0: assigned fixed [mem 0xfc600000- >> 0xfc6fffff 64bit] >> igb 0000:07:00.0: BAR 0: assigned fixed [mem 0xfc900000-0xfc91ffff] >> igb 0000:07:00.0: BAR 3: assigned fixed [mem 0xfc920000-0xfc923fff] >> ahci 0000:02:00.1: BAR 6: assigned fixed [mem 0xfcb00000-0xfcb7ffff >> pref] >> ahci 0000:02:00.1: BAR 5: assigned fixed [mem 0xfcb80000- >> 0xfcb9ffff] >> xhci_hcd 0000:02:00.0: BAR 0: assigned fixed [mem 0xfcba0000- >> 0xfcba7fff 64bit] >> xhci_hcd 0000:05:00.0: BAR 0: assigned fixed [mem 0xfca00000- >> 0xfca07fff 64bit] >> ahci 0000:0d:00.2: BAR 5: assigned fixed [mem 0xfce08000- >> 0xfce08fff] >> >> The expected result is they all move closer to the start of PCI address >> space. >> > > Ok, I updated as you described. Also I removed PCI conf command to stop > address decoding and restart later as I noticed PCI core does it itself > when needed. > I tested now also with graphic desktop enabled while submitting > 3d draw commands and seems like under this scenario everything still > works. Again, this all needs to be tested with VRAM BAR move as then > I believe I will see more issues like handling of MMIO mapped VRAM > objects (like GART table). In case you do have an AMD card you could > also maybe give it a try. In the meanwhile I will add support to > ioremapping of those VRAM objects. > > Andrey Just an update, added support for unmapping/remapping of all VRAM objects, both user space mmapped and kernel ioremapped. Seems to work ok but again, without forcing VRAM BAR to move I can't be sure. Alex, Christian - take a look when you have some time to give me some initial feedback on the amdgpu side.
The code is at https://cgit.freedesktop.org/~agrodzov/linux/log/?h=yadro%2Fpcie_hotplug%2Fmovable_bars_v9.1 Andrey > >>>> I think pci_release_resource() should not be in >>>> amdgpu_device_unmap_mmio() -- the patched kernel will do that >>>> itself >>>> for BARs the amdgpu_device_bar_fixed() returns false. Even more -- >>>> the >>>> kernel will ensure that all BARs which were working before, are >>>> reassigned properly, so it needs them to be assigned before the >>>> procedure. >>>> The same for pci_assign_unassigned_bus_resources() in >>>> amdgpu_device_remap_mmio(): this callback is invoked from >>>> pci_rescan_bus() after pci_assign_unassigned_root_bus_resources(). >>> >>> This seems to me in contrast to your documentation (see >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FYADRO-KNS%2Flinux%2Fcommit%2F5bc12ba7c74f1c19c11db29b4807bd32acfac2c2&data=04%7C01%7Candrey.grodzovsky%40amd.com%7C6658f0cc7c344791ce0f08d8e00a96bf%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637505683386334114%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=kO1OlRL8iHMTcijuV0jDpODCtXpCCTpJv6YIn%2FuypNQ%3D&reserved=0 >>> >>> step 1) although step 2 seems also to contradict step 1 with regard >>> to >>> BARs release - so now I am a bit confused. Also looking at >>> nvme_dev_unmap - it calls pci_release_mem_regions. Symmetrical >>> acquisition happens in nvme_dev_unmap. >> >> Ah, there is a difference between pci_release_region() and >> pci_release_resource(), so subtle that I had to refresh my memory. You >> are right, this has to be explained in the documentation! >> >> $ sudo cat /proc/iomem >> ... >> f0000000-fcffffff : PCI Bus 0000:00 -- root bus resource >> ... >> fcf00000-fcffffff : PCI Bus 0000:01 -- bridge window >> fcf00000-fcf03fff : 0000:01:00.0 -- pci resource (BAR) >> fcf00000-fcf03fff : nvme -- pci region (reserved by >> a driver, has its name). 
>> >> So the nvme_dev_unmap() reflects with pci_release_region() that the BAR >> is not used by the driver anymore -- this actually should be called in >> every rescan_prepare(). >> >> But the pci_release_resource() tells to the PCI subsystem that the BAR >> is "released" from the device and has to be assigned to some address >> before using again, and makes the pci_resource_start(pdev, >> relased_barno) invalid. >> >> Why the quotes: pci_release_resource() doesn't turn off the BAR, >> doesn't write the registers -- this happens later. >> >> I thouht at first that pci_release_resource() is not safe in a >> rescan_prepare(), but then double-checked, and found it's fine, just >> not needed, as the kernel will do it anyway. And the >> pci_bus_check_bars_assigned() to compare the bitmasks of successfully >> assigned BARs is called *before* the hook. >> >>>>> When testing with 2 graphic cards and triggering rescan, hard >>>>> hang of >>>>> the system happens during rescan_prepare of the second card when >>>>> stopping the HW (see log2.log) - I don't understand why this >>>>> would >>>>> happen as each of them passes fine when they are standalone >>>>> tested >>>>> and >>>>> there should be no interdependence between them as far as i know. >>>>> Do you have any idea ? >>>> >>>> What happens with two GPUs is unclear for me as well, nothing looks >>>> suspicious. >>>> >>>> Serge >>>> ^ permalink raw reply [flat|nested] 15+ messages in thread
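[Editor's note] Sergei's "free the fixed BARs" hack quoted above -- unload the drivers whose BARs are pinned, rescan, reload -- can be sketched as one helper. The function name `rescan_with_unpinned_bars` and the `SYSFS` override are hypothetical; the module list is just the thread's example. On a real box this must run from a session that does not depend on the unloaded drivers (hence the tmux remark in the thread), and `rmmod`/`modprobe` need root.

```shell
# Sketch: unload the drivers whose BARs are pinned ("assigned fixed"),
# trigger a rescan so the kernel can recompute the whole BAR layout,
# then reload the drivers. rmmod/modprobe and SYSFS can be shadowed
# for a dry run without touching real hardware.
rescan_with_unpinned_bars() {
    for m in "$@"; do rmmod "$m" || return 1; done
    echo 1 > "${SYSFS:-/sys}/bus/pci/rescan"
    for m in "$@"; do modprobe "$m" || return 1; done
}
```

Usage would mirror the quoted one-liner: `rescan_with_unpinned_bars igb xhci_hcd ahci`.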
* Re: Question about supporting AMD eGPU hot plug case 2021-03-11 22:40 ` Andrey Grodzovsky @ 2021-03-12 9:03 ` Christian König 2021-03-12 15:34 ` Andrey Grodzovsky 0 siblings, 1 reply; 15+ messages in thread From: Christian König @ 2021-03-12 9:03 UTC (permalink / raw) To: Andrey Grodzovsky, Sergei Miroshnichenko Cc: Alexander.Deucher, anatoli.antonovitch, helgaas, linux-pci, linux On 11.03.21 at 23:40, Andrey Grodzovsky wrote: > [SNIP] >>> The expected result is they all move closer to the start of PCI address >>> space. >>> >> >> Ok, I updated as you described. Also I removed PCI conf command to stop >> address decoding and restart later as I noticed PCI core does it itself >> when needed. >> I tested now also with graphic desktop enabled while submitting >> 3d draw commands and seems like under this scenario everything still >> works. Again, this all needs to be tested with VRAM BAR move as then >> I believe I will see more issues like handling of MMIO mapped VRAM >> objects (like GART table). In case you do have an AMD card you could >> also maybe give it a try. In the meanwhile I will add support to >> ioremapping of those VRAM objects. >> >> Andrey > > Just an update, added support for unmapping/remapping of all VRAM > objects, both user space mmapped and kernel ioremapped. Seems to work > ok but again, without forcing VRAM BAR to move I can't be sure. > Alex, Christian - take a look when you have some time to give me some > initial feedback on the amdgpu side. > > The code is at > https://cgit.freedesktop.org/~agrodzov/linux/log/?h=yadro%2Fpcie_hotplug%2Fmovable_bars_v9.1 Mhm, that lets userspace busy retry until the BAR movement is done. Not sure if that can't live-lock somehow. Christian. > > Andrey ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about supporting AMD eGPU hot plug case 2021-03-12 9:03 ` Christian König @ 2021-03-12 15:34 ` Andrey Grodzovsky 2021-03-15 16:00 ` Andrey Grodzovsky 2021-03-15 16:10 ` Christian König 0 siblings, 2 replies; 15+ messages in thread From: Andrey Grodzovsky @ 2021-03-12 15:34 UTC (permalink / raw) To: Christian König, Sergei Miroshnichenko Cc: Alexander.Deucher, anatoli.antonovitch, helgaas, linux-pci, linux On 2021-03-12 4:03 a.m., Christian König wrote: > Am 11.03.21 um 23:40 schrieb Andrey Grodzovsky: >> [SNIP] >>>> The expected result is they all move closer to the start of PCI address >>>> space. >>>> >>> >>> Ok, I updated as you described. Also I removed PCI conf command to stop >>> address decoding and restart later as I noticed PCI core does it itself >>> when needed. >>> I tested now also with graphic desktop enabled while submitting >>> 3d draw commands and seems like under this scenario everything still >>> works. Again, this all needs to be tested with VRAM BAR move as then >>> I believe I will see more issues like handling of MMIO mapped VRAM >>> objects (like GART table). In case you do have an AMD card you could >>> also maybe give it a try. In the meanwhile I will add support to >>> ioremapping of those VRAM objects. >>> >>> Andrey >> >> Just an update, added support for unmaping/remapping of all VRAM >> objects, both user space mmaped and kernel ioremaped. Seems to work >> ok but again, without forcing VRAM BAR to move I can't be sure. >> Alex, Chsristian - take a look when you have some time to give me some >> initial feedback on the amdgpu side. >> >> The code is at >> https://cgit.freedesktop.org/~agrodzov/linux/log/?h=yadro%2Fpcie_hotplug%2Fmovable_bars_v9.1 >> > > Mhm, that let's userspace busy retry until the BAR movement is done. > > Not sure if that can't live lock somehow. > > Christian. 
In my testing it didn't, but I can instead route them to some global static dummy page while the BARs are moving and then, when everything is done, just invalidate the device address space again and let the page faults fill in valid PFNs again. Andrey > >> >> Andrey > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about supporting AMD eGPU hot plug case 2021-03-12 15:34 ` Andrey Grodzovsky @ 2021-03-15 16:00 ` Andrey Grodzovsky 2021-03-15 16:10 ` Christian König 1 sibling, 0 replies; 15+ messages in thread From: Andrey Grodzovsky @ 2021-03-15 16:00 UTC (permalink / raw) To: Christian König, Sergei Miroshnichenko Cc: Alexander.Deucher, anatoli.antonovitch, helgaas, linux-pci, linux Ping Andrey On 2021-03-12 10:34 a.m., Andrey Grodzovsky wrote: > > > On 2021-03-12 4:03 a.m., Christian König wrote: >> Am 11.03.21 um 23:40 schrieb Andrey Grodzovsky: >>> [SNIP] >>>>> The expected result is they all move closer to the start of PCI >>>>> address >>>>> space. >>>>> >>>> >>>> Ok, I updated as you described. Also I removed PCI conf command to >>>> stop >>>> address decoding and restart later as I noticed PCI core does it >>>> itself >>>> when needed. >>>> I tested now also with graphic desktop enabled while submitting >>>> 3d draw commands and seems like under this scenario everything still >>>> works. Again, this all needs to be tested with VRAM BAR move as then >>>> I believe I will see more issues like handling of MMIO mapped VRAM >>>> objects (like GART table). In case you do have an AMD card you >>>> could also maybe give it a try. In the meanwhile I will add support >>>> to ioremapping of those VRAM objects. >>>> >>>> Andrey >>> >>> Just an update, added support for unmaping/remapping of all VRAM >>> objects, both user space mmaped and kernel ioremaped. Seems to work >>> ok but again, without forcing VRAM BAR to move I can't be sure. >>> Alex, Chsristian - take a look when you have some time to give me some >>> initial feedback on the amdgpu side. >>> >>> The code is at >>> https://cgit.freedesktop.org/~agrodzov/linux/log/?h=yadro%2Fpcie_hotplug%2Fmovable_bars_v9.1 >>> >> >> Mhm, that let's userspace busy retry until the BAR movement is done. >> >> Not sure if that can't live lock somehow. >> >> Christian. 
> > In my testing it didn't but, I can instead route them to some > global static dummy page while BARs are moving and then when everything > done just invalidate the device address space again and let the > pagefaults fill in valid PFNs again. > > Andrey > >> >>> >>> Andrey >> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about supporting AMD eGPU hot plug case 2021-03-12 15:34 ` Andrey Grodzovsky 2021-03-15 16:00 ` Andrey Grodzovsky @ 2021-03-15 16:10 ` Christian König 2021-03-15 16:21 ` Andrey Grodzovsky 1 sibling, 1 reply; 15+ messages in thread From: Christian König @ 2021-03-15 16:10 UTC (permalink / raw) To: Andrey Grodzovsky, Sergei Miroshnichenko Cc: Alexander.Deucher, anatoli.antonovitch, helgaas, linux-pci, linux Am 12.03.21 um 16:34 schrieb Andrey Grodzovsky: > > > On 2021-03-12 4:03 a.m., Christian König wrote: >> Am 11.03.21 um 23:40 schrieb Andrey Grodzovsky: >>> [SNIP] >>>>> The expected result is they all move closer to the start of PCI >>>>> address >>>>> space. >>>>> >>>> >>>> Ok, I updated as you described. Also I removed PCI conf command to >>>> stop >>>> address decoding and restart later as I noticed PCI core does it >>>> itself >>>> when needed. >>>> I tested now also with graphic desktop enabled while submitting >>>> 3d draw commands and seems like under this scenario everything still >>>> works. Again, this all needs to be tested with VRAM BAR move as then >>>> I believe I will see more issues like handling of MMIO mapped VRAM >>>> objects (like GART table). In case you do have an AMD card you >>>> could also maybe give it a try. In the meanwhile I will add support >>>> to ioremapping of those VRAM objects. >>>> >>>> Andrey >>> >>> Just an update, added support for unmaping/remapping of all VRAM >>> objects, both user space mmaped and kernel ioremaped. Seems to work >>> ok but again, without forcing VRAM BAR to move I can't be sure. >>> Alex, Chsristian - take a look when you have some time to give me some >>> initial feedback on the amdgpu side. >>> >>> The code is at >>> https://cgit.freedesktop.org/~agrodzov/linux/log/?h=yadro%2Fpcie_hotplug%2Fmovable_bars_v9.1 >>> >> >> Mhm, that let's userspace busy retry until the BAR movement is done. >> >> Not sure if that can't live lock somehow. >> >> Christian. 
> > In my testing it didn't but, I can instead route them to some > global static dummy page while BARs are moving and then when everything > done just invalidate the device address space again and let the > pagefaults fill in valid PFNs again. Well that won't work because the reads/writes which are done in the meantime do need to wait for the BAR to be available again. So waiting for the BAR move to finish is correct, but what we should do is to use a lock instead of an SRCU because that makes lockdep complain when we do something nasty. Christian. > > Andrey > >> >>> >>> Andrey >> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about supporting AMD eGPU hot plug case 2021-03-15 16:10 ` Christian König @ 2021-03-15 16:21 ` Andrey Grodzovsky 2021-03-15 16:55 ` Christian König 0 siblings, 1 reply; 15+ messages in thread From: Andrey Grodzovsky @ 2021-03-15 16:21 UTC (permalink / raw) To: Christian König, Sergei Miroshnichenko Cc: Alexander.Deucher, anatoli.antonovitch, helgaas, linux-pci, linux On 2021-03-15 12:10 p.m., Christian König wrote: > Am 12.03.21 um 16:34 schrieb Andrey Grodzovsky: >> >> >> On 2021-03-12 4:03 a.m., Christian König wrote: >>> Am 11.03.21 um 23:40 schrieb Andrey Grodzovsky: >>>> [SNIP] >>>>>> The expected result is they all move closer to the start of PCI >>>>>> address >>>>>> space. >>>>>> >>>>> >>>>> Ok, I updated as you described. Also I removed PCI conf command to >>>>> stop >>>>> address decoding and restart later as I noticed PCI core does it >>>>> itself >>>>> when needed. >>>>> I tested now also with graphic desktop enabled while submitting >>>>> 3d draw commands and seems like under this scenario everything still >>>>> works. Again, this all needs to be tested with VRAM BAR move as then >>>>> I believe I will see more issues like handling of MMIO mapped VRAM >>>>> objects (like GART table). In case you do have an AMD card you >>>>> could also maybe give it a try. In the meanwhile I will add >>>>> support to ioremapping of those VRAM objects. >>>>> >>>>> Andrey >>>> >>>> Just an update, added support for unmaping/remapping of all VRAM >>>> objects, both user space mmaped and kernel ioremaped. Seems to work >>>> ok but again, without forcing VRAM BAR to move I can't be sure. >>>> Alex, Chsristian - take a look when you have some time to give me some >>>> initial feedback on the amdgpu side. >>>> >>>> The code is at >>>> https://cgit.freedesktop.org/~agrodzov/linux/log/?h=yadro%2Fpcie_hotplug%2Fmovable_bars_v9.1 >>>> >>> >>> Mhm, that let's userspace busy retry until the BAR movement is done. >>> >>> Not sure if that can't live lock somehow. 
>>> >>> Christian. >> >> In my testing it didn't but, I can instead route them to some >> global static dummy page while BARs are moving and then when everything >> done just invalidate the device address space again and let the >> pagefaults fill in valid PFNs again. > > Well that won't work because the reads/writes which are done in the > meantime do need to wait for the BAR to be available again. > > So waiting for the BAR move to finish is correct, but what we should > do is to use a lock instead of an SRCU because that makes lockdep > complain when we do something nasty. > > Christian. Spinlock I assume ? We can't sleep there - it's an interrupt. Andrey > >> >> Andrey >> >>> >>>> >>>> Andrey >>> > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about supporting AMD eGPU hot plug case 2021-03-15 16:21 ` Andrey Grodzovsky @ 2021-03-15 16:55 ` Christian König 2021-03-15 17:11 ` Andrey Grodzovsky 0 siblings, 1 reply; 15+ messages in thread From: Christian König @ 2021-03-15 16:55 UTC (permalink / raw) To: Andrey Grodzovsky, Sergei Miroshnichenko Cc: Alexander.Deucher, anatoli.antonovitch, helgaas, linux-pci, linux Am 15.03.21 um 17:21 schrieb Andrey Grodzovsky: > > On 2021-03-15 12:10 p.m., Christian König wrote: >> Am 12.03.21 um 16:34 schrieb Andrey Grodzovsky: >>> >>> >>> On 2021-03-12 4:03 a.m., Christian König wrote: >>>> Am 11.03.21 um 23:40 schrieb Andrey Grodzovsky: >>>>> [SNIP] >>>>>>> The expected result is they all move closer to the start of PCI >>>>>>> address >>>>>>> space. >>>>>>> >>>>>> >>>>>> Ok, I updated as you described. Also I removed PCI conf command >>>>>> to stop >>>>>> address decoding and restart later as I noticed PCI core does it >>>>>> itself >>>>>> when needed. >>>>>> I tested now also with graphic desktop enabled while submitting >>>>>> 3d draw commands and seems like under this scenario everything still >>>>>> works. Again, this all needs to be tested with VRAM BAR move as then >>>>>> I believe I will see more issues like handling of MMIO mapped >>>>>> VRAM objects (like GART table). In case you do have an AMD card >>>>>> you could also maybe give it a try. In the meanwhile I will add >>>>>> support to ioremapping of those VRAM objects. >>>>>> >>>>>> Andrey >>>>> >>>>> Just an update, added support for unmaping/remapping of all VRAM >>>>> objects, both user space mmaped and kernel ioremaped. Seems to work >>>>> ok but again, without forcing VRAM BAR to move I can't be sure. >>>>> Alex, Chsristian - take a look when you have some time to give me >>>>> some >>>>> initial feedback on the amdgpu side. 
>>>>> >>>>> The code is at >>>>> https://cgit.freedesktop.org/~agrodzov/linux/log/?h=yadro%2Fpcie_hotplug%2Fmovable_bars_v9.1 >>>>> >>>> >>>> Mhm, that let's userspace busy retry until the BAR movement is done. >>>> >>>> Not sure if that can't live lock somehow. >>>> >>>> Christian. >>> >>> In my testing it didn't but, I can instead route them to some >>> global static dummy page while BARs are moving and then when everything >>> done just invalidate the device address space again and let the >>> pagefaults fill in valid PFNs again. >> >> Well that won't work because the reads/writes which are done in the >> meantime do need to wait for the BAR to be available again. >> >> So waiting for the BAR move to finish is correct, but what we should >> do is to use a lock instead of an SRCU because that makes lockdep >> complain when we do something nasty. >> >> Christian. > > > Spinlock I assume ? We can't sleep there - it's an interrupt. Mhm, the BAR movement is in interrupt context? Well that is rather bad. I was hoping to rename the GPU reset rw_sem into device_access rw_sem and then use the same lock for both (It's essentially the same problem). But when we need to move the BAR in atomic/interrupt context that makes things a bit more complicated. Christian. > > Andrey > > >> >>> >>> Andrey >>> >>>> >>>>> >>>>> Andrey >>>> >> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about supporting AMD eGPU hot plug case 2021-03-15 16:55 ` Christian König @ 2021-03-15 17:11 ` Andrey Grodzovsky 2021-03-16 16:17 ` Christian König 0 siblings, 1 reply; 15+ messages in thread From: Andrey Grodzovsky @ 2021-03-15 17:11 UTC (permalink / raw) To: Christian König, Sergei Miroshnichenko Cc: Alexander.Deucher, anatoli.antonovitch, helgaas, linux-pci, linux On 2021-03-15 12:55 p.m., Christian König wrote: > > > Am 15.03.21 um 17:21 schrieb Andrey Grodzovsky: >> >> On 2021-03-15 12:10 p.m., Christian König wrote: >>> Am 12.03.21 um 16:34 schrieb Andrey Grodzovsky: >>>> >>>> >>>> On 2021-03-12 4:03 a.m., Christian König wrote: >>>>> Am 11.03.21 um 23:40 schrieb Andrey Grodzovsky: >>>>>> [SNIP] >>>>>>>> The expected result is they all move closer to the start of PCI >>>>>>>> address >>>>>>>> space. >>>>>>>> >>>>>>> >>>>>>> Ok, I updated as you described. Also I removed PCI conf command >>>>>>> to stop >>>>>>> address decoding and restart later as I noticed PCI core does it >>>>>>> itself >>>>>>> when needed. >>>>>>> I tested now also with graphic desktop enabled while submitting >>>>>>> 3d draw commands and seems like under this scenario everything >>>>>>> still >>>>>>> works. Again, this all needs to be tested with VRAM BAR move as >>>>>>> then >>>>>>> I believe I will see more issues like handling of MMIO mapped >>>>>>> VRAM objects (like GART table). In case you do have an AMD card >>>>>>> you could also maybe give it a try. In the meanwhile I will add >>>>>>> support to ioremapping of those VRAM objects. >>>>>>> >>>>>>> Andrey >>>>>> >>>>>> Just an update, added support for unmaping/remapping of all VRAM >>>>>> objects, both user space mmaped and kernel ioremaped. Seems to work >>>>>> ok but again, without forcing VRAM BAR to move I can't be sure. >>>>>> Alex, Chsristian - take a look when you have some time to give me >>>>>> some >>>>>> initial feedback on the amdgpu side. 
>>>>>> >>>>>> The code is at >>>>>> https://cgit.freedesktop.org/~agrodzov/linux/log/?h=yadro%2Fpcie_hotplug%2Fmovable_bars_v9.1 >>>>>> >>>>> >>>>> Mhm, that let's userspace busy retry until the BAR movement is done. >>>>> >>>>> Not sure if that can't live lock somehow. >>>>> >>>>> Christian. >>>> >>>> In my testing it didn't but, I can instead route them to some >>>> global static dummy page while BARs are moving and then when >>>> everything >>>> done just invalidate the device address space again and let the >>>> pagefaults fill in valid PFNs again. >>> >>> Well that won't work because the reads/writes which are done in the >>> meantime do need to wait for the BAR to be available again. >>> >>> So waiting for the BAR move to finish is correct, but what we should >>> do is to use a lock instead of an SRCU because that makes lockdep >>> complain when we do something nasty. >>> >>> Christian. >> >> >> Spinlock I assume ? We can't sleep there - it's an interrupt. > > Mhm, the BAR movement is in interrupt context? No, the BAR move is in task context, I believe. The page faults are in interrupt context and so we can only take a spinlock there I assume, not a mutex which might sleep. But we can't hold a spinlock for the entire BAR move, because HW suspend + ASIC reset is a long process, probably with some sleeps/context switches inside it. > > Well that is rather bad. I was hoping to rename the GPU reset rw_sem > into device_access rw_sem and then use the same lock for both (It's > essentially the same problem). I was thinking about it from day 1, but what looked different to me is that in the GPU reset case there is no technical need to block MMIO accesses, as the BARs are not moving and so the page table entries remain valid. It's true that while the device is in reset those MMIO accesses are meaningless - so this indeed could be a good reason to block access even during GPU reset.
Andrey > > But when we need to move the BAR in atomic/interrupt context that > makes things a bit more complicated. > > Christian. > >> >> Andrey >> >> >>> >>>> >>>> Andrey >>>> >>>>> >>>>>> >>>>>> Andrey >>>>> >>> > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about supporting AMD eGPU hot plug case 2021-03-15 17:11 ` Andrey Grodzovsky @ 2021-03-16 16:17 ` Christian König 2021-03-16 17:39 ` Andrey Grodzovsky 0 siblings, 1 reply; 15+ messages in thread From: Christian König @ 2021-03-16 16:17 UTC (permalink / raw) To: Andrey Grodzovsky, Sergei Miroshnichenko Cc: Alexander.Deucher, anatoli.antonovitch, helgaas, linux-pci, linux Am 15.03.21 um 18:11 schrieb Andrey Grodzovsky: > > On 2021-03-15 12:55 p.m., Christian König wrote: >> >> >> Am 15.03.21 um 17:21 schrieb Andrey Grodzovsky: >>> >>> On 2021-03-15 12:10 p.m., Christian König wrote: >>>> Am 12.03.21 um 16:34 schrieb Andrey Grodzovsky: >>>>> >>>>> >>>>> On 2021-03-12 4:03 a.m., Christian König wrote: >>>>>> Am 11.03.21 um 23:40 schrieb Andrey Grodzovsky: >>>>>>> [SNIP] >>>>>>>>> The expected result is they all move closer to the start of >>>>>>>>> PCI address >>>>>>>>> space. >>>>>>>>> >>>>>>>> >>>>>>>> Ok, I updated as you described. Also I removed PCI conf command >>>>>>>> to stop >>>>>>>> address decoding and restart later as I noticed PCI core does >>>>>>>> it itself >>>>>>>> when needed. >>>>>>>> I tested now also with graphic desktop enabled while submitting >>>>>>>> 3d draw commands and seems like under this scenario everything >>>>>>>> still >>>>>>>> works. Again, this all needs to be tested with VRAM BAR move as >>>>>>>> then >>>>>>>> I believe I will see more issues like handling of MMIO mapped >>>>>>>> VRAM objects (like GART table). In case you do have an AMD card >>>>>>>> you could also maybe give it a try. In the meanwhile I will add >>>>>>>> support to ioremapping of those VRAM objects. >>>>>>>> >>>>>>>> Andrey >>>>>>> >>>>>>> Just an update, added support for unmaping/remapping of all VRAM >>>>>>> objects, both user space mmaped and kernel ioremaped. Seems to work >>>>>>> ok but again, without forcing VRAM BAR to move I can't be sure. 
>>>>>>> Alex, Chsristian - take a look when you have some time to give >>>>>>> me some >>>>>>> initial feedback on the amdgpu side. >>>>>>> >>>>>>> The code is at >>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/log/?h=yadro%2Fpcie_hotplug%2Fmovable_bars_v9.1 >>>>>>> >>>>>> >>>>>> Mhm, that let's userspace busy retry until the BAR movement is done. >>>>>> >>>>>> Not sure if that can't live lock somehow. >>>>>> >>>>>> Christian. >>>>> >>>>> In my testing it didn't but, I can instead route them to some >>>>> global static dummy page while BARs are moving and then when >>>>> everything >>>>> done just invalidate the device address space again and let the >>>>> pagefaults fill in valid PFNs again. >>>> >>>> Well that won't work because the reads/writes which are done in the >>>> meantime do need to wait for the BAR to be available again. >>>> >>>> So waiting for the BAR move to finish is correct, but what we >>>> should do is to use a lock instead of an SRCU because that makes >>>> lockdep complain when we do something nasty. >>>> >>>> Christian. >>> >>> >>> Spinlock I assume ? We can't sleep there - it's an interrupt. >> >> Mhm, the BAR movement is in interrupt context? > > > No, BARs move is in task context I believe. The page faults are in > interrupt context and so we can only lock a spinlock there I assume, No, page faults are in task context as well! Otherwise you wouldn't be able to sleep for network I/O in a page fault for example. > not a mutex which might sleep. But we can't lock > spinlock for the entire BAR move because HW suspend + asic reset is a > long process with some sleeps/context switches inside it probably. > > >> >> Well that is rather bad. I was hoping to rename the GPU reset rw_sem >> into device_access rw_sem and then use the same lock for both (It's >> essentially the same problem). 
> > > I was thinking about it from day 1 but what looked to me different is > that in GPU reset case there is no technical need to block MMIO > accesses as the BARs are not moving > and so the page table entries remain valid. It's true that while the > device in reset those MMIO accesses are meaninglessness - so this > indeed could be good reason to block > access even during GPU reset. From the experience now I would say that we should block MMIO access during GPU reset as necessary. We can't do things like always taking the lock in each IOCTL, but for low level hardware access it shouldn't be a problem at all. Christian. > > Andrey > > >> >> But when we need to move the BAR in atomic/interrupt context that >> makes things a bit more complicated. >> >> Christian. >> >>> >>> Andrey >>> >>> >>>> >>>>> >>>>> Andrey >>>>> >>>>>> >>>>>>> >>>>>>> Andrey >>>>>> >>>> >> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Question about supporting AMD eGPU hot plug case 2021-03-16 16:17 ` Christian König @ 2021-03-16 17:39 ` Andrey Grodzovsky 0 siblings, 0 replies; 15+ messages in thread From: Andrey Grodzovsky @ 2021-03-16 17:39 UTC (permalink / raw) To: Christian König, Sergei Miroshnichenko Cc: Alexander.Deucher, anatoli.antonovitch, helgaas, linux-pci, linux On 2021-03-16 12:17 p.m., Christian König wrote: > Am 15.03.21 um 18:11 schrieb Andrey Grodzovsky: >> >> On 2021-03-15 12:55 p.m., Christian König wrote: >>> >>> >>> Am 15.03.21 um 17:21 schrieb Andrey Grodzovsky: >>>> >>>> On 2021-03-15 12:10 p.m., Christian König wrote: >>>>> Am 12.03.21 um 16:34 schrieb Andrey Grodzovsky: >>>>>> >>>>>> >>>>>> On 2021-03-12 4:03 a.m., Christian König wrote: >>>>>>> Am 11.03.21 um 23:40 schrieb Andrey Grodzovsky: >>>>>>>> [SNIP] >>>>>>>>>> The expected result is they all move closer to the start of >>>>>>>>>> PCI address >>>>>>>>>> space. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Ok, I updated as you described. Also I removed PCI conf >>>>>>>>> command to stop >>>>>>>>> address decoding and restart later as I noticed PCI core does >>>>>>>>> it itself >>>>>>>>> when needed. >>>>>>>>> I tested now also with graphic desktop enabled while submitting >>>>>>>>> 3d draw commands and seems like under this scenario everything >>>>>>>>> still >>>>>>>>> works. Again, this all needs to be tested with VRAM BAR move >>>>>>>>> as then >>>>>>>>> I believe I will see more issues like handling of MMIO mapped >>>>>>>>> VRAM objects (like GART table). In case you do have an AMD >>>>>>>>> card you could also maybe give it a try. In the meanwhile I >>>>>>>>> will add support to ioremapping of those VRAM objects. >>>>>>>>> >>>>>>>>> Andrey >>>>>>>> >>>>>>>> Just an update, added support for unmaping/remapping of all VRAM >>>>>>>> objects, both user space mmaped and kernel ioremaped. Seems to >>>>>>>> work >>>>>>>> ok but again, without forcing VRAM BAR to move I can't be sure. 
>>>>>>>> Alex, Chsristian - take a look when you have some time to give >>>>>>>> me some >>>>>>>> initial feedback on the amdgpu side. >>>>>>>> >>>>>>>> The code is at >>>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/log/?h=yadro%2Fpcie_hotplug%2Fmovable_bars_v9.1 >>>>>>>> >>>>>>> >>>>>>> Mhm, that let's userspace busy retry until the BAR movement is >>>>>>> done. >>>>>>> >>>>>>> Not sure if that can't live lock somehow. >>>>>>> >>>>>>> Christian. >>>>>> >>>>>> In my testing it didn't but, I can instead route them to some >>>>>> global static dummy page while BARs are moving and then when >>>>>> everything >>>>>> done just invalidate the device address space again and let the >>>>>> pagefaults fill in valid PFNs again. >>>>> >>>>> Well that won't work because the reads/writes which are done in >>>>> the meantime do need to wait for the BAR to be available again. >>>>> >>>>> So waiting for the BAR move to finish is correct, but what we >>>>> should do is to use a lock instead of an SRCU because that makes >>>>> lockdep complain when we do something nasty. >>>>> >>>>> Christian. >>>> >>>> >>>> Spinlock I assume ? We can't sleep there - it's an interrupt. >>> >>> Mhm, the BAR movement is in interrupt context? >> >> >> No, BARs move is in task context I believe. The page faults are in >> interrupt context and so we can only lock a spinlock there I assume, > > No, page faults are in task context as well! Otherwise you wouldn't be > able to sleep for network I/O in a page fault for example. Ok, that was a long-standing confusion on my side, especially because 'Understanding the Linux Kernel' states that do_page_fault is an interrupt handler here - https://vistech.net/~champ/online-docs/books/linuxkernel2/060.htm while in fact this is an exception handler which runs in the context of the user process causing it and hence can sleep (as explained here by Robert Love himself) https://www.spinics.net/lists/newbies/msg07287.html > >> not a mutex which might sleep. 
But we can't lock >> spinlock for the entire BAR move because HW suspend + asic reset is a >> long process with some sleeps/context switches inside it probably. >> >> >>> >>> Well that is rather bad. I was hoping to rename the GPU reset rw_sem >>> into device_access rw_sem and then use the same lock for both (It's >>> essentially the same problem). >> >> >> I was thinking about it from day 1 but what looked to me different is >> that in GPU reset case there is no technical need to block MMIO >> accesses as the BARs are not moving >> and so the page table entries remain valid. It's true that while the >> device in reset those MMIO accesses are meaninglessness - so this >> indeed could be good reason to block >> access even during GPU reset. > > From the experience now I would say that we should block MMIO access > during GPU reset as necessary. > > We can't do things like always taking the lock in each IOCTL, but for > low level hardware access it shouldn't be a problem at all. > > Christian. I will update the code then to reuse our adev->reset_sem for this locking. Andrey > >> >> Andrey >> >> >>> >>> But when we need to move the BAR in atomic/interrupt context that >>> makes things a bit more complicated. >>> >>> Christian. >>> >>>> >>>> Andrey >>>> >>>> >>>>> >>>>>> >>>>>> Andrey >>>>>> >>>>>>> >>>>>>>> >>>>>>>> Andrey >>>>>>> >>>>> >>> > ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2021-03-16 17:40 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <ddef2da4-4726-7321-40fe-5f90788cc836@amd.com> [not found] ` <MN2PR12MB44880052F5C4ECF3EF22B58EF7B69@MN2PR12MB4488.namprd12.prod.outlook.com> [not found] ` <ffb816ae8336ff2373ec5fcf695e698900c3d557.camel@yadro.com> [not found] ` <9c41221f-ecfa-5554-f2ea-6f72cfe7dc7e@amd.com> [not found] ` <dae8dfd8-3a99-620d-f0aa-ceb39923b807@amd.com> [not found] ` <7d9e947648ce47a4ba8d223cdec08197@yadro.com> [not found] ` <c82919f3-5296-cd0a-0b8f-c33614ca3ea9@amd.com> [not found] ` <340062dba82b813b311b2c78022d2d3d0e6f414d.camel@yadro.com> [not found] ` <927d7fbe-756f-a4f1-44cd-c68aecd906d7@amd.com> [not found] ` <dc2de228b92c4e27819c7681f650e1e5@yadro.com> [not found] ` <a6af29ed-179a-7619-3dde-474542c180f4@amd.com> [not found] ` <8f53f1403f0c4121885398487bbfa241@yadro.com> [not found] ` <fc2ea091-8470-9501-242d-8d82714adecb@amd.com> [not found] ` <50afd1079dbabeba36fd35fdef84e6e15470ef45.camel@yadro.com> [not found] ` <c53c23b5-dd37-44f1-dffd-ff9699018a82@amd.com> [not found] ` <8d7e2d7b7982d8d78c0ecaa74b9d40ace4db8453.camel@yadro.com> 2021-03-04 19:49 ` Question about supporting AMD eGPU hot plug case Andrey Grodzovsky 2021-03-05 16:08 ` Sergei Miroshnichenko 2021-03-05 17:13 ` Andrey Grodzovsky 2021-03-05 19:12 ` Sergei Miroshnichenko 2021-03-05 23:13 ` Andrey Grodzovsky 2021-03-11 22:40 ` Andrey Grodzovsky 2021-03-12 9:03 ` Christian König 2021-03-12 15:34 ` Andrey Grodzovsky 2021-03-15 16:00 ` Andrey Grodzovsky 2021-03-15 16:10 ` Christian König 2021-03-15 16:21 ` Andrey Grodzovsky 2021-03-15 16:55 ` Christian König 2021-03-15 17:11 ` Andrey Grodzovsky 2021-03-16 16:17 ` Christian König 2021-03-16 17:39 ` Andrey Grodzovsky