* Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?
From: Timur Kristóf @ 2019-06-28 10:23 UTC
To: Mika Westerberg, michael.jamet; +Cc: dri-devel

Hi guys,

I use an AMD RX 570 in a Thunderbolt 3 external GPU box.
dmesg gives me the following message:

pci 0000:3a:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x4 link at 0000:04:04.0 (capable of 31.504 Gb/s with 8 GT/s x4 link)

Here is a tree view of the devices as well as the output of lspci -vvv:
https://pastebin.com/CSsS2akZ

The critical path of the device tree looks like this:

00:1c.4 Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
03:00.0 Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
04:04.0 Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
3a:00.0 Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
3b:01.0 Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
3c:00.0 Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev ef)

Here is the weird part:

According to lspci, all of these devices report in their LnkCap that
they support 8 GT/s, except 04:04.0 and 3a:00.0, which say they only
support 2.5 GT/s. Contradicting lspci, sysfs says that both of them
are capable of 8 GT/s as well:
"/sys/bus/pci/devices/0000:04:04.0/max_link_speed" and
"/sys/bus/pci/devices/0000:3a:00.0/max_link_speed" both read 8 GT/s.
It seems that there is a discrepancy between what lspci thinks and
what the devices are actually capable of.

Questions:

1. Why are there four bridge devices? 04:00.0, 04:01.0 and 04:02.0 look
superfluous to me and nothing is connected to them. It gives me the
feeling that the TB3 driver creates 4 devices with 2.5 GT/s each,
instead of one device that can do the full 8 GT/s.

2. Why are some of the bridge devices only capable of 2.5 GT/s
according to lspci?

3. Is it possible to manually set them to 8 GT/s?

Thanks in advance for your answers!

Best regards,
Tim

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
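[Editorial note: the two bandwidth figures in the dmesg line follow directly from per-lane data rate, lane count, and line encoding (8b/10b for 2.5/5 GT/s, 128b/130b for 8 GT/s). A small illustrative sketch, not kernel code; the result for the 8 GT/s case lands within rounding of the 31.504 Gb/s the kernel prints:]

```python
def pcie_bandwidth_gbps(gt_per_s: float, lanes: int) -> float:
    """Usable bandwidth of a PCIe link in Gb/s.

    2.5 and 5 GT/s links use 8b/10b encoding (80% efficiency);
    8 GT/s (gen3) links use 128b/130b encoding.
    """
    encoding = 0.8 if gt_per_s < 8.0 else 128 / 130
    return gt_per_s * lanes * encoding

# 2.5 GT/s x4 link: exactly the 8.000 Gb/s from the dmesg message
print(pcie_bandwidth_gbps(2.5, 4))   # 8.0
# 8 GT/s x4 link: ~31.5 Gb/s (dmesg prints 31.504)
print(round(pcie_bandwidth_gbps(8.0, 4), 2))
```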
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?
From: Mika Westerberg @ 2019-06-28 10:32 UTC
To: Timur Kristóf; +Cc: michael.jamet, dri-devel

On Fri, Jun 28, 2019 at 12:23:09PM +0200, Timur Kristóf wrote:
> Hi guys,
>
> I use an AMD RX 570 in a Thunderbolt 3 external GPU box.
> dmesg gives me the following message:
>
> pci 0000:3a:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x4 link at 0000:04:04.0 (capable of 31.504 Gb/s with 8 GT/s x4 link)
>
> Here is a tree view of the devices as well as the output of lspci -vvv:
> https://pastebin.com/CSsS2akZ
>
> The critical path of the device tree looks like this:
>
> 00:1c.4 Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
> 03:00.0 Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
> 04:04.0 Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
> 3a:00.0 Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
> 3b:01.0 Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
> 3c:00.0 Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev ef)
>
> Here is the weird part:
>
> According to lspci, all of these devices report in their LnkCap that
> they support 8 GT/s, except 04:04.0 and 3a:00.0, which say they only
> support 2.5 GT/s. Contradicting lspci, sysfs says that both of them
> are capable of 8 GT/s as well:
> "/sys/bus/pci/devices/0000:04:04.0/max_link_speed" and
> "/sys/bus/pci/devices/0000:3a:00.0/max_link_speed" both read 8 GT/s.
> It seems that there is a discrepancy between what lspci thinks and
> what the devices are actually capable of.
>
> Questions:
>
> 1. Why are there four bridge devices? 04:00.0, 04:01.0 and 04:02.0 look
> superfluous to me and nothing is connected to them. It gives me the
> feeling that the TB3 driver creates 4 devices with 2.5 GT/s each,
> instead of one device that can do the full 8 GT/s.

Because it is a standard PCIe switch with one upstream port and n
downstream ports.

> 2. Why are some of the bridge devices only capable of 2.5 GT/s
> according to lspci?

You need to talk to the lspci maintainer.

> 3. Is it possible to manually set them to 8 GT/s?

No idea.

Are you actually seeing some performance issue because of this, or are
you just curious?
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?
From: Timur Kristóf @ 2019-06-28 11:08 UTC
To: Mika Westerberg; +Cc: michael.jamet, dri-devel

Hi Mika,

Thanks for your quick reply.

> > 1. Why are there four bridge devices? 04:00.0, 04:01.0 and 04:02.0
> > look superfluous to me and nothing is connected to them. It gives
> > me the feeling that the TB3 driver creates 4 devices with 2.5 GT/s
> > each, instead of one device that can do the full 8 GT/s.
>
> Because it is a standard PCIe switch with one upstream port and n
> downstream ports.

Sure, though in this case 3 of those downstream ports are not exposed
by the hardware, so it's a bit surprising to see them there. The reason
I asked is that I suspect the bandwidth may be allocated equally
between the 4 downstream ports, even though only one of them is used.

> > 2. Why are some of the bridge devices only capable of 2.5 GT/s
> > according to lspci?
>
> You need to talk to the lspci maintainer.

Sorry if the question was unclear. It's not only lspci, the kernel
also prints a warning about it.

Like I said, the device really is limited to 2.5 GT/s even though it
should be able to do 8 GT/s.

> > 3. Is it possible to manually set them to 8 GT/s?
>
> No idea.
>
> Are you actually seeing some performance issue because of this, or are
> you just curious?

Yes, I see a noticeable performance hit: some games have a very low
frame rate while neither the CPU nor the GPU is fully utilized.

(Side note: Mesa 19.1 has a radeonsi patch that reduces bandwidth use,
which does help. However, it doesn't solve the underlying problem of
the slow TB3 interface.)

Best regards,
Tim
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?
From: Mika Westerberg @ 2019-06-28 11:34 UTC
To: Timur Kristóf; +Cc: michael.jamet, dri-devel

On Fri, Jun 28, 2019 at 01:08:07PM +0200, Timur Kristóf wrote:
> Sure, though in this case 3 of those downstream ports are not exposed
> by the hardware, so it's a bit surprising to see them there.

They lead to other peripherals on the TBT host router such as the TBT
controller and xHCI. Also there are two downstream ports for extension,
from which your eGPU is using one.

> Why I asked about it is because I have a suspicion that maybe the
> bandwidth is allocated equally between the 4 downstream ports, even
> though only one of them is used.
>
> Like I said, the device really is limited to 2.5 GT/s even though it
> should be able to do 8 GT/s.

There is a Thunderbolt link between the host router (your host system)
and the eGPU box. That link is not limited to 2.5 GT/s, so even if the
slot claims it is PCIe gen1, the actual bandwidth can be much higher
because of the virtual link.

> > 3. Is it possible to manually set them to 8 GT/s?
> >
> > No idea.
> >
> > Are you actually seeing some performance issue because of this, or
> > are you just curious?
>
> Yes, I see a noticeable performance hit: some games have a very low
> frame rate while neither the CPU nor the GPU is fully utilized.

Is that problem Linux-only, or do you see the same issue in Windows as
well?
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?
From: Timur Kristóf @ 2019-06-28 12:21 UTC
To: Mika Westerberg; +Cc: michael.jamet, dri-devel

> > Sure, though in this case 3 of those downstream ports are not
> > exposed by the hardware, so it's a bit surprising to see them there.
>
> They lead to other peripherals on the TBT host router such as the TBT
> controller and xHCI. Also there are two downstream ports for
> extension, from which your eGPU is using one.

If you look at the device tree from my first email, you can see that
both the GPU and the xHCI use the same port: 04:04.0. In fact, I can
even remove the other 3 ports from the system without any consequences.

> > Like I said, the device really is limited to 2.5 GT/s even though it
> > should be able to do 8 GT/s.
>
> There is a Thunderbolt link between the host router (your host
> system) and the eGPU box. That link is not limited to 2.5 GT/s, so
> even if the slot claims it is PCIe gen1, the actual bandwidth can be
> much higher because of the virtual link.

Not sure I understand correctly: are you saying that TB3 can do 40
Gbit/s even though the kernel thinks it can only do 8 Gbit/s?

I haven't found a good way to measure the maximum PCIe throughput
between the CPU and GPU, but I did take a look at AMD's sysfs interface
at /sys/class/drm/card1/device/pcie_bw while running the bottlenecked
game. The highest throughput I saw there was only 2.43 Gbit/s.

One more thought: I've also looked at
/sys/class/drm/card1/device/pp_dpm_pcie, which tells me that amdgpu
thinks it is running on a 2.5 GT/s x8 link (as opposed to the expected
8 GT/s x4). Can this be a problem?

> > > 3. Is it possible to manually set them to 8 GT/s?
> > >
> > > No idea.
> > >
> > > Are you actually seeing some performance issue because of this or
> > > are you just curious?
> >
> > Yes, I see a noticeable performance hit: some games have a very low
> > frame rate while neither the CPU nor the GPU is fully utilized.
>
> Is that problem Linux-only, or do you see the same issue in Windows
> as well?

I admit I don't have Windows on this computer now and it has been some
time since I last tried it, but when I did, I didn't see this problem.

Best regards,
Tim
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?
From: Mika Westerberg @ 2019-06-28 12:53 UTC
To: Timur Kristóf; +Cc: michael.jamet, dri-devel

On Fri, Jun 28, 2019 at 02:21:36PM +0200, Timur Kristóf wrote:
> > > Sure, though in this case 3 of those downstream ports are not
> > > exposed by the hardware, so it's a bit surprising to see them
> > > there.
> >
> > They lead to other peripherals on the TBT host router such as the
> > TBT controller and xHCI. Also there are two downstream ports for
> > extension, from which your eGPU is using one.
>
> If you look at the device tree from my first email, you can see that
> both the GPU and the xHCI use the same port: 04:04.0. In fact, I can
> even remove the other 3 ports from the system without any
> consequences.

Well, that's the extension PCIe downstream port. The other one is
04:01.0.

Typically 04:00.0 and 04:00.2 are used to connect the TBT controller
(05:00.0) and the xHCI (39:00.0), but in your case you don't seem to
have USB 3 devices connected to that, so it is not present. If you plug
in a USB-C device (non-TBT) you should see the host router xHCI
appearing as well.

This is a pretty standard topology.

> > > Like I said, the device really is limited to 2.5 GT/s even though
> > > it should be able to do 8 GT/s.
> >
> > There is a Thunderbolt link between the host router (your host
> > system) and the eGPU box. That link is not limited to 2.5 GT/s, so
> > even if the slot claims it is PCIe gen1, the actual bandwidth can
> > be much higher because of the virtual link.
>
> Not sure I understand correctly: are you saying that TB3 can do 40
> Gbit/s even though the kernel thinks it can only do 8 Gbit/s?

Yes, the PCIe switch upstream port (3a:00.0) is connected back to the
host router over a virtual Thunderbolt 40 Gb/s link, so the PCIe gen1
speed it reports does not really matter here (the same goes for the
downstream port).

The topology looks like below, if I got it right from the lspci output:

00:1c.4 (root port)        8 GT/s x4
   ^
   | real PCIe link
   v
03:00.0 (upstream port)    8 GT/s x4
04:04.0 (downstream port)  2.5 GT/s x4
   ^
   | virtual link, 40 Gb/s
   v
3a:00.0 (upstream port)    2.5 GT/s x4
3b:01.0 (downstream port)  8 GT/s x4
   ^
   | real PCIe link
   v
3c:00.0 (eGPU)             8 GT/s x4

In other words, all the real PCIe links run at the full 8 GT/s x4,
which is what is expected, I think.
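[Editorial note: the "available bandwidth" figure the kernel warns about is essentially the minimum usable bandwidth over every link on the path from the device up to the root port (this is what the PCI core's pcie_bandwidth_available() computes). A rough sketch of that logic, my own illustration rather than kernel code, fed with the link speeds from the diagram above:]

```python
# Per-lane usable bandwidth in Gb/s for each PCIe link speed,
# accounting for 8b/10b (gen1/2) and 128b/130b (gen3) encoding.
LANE_GBPS = {2.5: 2.0, 5.0: 4.0, 8.0: 8.0 * 128 / 130}

def available_bandwidth(path):
    """Minimum usable bandwidth (Gb/s) over a list of (GT/s, width) links."""
    return min(LANE_GBPS[speed] * width for speed, width in path)

# Links on the path from the root port down to the eGPU, as reported in
# this thread; the 2.5 GT/s hops are the Thunderbolt virtual link.
path = [(8.0, 4), (2.5, 4), (2.5, 4), (8.0, 4)]
print(available_bandwidth(path))  # 8.0 -- matches the dmesg warning
```

This also shows why the warning is misleading here: the 2.5 GT/s hops are virtual, so the real bottleneck is the 40 Gb/s Thunderbolt link, not the 8 Gb/s the formula reports.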
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?
From: Timur Kristóf @ 2019-06-28 13:33 UTC
To: Mika Westerberg; +Cc: michael.jamet, dri-devel

> Well, that's the extension PCIe downstream port. The other one is
> 04:01.0.
>
> Typically 04:00.0 and 04:00.2 are used to connect the TBT controller
> (05:00.0) and the xHCI (39:00.0), but in your case you don't seem to
> have USB 3 devices connected to that, so it is not present. If you
> plug in a USB-C device (non-TBT) you should see the host router xHCI
> appearing as well.
>
> This is a pretty standard topology.
>
> > Not sure I understand correctly: are you saying that TB3 can do 40
> > Gbit/s even though the kernel thinks it can only do 8 Gbit/s?
>
> Yes, the PCIe switch upstream port (3a:00.0) is connected back to the
> host router over a virtual Thunderbolt 40 Gb/s link, so the PCIe gen1
> speed it reports does not really matter here (the same goes for the
> downstream port).
>
> In other words, all the real PCIe links run at the full 8 GT/s x4,
> which is what is expected, I think.

It makes sense now. This is hands down the best explanation I've seen
of how TB3 hangs together. Thanks for taking the time to explain it!

I have two more questions:

1. What is the best way to test that the virtual link is indeed capable
of 40 Gbit/s? So far I've been unable to figure out how to measure its
maximum throughput.

2. Why is it that the game can only utilize as much as 2.5 Gbit/s when
it gets bottlenecked? The same problem is not present on a desktop
computer with a "normal" PCIe port.
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?
From: Mika Westerberg @ 2019-06-28 14:14 UTC
To: Timur Kristóf; +Cc: michael.jamet, dri-devel

On Fri, Jun 28, 2019 at 03:33:56PM +0200, Timur Kristóf wrote:
> I have two more questions:
>
> 1. What is the best way to test that the virtual link is indeed
> capable of 40 Gbit/s? So far I've been unable to figure out how to
> measure its maximum throughput.

I don't think there is any good way to test it, but the Thunderbolt
gen 3 link is pretty much always 40 Gb/s (20 Gb/s x2), from which the
bandwidth is shared dynamically between different tunnels (virtual
links).

> 2. Why is it that the game can only utilize as much as 2.5 Gbit/s
> when it gets bottlenecked? The same problem is not present on a
> desktop computer with a "normal" PCIe port.

This is outside of my knowledge, sorry. How does that game even know it
can "utilize" only 2.5 Gbit/s? Does it go over the output of "lspci" as
well? :-)

The PCIe links themselves should get you the 8 GT/s x4, and I'm quite
sure the underlying TBT link works fine as well, so my guess is that
the issue lies somewhere else, but where, I have no idea.

Maybe the problem is in the game itself?
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?
From: Timur Kristóf @ 2019-06-28 14:53 UTC
To: Mika Westerberg; +Cc: michael.jamet, dri-devel

On Fri, 2019-06-28 at 17:14 +0300, Mika Westerberg wrote:
> On Fri, Jun 28, 2019 at 03:33:56PM +0200, Timur Kristóf wrote:
> > I have two more questions:
> >
> > 1. What is the best way to test that the virtual link is indeed
> > capable of 40 Gbit/s? So far I've been unable to figure out how to
> > measure its maximum throughput.
>
> I don't think there is any good way to test it, but the Thunderbolt
> gen 3 link is pretty much always 40 Gb/s (20 Gb/s x2), from which the
> bandwidth is shared dynamically between different tunnels (virtual
> links).

That's unfortunate; I would have expected there to be some sort of PCIe
speed test utility.

Now that I gave it a try, I can measure ~20 Gbit/s when I run GNOME
Wayland on this system (which forces the eGPU to send the framebuffer
back and forth all the time, for two 4K monitors). But it still doesn't
give me 40 Gbit/s.

> > 2. Why is it that the game can only utilize as much as 2.5 Gbit/s
> > when it gets bottlenecked? The same problem is not present on a
> > desktop computer with a "normal" PCIe port.
>
> This is outside of my knowledge, sorry. How does that game even know
> it can "utilize" only 2.5 Gbit/s? Does it go over the output of
> "lspci" as well? :-)
>
> The PCIe links themselves should get you the 8 GT/s x4, and I'm quite
> sure the underlying TBT link works fine as well, so my guess is that
> the issue lies somewhere else, but where, I have no idea.
>
> Maybe the problem is in the game itself?

I had a brief discussion with Marek about this earlier, and he said
that this has to do with latency too, not just bandwidth, but he didn't
explain any further.
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?
From: Mika Westerberg @ 2019-07-01 11:44 UTC
To: Timur Kristóf; +Cc: michael.jamet, dri-devel

On Fri, Jun 28, 2019 at 04:53:02PM +0200, Timur Kristóf wrote:
> On Fri, 2019-06-28 at 17:14 +0300, Mika Westerberg wrote:
> > I don't think there is any good way to test it, but the Thunderbolt
> > gen 3 link is pretty much always 40 Gb/s (20 Gb/s x2), from which
> > the bandwidth is shared dynamically between different tunnels
> > (virtual links).
>
> That's unfortunate; I would have expected there to be some sort of
> PCIe speed test utility.
>
> Now that I gave it a try, I can measure ~20 Gbit/s when I run GNOME
> Wayland on this system (which forces the eGPU to send the framebuffer
> back and forth all the time, for two 4K monitors). But it still
> doesn't give me 40 Gbit/s.

How do you measure that? Is there a DP stream also? As I said, the
bandwidth is dynamically shared between the consumers, so you probably
do not get the full bandwidth for PCIe only, because it needs to
reserve something for possible DP streams and so on.
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?
From: Timur Kristóf @ 2019-07-01 14:25 UTC
To: Mika Westerberg; +Cc: michael.jamet, dri-devel

> > That's unfortunate; I would have expected there to be some sort of
> > PCIe speed test utility.
> >
> > Now that I gave it a try, I can measure ~20 Gbit/s when I run GNOME
> > Wayland on this system (which forces the eGPU to send the
> > framebuffer back and forth all the time, for two 4K monitors). But
> > it still doesn't give me 40 Gbit/s.
>
> How do you measure that? Is there a DP stream also? As I said, the
> bandwidth is dynamically shared between the consumers, so you
> probably do not get the full bandwidth for PCIe only, because it
> needs to reserve something for possible DP streams and so on.

I'm measuring it using AMD's pcie_bw sysfs interface, which shows how
many packets were sent and received by the GPU, and the max packet
size. So it's not an exact measurement, but a good estimate.

AFAIK there is no DP stream. Only the eGPU is connected to the TB3 port
and nothing else. The graphics card inside the TB3 enclosure does have
a DP connector which is in use, but I assume that's not what you mean.

It also doesn't seem to make a difference whether or not anything is
plugged into the USB ports provided by the eGPU. (Some online posts
suggest that not using those ports would allow higher throughput to the
eGPU, but I don't see that it would make any difference here.)
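[Editorial note: the estimate described above can be computed along these lines. This is a hedged sketch: it assumes amdgpu's pcie_bw file contains a received-packet count, a sent-packet count, and a maximum payload size in bytes for a one-second sampling window; the exact field order and units should be checked against the amdgpu driver in your kernel version. The sample line is hypothetical, not a value from this thread.]

```python
def estimate_pcie_gbps(pcie_bw_line: str) -> float:
    """Estimate PCIe throughput in Gbit/s from one read of amdgpu's
    pcie_bw sysfs file, assumed to contain:
        <received packets> <sent packets> <max payload bytes>
    sampled over one second. This is an upper-bound estimate, since
    not every packet carries a full payload."""
    received, sent, max_payload = (int(x) for x in pcie_bw_line.split())
    return (received + sent) * max_payload * 8 / 1e9

# Hypothetical sample reading: 1.2M packets received, 300k sent,
# 256-byte max payload -> roughly 3 Gbit/s.
print(round(estimate_pcie_gbps("1200000 300000 256"), 2))  # 3.07
```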
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?
From: Alex Deucher @ 2019-07-01 14:28 UTC
To: Timur Kristóf; +Cc: michael.jamet, Mika Westerberg, dri-devel

On Sun, Jun 30, 2019 at 2:27 PM Timur Kristóf <timur.kristof@gmail.com> wrote:
>
> > > Like I said, the device really is limited to 2.5 GT/s even though
> > > it should be able to do 8 GT/s.
> >
> > There is a Thunderbolt link between the host router (your host
> > system) and the eGPU box. That link is not limited to 2.5 GT/s, so
> > even if the slot claims it is PCIe gen1, the actual bandwidth can
> > be much higher because of the virtual link.
>
> Not sure I understand correctly: are you saying that TB3 can do 40
> Gbit/s even though the kernel thinks it can only do 8 Gbit/s?
>
> I haven't found a good way to measure the maximum PCIe throughput
> between the CPU and GPU, but I did take a look at AMD's sysfs
> interface at /sys/class/drm/card1/device/pcie_bw while running the
> bottlenecked game. The highest throughput I saw there was only 2.43
> Gbit/s.
>
> One more thought: I've also looked at
> /sys/class/drm/card1/device/pp_dpm_pcie, which tells me that amdgpu
> thinks it is running on a 2.5 GT/s x8 link (as opposed to the
> expected 8 GT/s x4). Can this be a problem?

We limit the speed of the link in the driver to the max speed of any
upstream link. So if there are any links upstream limited to 2.5 GT/s,
it doesn't make sense to clock the local link up to faster speeds.

Alex

> > > > > 3. Is it possible to manually set them to 8 GT/s?
> > > >
> > > > No idea.
> > > >
> > > > Are you actually seeing some performance issue because of this,
> > > > or are you just curious?
> > >
> > > Yes, I see a noticeable performance hit: some games have a very
> > > low frame rate while neither the CPU nor the GPU is fully
> > > utilized.
> >
> > Is that problem Linux-only, or do you see the same issue in Windows
> > as well?
>
> I admit I don't have Windows on this computer now and it has been
> some time since I last tried it, but when I did, I didn't see this
> problem.
>
> Best regards,
> Tim
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?
From: Timur Kristóf @ 2019-07-01 14:38 UTC
To: Alex Deucher; +Cc: michael.jamet, Mika Westerberg, dri-devel

> > > > Like I said, the device really is limited to 2.5 GT/s even
> > > > though it should be able to do 8 GT/s.
> > >
> > > There is a Thunderbolt link between the host router (your host
> > > system) and the eGPU box. That link is not limited to 2.5 GT/s,
> > > so even if the slot claims it is PCIe gen1, the actual bandwidth
> > > can be much higher because of the virtual link.
> >
> > Not sure I understand correctly: are you saying that TB3 can do 40
> > Gbit/s even though the kernel thinks it can only do 8 Gbit/s?
> >
> > I haven't found a good way to measure the maximum PCIe throughput
> > between the CPU and GPU, but I did take a look at AMD's sysfs
> > interface at /sys/class/drm/card1/device/pcie_bw while running the
> > bottlenecked game. The highest throughput I saw there was only 2.43
> > Gbit/s.
> >
> > One more thought: I've also looked at
> > /sys/class/drm/card1/device/pp_dpm_pcie, which tells me that amdgpu
> > thinks it is running on a 2.5 GT/s x8 link (as opposed to the
> > expected 8 GT/s x4). Can this be a problem?
>
> We limit the speed of the link in the driver to the max speed of any
> upstream link. So if there are any links upstream limited to 2.5
> GT/s, it doesn't make sense to clock the local link up to faster
> speeds.
>
> Alex

Hi Alex,

I have two concerns about this:

1. Why does amdgpu think that the link has 8 lanes, when it only has 4?

2. As far as I understood what Mika said, there isn't really a 2.5 GT/s
limitation there, since the virtual link should be running at 40 Gb/s
regardless of the reported speed of that device. Would it be possible
to run the AMD GPU at 8 GT/s in this case?

Best regards,
Tim
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? 2019-07-01 14:38 ` Timur Kristóf @ 2019-07-01 14:46 ` Alex Deucher 2019-07-01 15:10 ` Mika Westerberg 0 siblings, 1 reply; 34+ messages in thread From: Alex Deucher @ 2019-07-01 14:46 UTC (permalink / raw) To: Timur Kristóf; +Cc: michael.jamet, Mika Westerberg, dri-devel On Mon, Jul 1, 2019 at 10:38 AM Timur Kristóf <timur.kristof@gmail.com> wrote: > > > > > > Like I said the device really is limited to 2.5 GT/s even > > > > > though it > > > > > should be able to do 8 GT/s. > > > > > > > > There is Thunderbolt link between the host router (your host > > > > system) > > > > and > > > > the eGPU box. That link is not limited to 2.5 GT/s so even if the > > > > slot > > > > claims it is PCI gen1 the actual bandwidth can be much higher > > > > because > > > > of > > > > the virtual link. > > > > > > Not sure I understand correctly, are you saying that TB3 can do 40 > > > Gbit/sec even though the kernel thinks it can only do 8 Gbit / sec? > > > > > > I haven't found a good way to measure the maximum PCIe throughput > > > between the CPU and GPU, but I did take a look at AMD's sysfs > > > interface > > > at /sys/class/drm/card1/device/pcie_bw which while running the > > > bottlenecked game. The highest throughput I saw there was only 2.43 > > > Gbit /sec. > > > > > > One more thought. I've also looked at > > > /sys/class/drm/card1/device/pp_dpm_pcie - which tells me that > > > amdgpu > > > thinks it is running on a 2.5GT/s x8 link (as opposed to the > > > expected 8 > > > GT/s x4). Can this be a problem? > > > > We limit the speed of the link the the driver to the max speed of any > > upstream links. So if there are any links upstream limited to 2.5 > > GT/s, it doesn't make sense to clock the local link up to faster > > speeds. > > > > Alex > > Hi Alex, > > I have two concerns about it: > > 1. Why does amdgpu think that the link has 8 lanes, when it only has 4? Not sure. 
We use pcie_bandwidth_available() on kernel 5.3 and newer to determine the max speed and link width when we set up the GPU. For older kernels we use an open-coded version of it in the driver. > > 2. As far as I understood what Mika said, there isn't really a 2.5 GT/s > limitation there, since the virtual link should be running at 40 Gb/s > regardless of the reported speed of that device. Would it be possible > to run the AMD GPU at 8 GT/s in this case? If there is really a faster link here then we need some way to pass that information to the drivers. We rely on the information from the upstream bridges and the pcie core helper functions. Alex _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? 2019-07-01 14:46 ` Alex Deucher @ 2019-07-01 15:10 ` Mika Westerberg 0 siblings, 0 replies; 34+ messages in thread From: Mika Westerberg @ 2019-07-01 15:10 UTC (permalink / raw) To: Alex Deucher; +Cc: michael.jamet, dri-devel, Timur Kristóf On Mon, Jul 01, 2019 at 10:46:34AM -0400, Alex Deucher wrote: > > 2. As far as I understood what Mika said, there isn't really a 2.5 GT/s > > limitation there, since the virtual link should be running at 40 Gb/s > > regardless of the reported speed of that device. Would it be possible > > to run the AMD GPU at 8 GT/s in this case? > > If there is really a faster link here then we need some way to pass > that information to the drivers. We rely on the information from the > upstream bridges and the pcie core helper functions. I think you may use "pci_dev->is_thunderbolt" in the GPU driver and then just use whatever the real PCI link speed & width is between the GPU and the downstream port it connects to. _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? 2019-06-28 12:21 ` Timur Kristóf 2019-06-28 12:53 ` Mika Westerberg 2019-07-01 14:28 ` Alex Deucher @ 2019-07-01 14:54 ` Michel Dänzer 2019-07-01 16:01 ` Timur Kristóf 2 siblings, 1 reply; 34+ messages in thread From: Michel Dänzer @ 2019-07-01 14:54 UTC (permalink / raw) To: Timur Kristóf, Mika Westerberg; +Cc: michael.jamet, dri-devel On 2019-06-28 2:21 p.m., Timur Kristóf wrote: > > I haven't found a good way to measure the maximum PCIe throughput > between the CPU and GPU, amdgpu.benchmark=3 on the kernel command line will measure throughput for various transfer sizes during driver initialization. > but I did take a look at AMD's sysfs interface at > /sys/class/drm/card1/device/pcie_bw which while running the bottlenecked > game. The highest throughput I saw there was only 2.43 Gbit /sec. PCIe bandwidth generally isn't a bottleneck for games, since they don't constantly transfer large data volumes across PCIe, but store them in the GPU's local VRAM, which is connected at much higher bandwidth. -- Earthling Michel Dänzer | https://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? 2019-07-01 14:54 ` Michel Dänzer @ 2019-07-01 16:01 ` Timur Kristóf 2019-07-02 8:09 ` Michel Dänzer 0 siblings, 1 reply; 34+ messages in thread From: Timur Kristóf @ 2019-07-01 16:01 UTC (permalink / raw) To: Michel Dänzer, Mika Westerberg; +Cc: michael.jamet, dri-devel On Mon, 2019-07-01 at 16:54 +0200, Michel Dänzer wrote: > On 2019-06-28 2:21 p.m., Timur Kristóf wrote: > > I haven't found a good way to measure the maximum PCIe throughput > > between the CPU and GPU, > > amdgpu.benchmark=3 > > on the kernel command line will measure throughput for various > transfer > sizes during driver initialization. Thanks, I will definitely try that. Is this the only way to do this, or is there a way to benchmark it after it already booted? > > but I did take a look at AMD's sysfs interface at > > /sys/class/drm/card1/device/pcie_bw which while running the > > bottlenecked > > game. The highest throughput I saw there was only 2.43 Gbit /sec. > > PCIe bandwidth generally isn't a bottleneck for games, since they > don't > constantly transfer large data volumes across PCIe, but store them in > the GPU's local VRAM, which is connected at much higher bandwidth. There are reasons why I think the problem is the bandwidth: 1. The same issues don't happen when the GPU is not used with a TB3 enclosure. 2. In the case of radeonsi, the problem was mitigated once Marek's SDMA patch was merged, which hugely reduces the PCIe bandwidth use. 3. In less optimized cases (for example D9VK), the problem is still very noticeable. Best regards, Tim _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? 2019-07-01 16:01 ` Timur Kristóf @ 2019-07-02 8:09 ` Michel Dänzer 2019-07-02 9:49 ` Timur Kristóf 0 siblings, 1 reply; 34+ messages in thread From: Michel Dänzer @ 2019-07-02 8:09 UTC (permalink / raw) To: Timur Kristóf, Mika Westerberg; +Cc: michael.jamet, dri-devel On 2019-07-01 6:01 p.m., Timur Kristóf wrote: > On Mon, 2019-07-01 at 16:54 +0200, Michel Dänzer wrote: >> On 2019-06-28 2:21 p.m., Timur Kristóf wrote: >>> I haven't found a good way to measure the maximum PCIe throughput >>> between the CPU and GPU, >> >> amdgpu.benchmark=3 >> >> on the kernel command line will measure throughput for various >> transfer >> sizes during driver initialization. > > Thanks, I will definitely try that. > Is this the only way to do this, or is there a way to benchmark it > after it already booted? The former. At least in theory, it's possible to unload the amdgpu module while nothing is using it, then load it again. >>> but I did take a look at AMD's sysfs interface at >>> /sys/class/drm/card1/device/pcie_bw which while running the >>> bottlenecked >>> game. The highest throughput I saw there was only 2.43 Gbit /sec. >> >> PCIe bandwidth generally isn't a bottleneck for games, since they >> don't >> constantly transfer large data volumes across PCIe, but store them in >> the GPU's local VRAM, which is connected at much higher bandwidth. > > There are reasons why I think the problem is the bandwidth: > 1. The same issues don't happen when the GPU is not used with a TB3 > enclosure. > 2. In case of radeonsi, the problem was mitigated once Marek's SDMA > patch was merged, which hugely reduces the PCIe bandwidth use. > 3. In less optimized cases (for example D9VK), the problem is still > very noticable. However, since you saw as much as ~20 Gbit/s under different circumstances, the 2.43 Gbit/s used by this game clearly isn't a hard limit; there must be other limiting factors. 
-- Earthling Michel Dänzer | https://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? 2019-07-02 8:09 ` Michel Dänzer @ 2019-07-02 9:49 ` Timur Kristóf 2019-07-03 8:07 ` Michel Dänzer 0 siblings, 1 reply; 34+ messages in thread From: Timur Kristóf @ 2019-07-02 9:49 UTC (permalink / raw) To: Michel Dänzer, Mika Westerberg; +Cc: michael.jamet, dri-devel On Tue, 2019-07-02 at 10:09 +0200, Michel Dänzer wrote: > On 2019-07-01 6:01 p.m., Timur Kristóf wrote: > > On Mon, 2019-07-01 at 16:54 +0200, Michel Dänzer wrote: > > > On 2019-06-28 2:21 p.m., Timur Kristóf wrote: > > > > I haven't found a good way to measure the maximum PCIe > > > > throughput > > > > between the CPU and GPU, > > > > > > amdgpu.benchmark=3 > > > > > > on the kernel command line will measure throughput for various > > > transfer > > > sizes during driver initialization. > > > > Thanks, I will definitely try that. > > Is this the only way to do this, or is there a way to benchmark it > > after it already booted? > > The former. At least in theory, it's possible to unload the amdgpu > module while nothing is using it, then load it again. Okay, so I booted my system with amdgpu.benchmark=3 You can find the full dmesg log here: https://pastebin.com/zN9FYGw4 The result is between 1-5 Gbit / sec depending on the transfer size (larger transfers perform better), which corresponds to neither the 8 Gbit / sec that the kernel thinks it is limited to, nor the 20 Gbit / sec which I measured earlier with pcie_bw. Since pcie_bw only shows the maximum PCIe packet size (and not the actual size), could it be that it's so inaccurate that the 20 Gbit / sec is a fluke? Side note: after booting with amdgpu.benchmark=3 the graphical session was useless and outright hung the system after I logged in. So I had to reboot into runlevel 3 to be able to save the above dmesg log. > > > > > but I did take a look at AMD's sysfs interface at > > > > /sys/class/drm/card1/device/pcie_bw which while running the > > > > bottlenecked > > > > game. 
The highest throughput I saw there was only 2.43 Gbit > > > > /sec. > > > > > > PCIe bandwidth generally isn't a bottleneck for games, since they > > > don't > > > constantly transfer large data volumes across PCIe, but store > > > them in > > > the GPU's local VRAM, which is connected at much higher > > > bandwidth. > > > > There are reasons why I think the problem is the bandwidth: > > 1. The same issues don't happen when the GPU is not used with a TB3 > > enclosure. > > 2. In case of radeonsi, the problem was mitigated once Marek's SDMA > > patch was merged, which hugely reduces the PCIe bandwidth use. > > 3. In less optimized cases (for example D9VK), the problem is still > > very noticable. > > However, since you saw as much as ~20 Gbit/s under different > circumstances, the 2.43 Gbit/s used by this game clearly isn't a hard > limit; there must be other limiting factors. There may be other factors, yes. I can't offer a good explanation on what exactly is happening, but it's pretty clear that amdgpu can't take full advantage of the TB3 link, so it seemed like a good idea to start investigating this first. _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? 2019-07-02 9:49 ` Timur Kristóf @ 2019-07-03 8:07 ` Michel Dänzer 2019-07-03 11:04 ` Timur Kristóf 2019-07-03 18:44 ` Marek Olšák 0 siblings, 2 replies; 34+ messages in thread From: Michel Dänzer @ 2019-07-03 8:07 UTC (permalink / raw) To: Timur Kristóf, Mika Westerberg; +Cc: michael.jamet, dri-devel On 2019-07-02 11:49 a.m., Timur Kristóf wrote: > On Tue, 2019-07-02 at 10:09 +0200, Michel Dänzer wrote: >> On 2019-07-01 6:01 p.m., Timur Kristóf wrote: >>> On Mon, 2019-07-01 at 16:54 +0200, Michel Dänzer wrote: >>>> On 2019-06-28 2:21 p.m., Timur Kristóf wrote: >>>>> I haven't found a good way to measure the maximum PCIe >>>>> throughput >>>>> between the CPU and GPU, >>>> >>>> amdgpu.benchmark=3 >>>> >>>> on the kernel command line will measure throughput for various >>>> transfer >>>> sizes during driver initialization. >>> >>> Thanks, I will definitely try that. >>> Is this the only way to do this, or is there a way to benchmark it >>> after it already booted? >> >> The former. At least in theory, it's possible to unload the amdgpu >> module while nothing is using it, then load it again. > > Okay, so I booted my system with amdgpu.benchmark=3 > You can find the full dmesg log here: https://pastebin.com/zN9FYGw4 > > The result is between 1-5 Gbit / sec depending on the transfer size > (the higher the better), which corresponds to neither the 8 Gbit / sec > that the kernel thinks it is limited to, nor the 20 Gbit / sec which I > measured earlier with pcie_bw. 5 Gbit/s throughput could be consistent with 8 Gbit/s theoretical bandwidth, due to various overhead. > Since pcie_bw only shows the maximum PCIe packet size (and not the > actual size), could it be that it's so inaccurate that the 20 Gbit / > sec is a fluke? Seems likely or at least plausible. >>>>> but I did take a look at AMD's sysfs interface at >>>>> /sys/class/drm/card1/device/pcie_bw which while running the >>>>> bottlenecked >>>>> game. 
The highest throughput I saw there was only 2.43 Gbit >>>>> /sec. >>>> >>>> PCIe bandwidth generally isn't a bottleneck for games, since they >>>> don't >>>> constantly transfer large data volumes across PCIe, but store >>>> them in >>>> the GPU's local VRAM, which is connected at much higher >>>> bandwidth. >>> >>> There are reasons why I think the problem is the bandwidth: >>> 1. The same issues don't happen when the GPU is not used with a TB3 >>> enclosure. >>> 2. In case of radeonsi, the problem was mitigated once Marek's SDMA >>> patch was merged, which hugely reduces the PCIe bandwidth use. >>> 3. In less optimized cases (for example D9VK), the problem is still >>> very noticable. >> >> However, since you saw as much as ~20 Gbit/s under different >> circumstances, the 2.43 Gbit/s used by this game clearly isn't a hard >> limit; there must be other limiting factors. > > There may be other factors, yes. I can't offer a good explanation on > what exactly is happening, but it's pretty clear that amdgpu can't take > full advantage of the TB3 link, so it seemed like a good idea to start > investigating this first. Yeah, actually it would be consistent with ~16-32 KB granularity transfers based on your measurements above, which is plausible. So making sure that the driver doesn't artificially limit the PCIe bandwidth might indeed help. OTOH this also indicates a similar potential for improvement by using larger transfers in Mesa and/or the kernel. -- Earthling Michel Dänzer | https://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? 2019-07-03 8:07 ` Michel Dänzer @ 2019-07-03 11:04 ` Timur Kristóf 2019-07-04 8:26 ` Michel Dänzer 2019-07-03 18:44 ` Marek Olšák 1 sibling, 1 reply; 34+ messages in thread From: Timur Kristóf @ 2019-07-03 11:04 UTC (permalink / raw) To: Michel Dänzer, Mika Westerberg; +Cc: michael.jamet, dri-devel > > Okay, so I booted my system with amdgpu.benchmark=3 > > You can find the full dmesg log here: https://pastebin.com/zN9FYGw4 > > > > The result is between 1-5 Gbit / sec depending on the transfer size > > (the higher the better), which corresponds to neither the 8 Gbit / > > sec > > that the kernel thinks it is limited to, nor the 20 Gbit / sec > > which I > > measured earlier with pcie_bw. > > 5 Gbit/s throughput could be consistent with 8 Gbit/s theoretical > bandwidth, due to various overhead. Okay, that's good to know. > > Since pcie_bw only shows the maximum PCIe packet size (and not the > > actual size), could it be that it's so inaccurate that the 20 Gbit > > / > > sec is a fluke? > > Seems likely or at least plausible. Thanks for the confirmation. It also looks like it is the slowest with small transfers, which I assume mesa is doing for this game. > > > > There may be other factors, yes. I can't offer a good explanation > > on > > what exactly is happening, but it's pretty clear that amdgpu can't > > take > > full advantage of the TB3 link, so it seemed like a good idea to > > start > > investigating this first. > > Yeah, actually it would be consistent with ~16-32 KB granularity > transfers based on your measurements above, which is plausible. So > making sure that the driver doesn't artificially limit the PCIe > bandwidth might indeed help. Can you point me to the place where amdgpu decides the PCIe link speed? I'd like to try to tweak it a little bit to see if that helps at all. > OTOH this also indicates a similar potential for improvement by using > larger transfers in Mesa and/or the kernel. 
Yes, that sounds like it would be worth looking into. Out of curiosity, is there a performance decrease with small transfers on a "normal" PCIe port too, or is this specific to TB3? Best regards, Tim _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? 2019-07-03 11:04 ` Timur Kristóf @ 2019-07-04 8:26 ` Michel Dänzer 2019-07-05 9:17 ` Timur Kristóf 2019-07-05 13:36 ` Alex Deucher 0 siblings, 2 replies; 34+ messages in thread From: Michel Dänzer @ 2019-07-04 8:26 UTC (permalink / raw) To: Timur Kristóf, Mika Westerberg, Alex Deucher Cc: michael.jamet, dri-devel On 2019-07-03 1:04 p.m., Timur Kristóf wrote: > >>> There may be other factors, yes. I can't offer a good explanation >>> on >>> what exactly is happening, but it's pretty clear that amdgpu can't >>> take >>> full advantage of the TB3 link, so it seemed like a good idea to >>> start >>> investigating this first. >> >> Yeah, actually it would be consistent with ~16-32 KB granularity >> transfers based on your measurements above, which is plausible. So >> making sure that the driver doesn't artificially limit the PCIe >> bandwidth might indeed help. > > Can you point me to the place where amdgpu decides the PCIe link speed? > I'd like to try to tweak it a little bit to see if that helps at all. I'm not sure offhand, Alex or anyone? >> OTOH this also indicates a similar potential for improvement by using >> larger transfers in Mesa and/or the kernel. > > Yes, that sounds like it would be worth looking into. > > Out of curiosity, is there a performace decrease with small transfers > on a "normal" PCIe port too, or is this specific to TB3? It's not TB3 specific. With a "normal" 8 GT/s x16 port, I get between ~256 MB/s for 4 KB transfers and ~12 GB/s for 4 MB transfers (even larger transfers seem slightly slower again). This also looks consistent with your measurements in that the practical limit seems to be around 75% of the theoretical bandwidth. 
-- Earthling Michel Dänzer | https://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? 2019-07-04 8:26 ` Michel Dänzer @ 2019-07-05 9:17 ` Timur Kristóf 2019-07-05 13:36 ` Alex Deucher 1 sibling, 0 replies; 34+ messages in thread From: Timur Kristóf @ 2019-07-05 9:17 UTC (permalink / raw) To: Michel Dänzer, Mika Westerberg, Alex Deucher Cc: michael.jamet, dri-devel > > Can you point me to the place where amdgpu decides the PCIe link > > speed? > > I'd like to try to tweak it a little bit to see if that helps at > > all. > > I'm not sure offhand, Alex or anyone? Thus far, I started by looking at how the pp_dpm_pcie sysfs interface works, and found smu7_hwmgr which seems to be the only hwmgr that actually outputs anything on PP_PCIE: https://github.com/torvalds/linux/blob/a2d635decbfa9c1e4ae15cb05b68b2559f7f827c/drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c#L4462 However, its output is definitely incorrect. It tells me that the supported PCIe modes are: cat /sys/class/drm/card1/device/pp_dpm_pcie 0: 2.5GT/s, x8 1: 8.0GT/s, x16 It allows me to change between these two modes, but the change doesn't seem to have any actual effect on the transfer speeds. Neither of those modes actually makes sense. Amdgpu doesn't seem to be aware of the fact that it runs on a x4 link. In fact, the smu7_get_current_pcie_lane_number function even has an assertion: PP_ASSERT_WITH_CODE((7 >= link_width), On the other hand: cat /sys/class/drm/card1/device/current_link_width 4 So I don't understand how it can even work with PCIe x4, why doesn't that assertion get triggered on my system? > > Out of curiosity, is there a performace decrease with small > > transfers > > on a "normal" PCIe port too, or is this specific to TB3? > > It's not TB3 specific. With a "normal" 8 GT/s x16 port, I get between > ~256 MB/s for 4 KB transfers and ~12 GB/s for 4 MB transfers (even > larger transfers seem slightly slower again). 
This also looks > consistent > with your measurements in that the practical limit seems to be around > 75% of the theoretical bandwidth. Sounds like your idea to try to optimize mesa to use larger transfers is a good idea, then. Best regards, Tim _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? 2019-07-04 8:26 ` Michel Dänzer 2019-07-05 9:17 ` Timur Kristóf @ 2019-07-05 13:36 ` Alex Deucher 2019-07-18 9:11 ` Timur Kristóf 1 sibling, 1 reply; 34+ messages in thread From: Alex Deucher @ 2019-07-05 13:36 UTC (permalink / raw) To: Michel Dänzer Cc: michael.jamet, Mika Westerberg, dri-devel, Timur Kristóf On Thu, Jul 4, 2019 at 6:55 AM Michel Dänzer <michel@daenzer.net> wrote: > > On 2019-07-03 1:04 p.m., Timur Kristóf wrote: > > > >>> There may be other factors, yes. I can't offer a good explanation > >>> on > >>> what exactly is happening, but it's pretty clear that amdgpu can't > >>> take > >>> full advantage of the TB3 link, so it seemed like a good idea to > >>> start > >>> investigating this first. > >> > >> Yeah, actually it would be consistent with ~16-32 KB granularity > >> transfers based on your measurements above, which is plausible. So > >> making sure that the driver doesn't artificially limit the PCIe > >> bandwidth might indeed help. > > > > Can you point me to the place where amdgpu decides the PCIe link speed? > > I'd like to try to tweak it a little bit to see if that helps at all. > > I'm not sure offhand, Alex or anyone? amdgpu_device_get_pcie_info() in amdgpu_device.c. > > > >> OTOH this also indicates a similar potential for improvement by using > >> larger transfers in Mesa and/or the kernel. > > > > Yes, that sounds like it would be worth looking into. > > > > Out of curiosity, is there a performace decrease with small transfers > > on a "normal" PCIe port too, or is this specific to TB3? > > It's not TB3 specific. With a "normal" 8 GT/s x16 port, I get between > ~256 MB/s for 4 KB transfers and ~12 GB/s for 4 MB transfers (even > larger transfers seem slightly slower again). This also looks consistent > with your measurements in that the practical limit seems to be around > 75% of the theoretical bandwidth. 
> > > -- > Earthling Michel Dänzer | https://www.amd.com > Libre software enthusiast | Mesa and X developer _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? 2019-07-05 13:36 ` Alex Deucher @ 2019-07-18 9:11 ` Timur Kristóf 2019-07-18 13:50 ` Alex Deucher 0 siblings, 1 reply; 34+ messages in thread From: Timur Kristóf @ 2019-07-18 9:11 UTC (permalink / raw) To: Alex Deucher, Michel Dänzer Cc: michael.jamet, Mika Westerberg, dri-devel On Fri, 2019-07-05 at 09:36 -0400, Alex Deucher wrote: > On Thu, Jul 4, 2019 at 6:55 AM Michel Dänzer <michel@daenzer.net> > wrote: > > On 2019-07-03 1:04 p.m., Timur Kristóf wrote: > > > > > There may be other factors, yes. I can't offer a good > > > > > explanation > > > > > on > > > > > what exactly is happening, but it's pretty clear that amdgpu > > > > > can't > > > > > take > > > > > full advantage of the TB3 link, so it seemed like a good idea > > > > > to > > > > > start > > > > > investigating this first. > > > > > > > > Yeah, actually it would be consistent with ~16-32 KB > > > > granularity > > > > transfers based on your measurements above, which is plausible. > > > > So > > > > making sure that the driver doesn't artificially limit the PCIe > > > > bandwidth might indeed help. > > > > > > Can you point me to the place where amdgpu decides the PCIe link > > > speed? > > > I'd like to try to tweak it a little bit to see if that helps at > > > all. > > > > I'm not sure offhand, Alex or anyone? > > amdgpu_device_get_pcie_info() in amdgpu_device.c. Hi Alex, I took a look at amdgpu_device_get_pcie_info() and found that it uses pcie_bandwidth_available to determine the capabilities of the PCIe port. However, pcie_bandwidth_available gives you only the current bandwidth as set by the PCIe link status register, not the maximum capability. I think something along these lines would fix it: https://pastebin.com/LscEMKMc It seems to me that the PCIe capabilities are only used in a few places in the code, so this patch fixes pp_dpm_pcie. However it doesn't affect the actual performance. What do you think? 
Best regards, Tim _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? 2019-07-18 9:11 ` Timur Kristóf @ 2019-07-18 13:50 ` Alex Deucher [not found] ` <172a41d97d383a8989ebd213bb4230a2df4d636d.camel@gmail.com> 0 siblings, 1 reply; 34+ messages in thread From: Alex Deucher @ 2019-07-18 13:50 UTC (permalink / raw) To: Timur Kristóf Cc: michael.jamet, Michel Dänzer, Mika Westerberg, dri-devel On Thu, Jul 18, 2019 at 5:11 AM Timur Kristóf <timur.kristof@gmail.com> wrote: > > On Fri, 2019-07-05 at 09:36 -0400, Alex Deucher wrote: > > On Thu, Jul 4, 2019 at 6:55 AM Michel Dänzer <michel@daenzer.net> > > wrote: > > > On 2019-07-03 1:04 p.m., Timur Kristóf wrote: > > > > > > There may be other factors, yes. I can't offer a good > > > > > > explanation > > > > > > on > > > > > > what exactly is happening, but it's pretty clear that amdgpu > > > > > > can't > > > > > > take > > > > > > full advantage of the TB3 link, so it seemed like a good idea > > > > > > to > > > > > > start > > > > > > investigating this first. > > > > > > > > > > Yeah, actually it would be consistent with ~16-32 KB > > > > > granularity > > > > > transfers based on your measurements above, which is plausible. > > > > > So > > > > > making sure that the driver doesn't artificially limit the PCIe > > > > > bandwidth might indeed help. > > > > > > > > Can you point me to the place where amdgpu decides the PCIe link > > > > speed? > > > > I'd like to try to tweak it a little bit to see if that helps at > > > > all. > > > > > > I'm not sure offhand, Alex or anyone? > > > > amdgpu_device_get_pcie_info() in amdgpu_device.c. > > > Hi Alex, > > I took a look at amdgpu_device_get_pcie_info() and found that it uses > pcie_bandwidth_available to determine the capabilities of the PCIe > port. However, pcie_bandwidth_available gives you only the current > bandwidth as set by the PCIe link status register, not the maximum > capability. 
> > I think something along these lines would fix it: > https://pastebin.com/LscEMKMc > > It seems to me that the PCIe capabilities are only used in a few places > in the code, so this patch fixes pp_dpm_pcie. However it doesn't affect > the actual performance. > > What do you think? I think we want the current bandwidth. The GPU can only control the speed of its local link. If there are upstream links that are slower than its local link, it doesn't make sense to run the local link at faster speeds because it will burn extra power and will just run into a bottleneck at the next link. In general, most systems negotiate the fastest link speed supported by both ends at power up. Alex > > Best regards, > Tim > _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <172a41d97d383a8989ebd213bb4230a2df4d636d.camel@gmail.com>]
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? [not found] ` <172a41d97d383a8989ebd213bb4230a2df4d636d.camel@gmail.com> @ 2019-07-19 14:29 ` Alex Deucher 0 siblings, 0 replies; 34+ messages in thread From: Alex Deucher @ 2019-07-19 14:29 UTC (permalink / raw) To: Timur Kristóf Cc: michael.jamet, Michel Dänzer, Mika Westerberg, dri-devel On Thu, Jul 18, 2019 at 10:38 AM Timur Kristóf <timur.kristof@gmail.com> wrote: > > > > > > > > I took a look at amdgpu_device_get_pcie_info() and found that it > > > uses > > > pcie_bandwidth_available to determine the capabilities of the PCIe > > > port. However, pcie_bandwidth_available gives you only the current > > > bandwidth as set by the PCIe link status register, not the maximum > > > capability. > > > > > > I think something along these lines would fix it: > > > https://pastebin.com/LscEMKMc > > > > > > It seems to me that the PCIe capabilities are only used in a few > > > places > > > in the code, so this patch fixes pp_dpm_pcie. However it doesn't > > > affect > > > the actual performance. > > > > > > What do you think? > > > > I think we want the current bandwidth. The GPU can only control the > > speed of its local link. If there are upstream links that are slower > > than its local link, it doesn't make sense to run the local link at > > faster speeds because it will burn extra power it will just run into > > a > > bottleneck at the next link. In general, most systems negotiate the > > fastest link speed supported by both ends at power up. > > > > Alex > > Currently, if the GPU connected to a TB3 port, the driver thinks that > 2.5 GT/s is the best speed that it can use, even though the hardware > itself uses 8 GT/s. So what the driver thinks is inconsistent with what > the hardware does. This messes up pp_dpm_pcie. 
> > As far as I understand, PCIe bridge devices can change their link speed > in runtime based on how they are used or what power state they are in, > so it makes sense here to request the best speed they are capable of. I > might be wrong about that. I don't know of any bridges off hand that change their link speeds on demand. That said, I'm certainly not a PCI expert. Our GPUs for instance have a micro-controller on them which changes the speed on demand. Presumably other devices would need something similar. > > If you think this change is undesireable, then maybe it would be worth > to follow Mika's suggestion and add something along the lines of > dev->is_thunderbolt so that the correct available bandwidth could still > be determined. Ideally, it would be added to the core pci helpers so that each driver that uses them doesn't have to duplicate the same functionality. Alex > > Tim > _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? 2019-07-03 8:07 ` Michel Dänzer 2019-07-03 11:04 ` Timur Kristóf @ 2019-07-03 18:44 ` Marek Olšák 2019-07-05 9:27 ` Timur Kristóf 1 sibling, 1 reply; 34+ messages in thread From: Marek Olšák @ 2019-07-03 18:44 UTC (permalink / raw) To: Michel Dänzer Cc: michael.jamet, Mika Westerberg, dri-devel, Timur Kristóf [-- Attachment #1.1: Type: text/plain, Size: 3922 bytes --] You can run: AMD_DEBUG=testdmaperf glxgears It tests transfer sizes of up to 128 MB, and it tests ~60 slightly different methods of transfering data. Marek On Wed, Jul 3, 2019 at 4:07 AM Michel Dänzer <michel@daenzer.net> wrote: > On 2019-07-02 11:49 a.m., Timur Kristóf wrote: > > On Tue, 2019-07-02 at 10:09 +0200, Michel Dänzer wrote: > >> On 2019-07-01 6:01 p.m., Timur Kristóf wrote: > >>> On Mon, 2019-07-01 at 16:54 +0200, Michel Dänzer wrote: > >>>> On 2019-06-28 2:21 p.m., Timur Kristóf wrote: > >>>>> I haven't found a good way to measure the maximum PCIe > >>>>> throughput > >>>>> between the CPU and GPU, > >>>> > >>>> amdgpu.benchmark=3 > >>>> > >>>> on the kernel command line will measure throughput for various > >>>> transfer > >>>> sizes during driver initialization. > >>> > >>> Thanks, I will definitely try that. > >>> Is this the only way to do this, or is there a way to benchmark it > >>> after it already booted? > >> > >> The former. At least in theory, it's possible to unload the amdgpu > >> module while nothing is using it, then load it again. > > > > Okay, so I booted my system with amdgpu.benchmark=3 > > You can find the full dmesg log here: https://pastebin.com/zN9FYGw4 > > > > The result is between 1-5 Gbit / sec depending on the transfer size > > (the higher the better), which corresponds to neither the 8 Gbit / sec > > that the kernel thinks it is limited to, nor the 20 Gbit / sec which I > > measured earlier with pcie_bw. 
> > 5 Gbit/s throughput could be consistent with 8 Gbit/s theoretical > bandwidth, due to various overhead. > > > > Since pcie_bw only shows the maximum PCIe packet size (and not the > > actual size), could it be that it's so inaccurate that the 20 Gbit / > > sec is a fluke? > > Seems likely or at least plausible. > > > >>>>> but I did take a look at AMD's sysfs interface at > >>>>> /sys/class/drm/card1/device/pcie_bw while running the > >>>>> bottlenecked > >>>>> game. The highest throughput I saw there was only 2.43 Gbit > >>>>> /sec. > >>>> > >>>> PCIe bandwidth generally isn't a bottleneck for games, since they > >>>> don't > >>>> constantly transfer large data volumes across PCIe, but store > >>>> them in > >>>> the GPU's local VRAM, which is connected at much higher > >>>> bandwidth. > >>> > >>> There are reasons why I think the problem is the bandwidth: > >>> 1. The same issues don't happen when the GPU is not used with a TB3 > >>> enclosure. > >>> 2. In case of radeonsi, the problem was mitigated once Marek's SDMA > >>> patch was merged, which hugely reduces the PCIe bandwidth use. > >>> 3. In less optimized cases (for example D9VK), the problem is still > >>> very noticeable. > >> > >> However, since you saw as much as ~20 Gbit/s under different > >> circumstances, the 2.43 Gbit/s used by this game clearly isn't a hard > >> limit; there must be other limiting factors. > > > > There may be other factors, yes. I can't offer a good explanation of > > what exactly is happening, but it's pretty clear that amdgpu can't take > > full advantage of the TB3 link, so it seemed like a good idea to start > > investigating this first. > > Yeah, actually it would be consistent with ~16-32 KB granularity > transfers based on your measurements above, which is plausible. So > making sure that the driver doesn't artificially limit the PCIe > bandwidth might indeed help. 
> > OTOH this also indicates a similar potential for improvement by using > larger transfers in Mesa and/or the kernel. > > > -- > Earthling Michel Dänzer | https://www.amd.com > Libre software enthusiast | Mesa and X developer > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel [-- Attachment #1.2: Type: text/html, Size: 5371 bytes --] [-- Attachment #2: Type: text/plain, Size: 159 bytes --] _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
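The link-speed figures traded back and forth in this thread all follow from the PCIe link parameters. As an illustrative sketch (not part of the original exchange), the numbers in the kernel's "8.000 Gb/s available PCIe bandwidth ... (capable of 31.504 Gb/s with 8 GT/s x4 link)" warning can be reproduced from the lane count, transfer rate, and line encoding — PCIe 1.x/2.x use 8b/10b, PCIe 3.x uses 128b/130b:

```python
# Illustrative sketch: reproduce the payload-bandwidth figures from the
# kernel's "limited by 2.5 GT/s x4 link" dmesg warning.
# PCIe 1.x/2.x (2.5 and 5 GT/s) use 8b/10b line encoding;
# PCIe 3.x (8 GT/s) uses 128b/130b.

def pcie_payload_gbps(rate_gt_s: float, lanes: int) -> float:
    encoding = 8 / 10 if rate_gt_s in (2.5, 5.0) else 128 / 130
    return rate_gt_s * lanes * encoding

print(pcie_payload_gbps(2.5, 4))  # 8.0 -- the downtrained TB3 hop
print(pcie_payload_gbps(8.0, 4))  # ~31.5 -- what the GPU link is capable of
```

The ~31.5 Gb/s result matches the kernel's 31.504 Gb/s figure to within rounding, which is why the dmesg message reports exactly those two numbers for this x4 link.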
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? 2019-07-03 18:44 ` Marek Olšák @ 2019-07-05 9:27 ` Timur Kristóf 2019-07-05 15:35 ` Marek Olšák 0 siblings, 1 reply; 34+ messages in thread From: Timur Kristóf @ 2019-07-05 9:27 UTC (permalink / raw) To: Marek Olšák, Michel Dänzer Cc: michael.jamet, Mika Westerberg, dri-devel On Wed, 2019-07-03 at 14:44 -0400, Marek Olšák wrote: > You can run: > AMD_DEBUG=testdmaperf glxgears > > It tests transfer sizes of up to 128 MB, and it tests ~60 slightly > different methods of transferring data. > > Marek Thanks Marek, I didn't know about that option. Tried it, here is the output: https://pastebin.com/raw/9SAAbbAA I'm not quite sure how to interpret the numbers, they are inconsistent with the results from both pcie_bw and amdgpu.benchmark, for example GTT->VRAM at a 128 KB is around 1400 MB/s (I assume that is megabytes / sec, right?). It is also weird that unlike amdgpu.benchmark, the larger than 128 KB transfers didn't actually get slightly slower. Michel, can you make sense of this? Best regards, Tim _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? 2019-07-05 9:27 ` Timur Kristóf @ 2019-07-05 15:35 ` Marek Olšák 2019-07-05 16:01 ` Timur Kristóf [not found] ` <8f0c2d7780430d40dd1e17a82484d236eae3f981.camel@gmail.com> 0 siblings, 2 replies; 34+ messages in thread From: Marek Olšák @ 2019-07-05 15:35 UTC (permalink / raw) To: Timur Kristóf Cc: michael.jamet, Michel Dänzer, Mika Westerberg, dri-devel [-- Attachment #1.1: Type: text/plain, Size: 806 bytes --] On Fri, Jul 5, 2019 at 5:27 AM Timur Kristóf <timur.kristof@gmail.com> wrote: > On Wed, 2019-07-03 at 14:44 -0400, Marek Olšák wrote: > > You can run: > > AMD_DEBUG=testdmaperf glxgears > > > > It tests transfer sizes of up to 128 MB, and it tests ~60 slightly > > different methods of transferring data. > > > > Marek > > > Thanks Marek, I didn't know about that option. > Tried it, here is the output: https://pastebin.com/raw/9SAAbbAA > > I'm not quite sure how to interpret the numbers, they are inconsistent > with the results from both pcie_bw and amdgpu.benchmark, for example > GTT->VRAM at a 128 KB is around 1400 MB/s (I assume that is megabytes / > sec, right?). > Based on the SDMA results, you have 2.4 GB/s. For 128KB, it's 2.2 GB/s for GTT->VRAM copies. Marek [-- Attachment #1.2: Type: text/html, Size: 1281 bytes --] [-- Attachment #2: Type: text/plain, Size: 159 bytes --] _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? 2019-07-05 15:35 ` Marek Olšák @ 2019-07-05 16:01 ` Timur Kristóf [not found] ` <8f0c2d7780430d40dd1e17a82484d236eae3f981.camel@gmail.com> 1 sibling, 0 replies; 34+ messages in thread From: Timur Kristóf @ 2019-07-05 16:01 UTC (permalink / raw) To: maraeo; +Cc: michael.jamet, michel, mika.westerberg, dri-devel On Friday, 5 July 2019, Marek Olšák wrote: > On Fri, Jul 5, 2019 at 5:27 AM Timur Kristóf <timur.kristof@gmail.com> > wrote: > > > On Wed, 2019-07-03 at 14:44 -0400, Marek Olšák wrote: > > > You can run: > > > AMD_DEBUG=testdmaperf glxgears > > > > > > It tests transfer sizes of up to 128 MB, and it tests ~60 slightly > > > different methods of transferring data. > > > > > > Marek > > > > > > Thanks Marek, I didn't know about that option. > > Tried it, here is the output: https://pastebin.com/raw/9SAAbbAA > > > > I'm not quite sure how to interpret the numbers, they are inconsistent > > with the results from both pcie_bw and amdgpu.benchmark, for example > > GTT->VRAM at a 128 KB is around 1400 MB/s (I assume that is megabytes / > > sec, right?). > > > > Based on the SDMA results, you have 2.4 GB/s. For 128KB, it's 2.2 GB/s for > GTT->VRAM copies. > > Marek That's interesting, AFAIU that would be 17.6 Gbit/sec. But how can that be so much faster than the 5 Gbit/sec result from amdgpu.benchmark? _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <8f0c2d7780430d40dd1e17a82484d236eae3f981.camel@gmail.com>]
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? [not found] ` <8f0c2d7780430d40dd1e17a82484d236eae3f981.camel@gmail.com> @ 2019-07-18 10:29 ` Michel Dänzer 2019-07-22 9:39 ` Timur Kristóf 0 siblings, 1 reply; 34+ messages in thread From: Michel Dänzer @ 2019-07-18 10:29 UTC (permalink / raw) To: Timur Kristóf, Marek Olšák Cc: michael.jamet, Mika Westerberg, dri-devel On 2019-07-18 11:06 a.m., Timur Kristóf wrote: >>> Thanks Marek, I didn't know about that option. >>> Tried it, here is the output: https://pastebin.com/raw/9SAAbbAA >>> >>> I'm not quite sure how to interpret the numbers, they are >>> inconsistent >>> with the results from both pcie_bw and amdgpu.benchmark, for >>> example >>> GTT->VRAM at a 128 KB is around 1400 MB/s (I assume that is >>> megabytes / >>> sec, right?). >> >> Based on the SDMA results, you have 2.4 GB/s. For 128KB, it's 2.2 >> GB/s for GTT->VRAM copies. > > In the meantime I had a chat with Michel on IRC and he suggested that > maybe amdgpu.benchmark=3 gives lower results because it uses a less > than optimal way to do the benchmark. > > Looking at the results from the mesa benchmark a bit more closely, I > see that the SDMA can do: > VRAM->GTT: 3087 MB/s = 24 Gbit/sec > GTT->VRAM: 2433 MB/s = 19 Gbit/sec > > So on Polaris at least, the SDMA is the fastest, and the other transfer > methods can't match it. I also did the same test on Navi, where it's > different: all other transfer methods are much closer to the SDMA, but > the max speed is still around 20-24 Gbit / sec. > > I still have a few questions: > > 1. Why is the GTT->VRAM copy so much slower than the VRAM->GTT copy? > > 2. Why is the bus limited to 24 Gbit/sec? I would expect the > Thunderbolt port to give me at least 32 Gbit/sec for PCIe traffic. That's unrealistic I'm afraid. As I said on IRC, from the GPU POV there's an 8 GT/s x4 PCIe link, so ~29.8 Gbit/s (= 32 billion bit/s; I missed this nuance on IRC) is the theoretical raw bandwidth. 
However, in practice that's not achievable due to various overhead[0], and I'm only seeing up to ~90% utilization of the theoretical bandwidth with a "normal" x16 link as well. I wouldn't expect higher utilization without seeing some evidence to suggest it's possible. [0] According to https://www.tested.com/tech/457440-theoretical-vs-actual-bandwidth-pci-express-and-thunderbolt/ , PCIe 3.0 uses 1.54% of the raw bandwidth for its internal encoding. Also keep in mind all CPU<->GPU communication has to go through the PCIe link, e.g. for programming the transfers, in-band signalling from the GPU to the PCIe port where the data is being transferred to/from, ... -- Earthling Michel Dänzer | https://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
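Michel's parenthetical — that ~29.8 Gbit/s equals 32 billion bits per second — is the decimal-versus-binary prefix distinction. A short sketch (illustrative, not from the thread) expressing the same 8 GT/s x4 link three ways:

```python
# Illustrative sketch: the same 8 GT/s x4 PCIe 3.0 link, three ways.
raw = 8e9 * 4              # raw line rate: 32 billion bits/s
payload = raw * 128 / 130  # after 128b/130b encoding (~1.54% overhead)

print(raw / 1e9)      # 32.0  -- decimal Gbit/s
print(raw / 2**30)    # ~29.8 -- the same rate in binary "Gbit"/s
print(payload / 1e9)  # ~31.5 -- decimal Gbit/s of usable payload
```

At ~90% utilization of the theoretical bandwidth, as Michel measures on a normal x16 link, this x4 link would top out somewhere around 28 Gbit/s of actual transfer rate.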
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? 2019-07-18 10:29 ` Michel Dänzer @ 2019-07-22 9:39 ` Timur Kristóf 2019-07-23 8:11 ` Michel Dänzer 0 siblings, 1 reply; 34+ messages in thread From: Timur Kristóf @ 2019-07-22 9:39 UTC (permalink / raw) To: Michel Dänzer, Marek Olšák Cc: michael.jamet, Mika Westerberg, dri-devel > > > > 1. Why is the GTT->VRAM copy so much slower than the VRAM->GTT > > copy? > > > > 2. Why is the bus limited to 24 Gbit/sec? I would expect the > > Thunderbolt port to give me at least 32 Gbit/sec for PCIe traffic. > > That's unrealistic I'm afraid. As I said on IRC, from the GPU POV > there's an 8 GT/s x4 PCIe link, so ~29.8 Gbit/s (= 32 billion bit/s; > I > missed this nuance on IRC) is the theoretical raw bandwidth. However, > in > practice that's not achievable due to various overhead[0], and I'm > only > seeing up to ~90% utilization of the theoretical bandwidth with a > "normal" x16 link as well. I wouldn't expect higher utilization > without > seeing some evidence to suggest it's possible. > > > [0] According to > https://www.tested.com/tech/457440-theoretical-vs-actual-bandwidth-pci-express-and-thunderbolt/ > , PCIe 3.0 uses 1.54% of the raw bandwidth for its internal encoding. > Also keep in mind all CPU<->GPU communication has to go through the > PCIe > link, e.g. for programming the transfers, in-band signalling from the > GPU to the PCIe port where the data is being transferred to/from, ... Good point, I used 1024 and not 1000. My mistake. There is something else: In the same benchmark there is a "fill->GTT ,SDMA" row which has a 4035 MB/s number. If that traffic goes through the TB3 interface then we just found our 32 Gbit/sec. Now the question is, if I understand this correctly and the SDMA can indeed do 32 Gbit/sec for "fill->GTT", then why can't it do the same with other kinds of transfers? Not sure if there is a good answer to that question though. 
Also I still don't fully understand why GTT->VRAM is slower than VRAM->GTT, when the bandwidth is clearly available. Best regards, Tim Side note: with regard to that 1.5% figure, the TB3 tech brief[0] explicitly mentions this and says that it isn't carried over: "the underlying protocol uses some data to provide encoding overhead which is not carried over the Thunderbolt 3 link reducing the consumed bandwidth by roughly 20 percent (DisplayPort) or 1.5 percent (PCI Express Gen 3)" [0] https://thunderbolttechnology.net/sites/default/files/Thunderbolt3_TechBrief_FINAL.pdf _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
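A quick arithmetic check on the fill->GTT number discussed above (illustrative sketch, not from the thread): 4035 MB/s works out to roughly 32.3 Gbit/s, at or slightly above even the 31.504 Gb/s payload figure from the kernel's original dmesg message — which is part of what makes the question tricky:

```python
# Illustrative sketch: sanity-check the "fill->GTT, SDMA" figure.
fill_gbit = 4035 * 1e6 * 8 / 1e9  # 4035 MB/s expressed in Gbit/s
print(fill_gbit)                  # 32.28

# Compare against the "capable of 31.504 Gb/s" payload figure from dmesg:
print(fill_gbit > 31.504)         # True -- above what the link can carry
```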
* Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? 2019-07-22 9:39 ` Timur Kristóf @ 2019-07-23 8:11 ` Michel Dänzer 0 siblings, 0 replies; 34+ messages in thread From: Michel Dänzer @ 2019-07-23 8:11 UTC (permalink / raw) To: Timur Kristóf, Marek Olšák Cc: michael.jamet, Mika Westerberg, dri-devel On 2019-07-22 11:39 a.m., Timur Kristóf wrote: >>> >>> 1. Why is the GTT->VRAM copy so much slower than the VRAM->GTT >>> copy? >>> >>> 2. Why is the bus limited to 24 Gbit/sec? I would expect the >>> Thunderbolt port to give me at least 32 Gbit/sec for PCIe traffic. >> >> That's unrealistic I'm afraid. As I said on IRC, from the GPU POV >> there's an 8 GT/s x4 PCIe link, so ~29.8 Gbit/s (= 32 billion bit/s; >> I >> missed this nuance on IRC) is the theoretical raw bandwidth. However, >> in >> practice that's not achievable due to various overhead[0], and I'm >> only >> seeing up to ~90% utilization of the theoretical bandwidth with a >> "normal" x16 link as well. I wouldn't expect higher utilization >> without >> seeing some evidence to suggest it's possible. >> >> >> [0] According to >> https://www.tested.com/tech/457440-theoretical-vs-actual-bandwidth-pci-express-and-thunderbolt/ >> , PCIe 3.0 uses 1.54% of the raw bandwidth for its internal encoding. >> Also keep in mind all CPU<->GPU communication has to go through the >> PCIe >> link, e.g. for programming the transfers, in-band signalling from the >> GPU to the PCIe port where the data is being transferred to/from, ... > > Good point, I used 1024 and not 1000. My mistake. > > There is something else: > In the same benchmark there is a "fill->GTT ,SDMA" row which has a > 4035 MB/s number. If that traffic goes through the TB3 interface then > we just found our 32 Gbit/sec. The GPU is only connected to the host via PCIe, there's nowhere else it could go through. 
> Now the question is, if I understand this correctly and the SDMA can > indeed do 32 Gbit/sec for "fill->GTT", then why can't it do the same > with other kinds of transfers? Not sure if there is a good answer to > that question though. > > Also I still don't fully understand why GTT->VRAM is slower than VRAM->GTT, when the bandwidth is clearly available. While those are interesting questions at some level, I don't think they will get us closer to solving your problem. It comes down to identifying inefficient transfers across PCIe and optimizing them. > Side note: with regards to that 1.5% figure, the TB3 tech brief[0] > explicitly mentions this and says that it isn't carried over: "the > underlying protocol uses some data to provide encoding overhead which > is not carried over the Thunderbolt 3 link reducing the consumed > bandwidth by roughly 20 percent (DisplayPort) or 1.5 percent (PCI > Express Gen 3)" That just means the internal TB3 link only carries the payload data from the PCIe link, not the 1.5% of bits used for the PCIe encoding. TB3 cannot magically make the PCIe link itself work without the encoding. -- Earthling Michel Dänzer | https://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2019-07-23 8:11 UTC | newest] Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-06-28 10:23 Why is Thunderbolt 3 limited to 2.5 GT/s on Linux? Timur Kristóf 2019-06-28 10:32 ` Mika Westerberg 2019-06-28 11:08 ` Timur Kristóf 2019-06-28 11:34 ` Mika Westerberg 2019-06-28 12:21 ` Timur Kristóf 2019-06-28 12:53 ` Mika Westerberg 2019-06-28 13:33 ` Timur Kristóf 2019-06-28 14:14 ` Mika Westerberg 2019-06-28 14:53 ` Timur Kristóf 2019-07-01 11:44 ` Mika Westerberg 2019-07-01 14:25 ` Timur Kristóf 2019-07-01 14:28 ` Alex Deucher 2019-07-01 14:38 ` Timur Kristóf 2019-07-01 14:46 ` Alex Deucher 2019-07-01 15:10 ` Mika Westerberg 2019-07-01 14:54 ` Michel Dänzer 2019-07-01 16:01 ` Timur Kristóf 2019-07-02 8:09 ` Michel Dänzer 2019-07-02 9:49 ` Timur Kristóf 2019-07-03 8:07 ` Michel Dänzer 2019-07-03 11:04 ` Timur Kristóf 2019-07-04 8:26 ` Michel Dänzer 2019-07-05 9:17 ` Timur Kristóf 2019-07-05 13:36 ` Alex Deucher 2019-07-18 9:11 ` Timur Kristóf 2019-07-18 13:50 ` Alex Deucher [not found] ` <172a41d97d383a8989ebd213bb4230a2df4d636d.camel@gmail.com> 2019-07-19 14:29 ` Alex Deucher 2019-07-03 18:44 ` Marek Olšák 2019-07-05 9:27 ` Timur Kristóf 2019-07-05 15:35 ` Marek Olšák 2019-07-05 16:01 ` Timur Kristóf [not found] ` <8f0c2d7780430d40dd1e17a82484d236eae3f981.camel@gmail.com> 2019-07-18 10:29 ` Michel Dänzer 2019-07-22 9:39 ` Timur Kristóf 2019-07-23 8:11 ` Michel Dänzer