* [PATCH] drm/amdgpu: Remove pci address checks from acpi_vfct_bios
@ 2024-03-18  6:52 Kurt Kartaltepe
  2024-03-18 13:36 ` Alex Deucher
  0 siblings, 1 reply; 12+ messages in thread
From: Kurt Kartaltepe @ 2024-03-18  6:52 UTC (permalink / raw)
  To: amd-gfx; +Cc: kkartaltepe

These checks prevent using amdgpu with the pci=assign-busses parameter,
which re-addresses devices away from their ACPI values.

Signed-off-by: Kurt Kartaltepe <kkartaltepe@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
index 618e469e3622..932ce13ad232 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
@@ -386,9 +386,6 @@ static bool amdgpu_acpi_vfct_bios(struct amdgpu_device *adev)
 		}
 
 		if (vhdr->ImageLength &&
-		    vhdr->PCIBus == adev->pdev->bus->number &&
-		    vhdr->PCIDevice == PCI_SLOT(adev->pdev->devfn) &&
-		    vhdr->PCIFunction == PCI_FUNC(adev->pdev->devfn) &&
 		    vhdr->VendorID == adev->pdev->vendor &&
 		    vhdr->DeviceID == adev->pdev->device) {
 			adev->bios = kmemdup(&vbios->VbiosContent,
-- 
2.44.0


* Re: [PATCH] drm/amdgpu: Remove pci address checks from acpi_vfct_bios
  2024-03-18  6:52 [PATCH] drm/amdgpu: Remove pci address checks from acpi_vfct_bios Kurt Kartaltepe
@ 2024-03-18 13:36 ` Alex Deucher
  2024-03-18 14:19   ` Kurt Kartaltepe
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Deucher @ 2024-03-18 13:36 UTC (permalink / raw)
  To: Kurt Kartaltepe; +Cc: amd-gfx

On Mon, Mar 18, 2024 at 4:47 AM Kurt Kartaltepe <kkartaltepe@gmail.com> wrote:
>
> These checks prevent using amdgpu with the pcie=assign-busses parameter
> which will re-address devices from their acpi values.
>
> Signed-off-by: Kurt Kartaltepe <kkartaltepe@gmail.com>

This will likely break multi-GPU functionality.  The BDF values are
how the sbios/driver differentiates between the VFCT images.  If you
have multiple GPUs in the system, the driver won't be able to figure
out which one goes with which GPU and you may end up assigning the
wrong image to the wrong device.
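
For reference, the comparison under discussion can be sketched in userspace C as below. The PCI_DEVFN/PCI_SLOT/PCI_FUNC macros mirror the definitions in the kernel's include/uapi/linux/pci.h; struct vfct_hdr, struct pci_id, and vfct_matches_bdf() are hypothetical stand-ins for the VFCT image header and adev->pdev, not the driver's actual types.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same encoding as the kernel macros: devfn = (slot << 3) | func. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)       ((devfn) & 0x07)

/* Hypothetical, trimmed-down view of one VFCT image header. */
struct vfct_hdr {
	uint32_t pci_bus, pci_device, pci_function;
	uint16_t vendor_id, device_id;
	uint32_t image_length;
};

/* Hypothetical stand-in for the fields read from adev->pdev. */
struct pci_id {
	uint8_t bus;
	uint8_t devfn;
	uint16_t vendor, device;
};

/* Full match: the image must belong to this exact bus/device/function. */
static bool vfct_matches_bdf(const struct vfct_hdr *h, const struct pci_id *p)
{
	return h->image_length &&
	       h->pci_bus == p->bus &&
	       h->pci_device == PCI_SLOT(p->devfn) &&
	       h->pci_function == PCI_FUNC(p->devfn) &&
	       h->vendor_id == p->vendor &&
	       h->device_id == p->device;
}
```

Two identical boards share the same vendor and device IDs, so in this sketch only the bus/device/function fields tell their images apart. If the bus number is reassigned after the firmware filled in the table, p->bus no longer equals the recorded value and the match fails.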

Alex


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
> index 618e469e3622..932ce13ad232 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
> @@ -386,9 +386,6 @@ static bool amdgpu_acpi_vfct_bios(struct amdgpu_device *adev)
>                 }
>
>                 if (vhdr->ImageLength &&
> -                   vhdr->PCIBus == adev->pdev->bus->number &&
> -                   vhdr->PCIDevice == PCI_SLOT(adev->pdev->devfn) &&
> -                   vhdr->PCIFunction == PCI_FUNC(adev->pdev->devfn) &&
>                     vhdr->VendorID == adev->pdev->vendor &&
>                     vhdr->DeviceID == adev->pdev->device) {
>                         adev->bios = kmemdup(&vbios->VbiosContent,
> --
> 2.44.0
>

* Re: [PATCH] drm/amdgpu: Remove pci address checks from acpi_vfct_bios
  2024-03-18 13:36 ` Alex Deucher
@ 2024-03-18 14:19   ` Kurt Kartaltepe
  2024-03-18 15:42     ` Alex Deucher
  0 siblings, 1 reply; 12+ messages in thread
From: Kurt Kartaltepe @ 2024-03-18 14:19 UTC (permalink / raw)
  To: Alex Deucher; +Cc: amd-gfx

On Mon, Mar 18, 2024 at 6:37 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Mon, Mar 18, 2024 at 4:47 AM Kurt Kartaltepe <kkartaltepe@gmail.com> wrote:
> >
> > These checks prevent using amdgpu with the pcie=assign-busses parameter
> > which will re-address devices from their acpi values.
> >
> > Signed-off-by: Kurt Kartaltepe <kkartaltepe@gmail.com>
>
> This will likely break multi-GPU functionality.  The BDF values are
> how the sbios/driver differentiates between the VFCT images.  If you
> have multiple GPUs in the system, the driver won't be able to figure
> out which one goes with which GPU an you may end up assigning the
> wrong image to the wrong device.
>
> Alex

The vendor and device portions must be correct in the existing
kernels, so device type differentiation should already work without
BDF values.

So does that mean the concern is images are different for devices with
the same vendor:device pairs? There are sites out there dedicated to
dumping AMD's video roms which seem to suggest all discrete devices
would be fine loading the same rom. Is there another platform you are
thinking of where devices with the same vendor:device values would
need different images?

(Sorry, this is my first patch to the mailing list and I am replying
with Gmail; I hope it doesn't break things.)

--Kurt Kartaltepe

* Re: [PATCH] drm/amdgpu: Remove pci address checks from acpi_vfct_bios
  2024-03-18 14:19   ` Kurt Kartaltepe
@ 2024-03-18 15:42     ` Alex Deucher
  2024-03-18 16:06       ` Kurt Kartaltepe
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Deucher @ 2024-03-18 15:42 UTC (permalink / raw)
  To: Kurt Kartaltepe; +Cc: amd-gfx

On Mon, Mar 18, 2024 at 10:19 AM Kurt Kartaltepe <kkartaltepe@gmail.com> wrote:
>
> On Mon, Mar 18, 2024 at 6:37 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> >
> > On Mon, Mar 18, 2024 at 4:47 AM Kurt Kartaltepe <kkartaltepe@gmail.com> wrote:
> > >
> > > These checks prevent using amdgpu with the pcie=assign-busses parameter
> > > which will re-address devices from their acpi values.
> > >
> > > Signed-off-by: Kurt Kartaltepe <kkartaltepe@gmail.com>
> >
> > This will likely break multi-GPU functionality.  The BDF values are
> > how the sbios/driver differentiates between the VFCT images.  If you
> > have multiple GPUs in the system, the driver won't be able to figure
> > out which one goes with which GPU an you may end up assigning the
> > wrong image to the wrong device.
> >
> > Alex
>
> The vendor and device portions must be correct in the existing
> kernels, so device type differentiation should already work without
> BDF values.
>
> So does that mean the concern is images are different for devices with
> the same vendor:device pairs? There are sites out there dedicated to
> dumping AMD's video roms which seem to suggest all discrete devices
> would be fine loading the same rom. Is there another platform you are
> thinking of where devices with the same vendor:device values would
> need different images?

That is incorrect.  The vbios images are board-specific.  Using the
wrong image can cause a lot of problems.  The vbios exists to handle
board-specific design variations (e.g., the number and type of display
connectors, the i2c/aux channel mappings, board-specific clock and
voltage settings, etc.).  The PCI DID just indicates the chip used on
the board.  The actual board design varies with each AIB vendor (e.g.,
Sapphire and XFX both make 7900XTX boards, but they can have very
different configurations).

Alex

>
> (Sorry this is my first patch to the mailing list and I am replying
> with gmail, I hope it doesnt break things).
>
> --Kurt Kartaltepe

* Re: [PATCH] drm/amdgpu: Remove pci address checks from acpi_vfct_bios
  2024-03-18 15:42     ` Alex Deucher
@ 2024-03-18 16:06       ` Kurt Kartaltepe
  2024-03-18 19:52         ` Alex Deucher
  0 siblings, 1 reply; 12+ messages in thread
From: Kurt Kartaltepe @ 2024-03-18 16:06 UTC (permalink / raw)
  To: Alex Deucher; +Cc: amd-gfx

On Mon, Mar 18, 2024 at 8:42 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Mon, Mar 18, 2024 at 10:19 AM Kurt Kartaltepe <kkartaltepe@gmail.com> wrote:
> >
> > On Mon, Mar 18, 2024 at 6:37 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> > >
> > > On Mon, Mar 18, 2024 at 4:47 AM Kurt Kartaltepe <kkartaltepe@gmail.com> wrote:
> > > >
> > > > These checks prevent using amdgpu with the pcie=assign-busses parameter
> > > > which will re-address devices from their acpi values.
> > > >
> > > > Signed-off-by: Kurt Kartaltepe <kkartaltepe@gmail.com>
> > >
> > > This will likely break multi-GPU functionality.  The BDF values are
> > > how the sbios/driver differentiates between the VFCT images.  If you
> > > have multiple GPUs in the system, the driver won't be able to figure
> > > out which one goes with which GPU an you may end up assigning the
> > > wrong image to the wrong device.
> > >
> > > Alex
> >
> > The vendor and device portions must be correct in the existing
> > kernels, so device type differentiation should already work without
> > BDF values.
> >
> > So does that mean the concern is images are different for devices with
> > the same vendor:device pairs? There are sites out there dedicated to
> > dumping AMD's video roms which seem to suggest all discrete devices
> > would be fine loading the same rom. Is there another platform you are
> > thinking of where devices with the same vendor:device values would
> > need different images?
>
> That is incorrect.  The vbios images are board specific.  Using the
> wrong image can cause a lot of problems.  The vbios exists to handle
> board specific design variations (e.g., the number and type of display
> connectors, the i2c/aux channel mappings, board specific clock and
> voltage settings, etc.).  The PCI DID just indicates the chip used on
> the board.  The actual board design varies with each AIB vendor (e.g.,
> Sapphire and XFX both make 7900XTX boards, but they can have very
> different configurations.

Thanks for the explanation, that makes sense.

Is my understanding correct that IGPUs (my case) simply won't have
a vbios available through any other mechanism? If so, perhaps this isn't
feasible in amdgpu, as the BDF information is lost in reassignment.

--Kurt Kartaltepe

* Re: [PATCH] drm/amdgpu: Remove pci address checks from acpi_vfct_bios
  2024-03-18 16:06       ` Kurt Kartaltepe
@ 2024-03-18 19:52         ` Alex Deucher
  2024-03-18 19:57           ` Alex Deucher
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Deucher @ 2024-03-18 19:52 UTC (permalink / raw)
  To: Kurt Kartaltepe; +Cc: amd-gfx

On Mon, Mar 18, 2024 at 12:06 PM Kurt Kartaltepe <kkartaltepe@gmail.com> wrote:
>
> On Mon, Mar 18, 2024 at 8:42 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> >
> > On Mon, Mar 18, 2024 at 10:19 AM Kurt Kartaltepe <kkartaltepe@gmail.com> wrote:
> > >
> > > On Mon, Mar 18, 2024 at 6:37 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> > > >
> > > > On Mon, Mar 18, 2024 at 4:47 AM Kurt Kartaltepe <kkartaltepe@gmail.com> wrote:
> > > > >
> > > > > These checks prevent using amdgpu with the pcie=assign-busses parameter
> > > > > which will re-address devices from their acpi values.
> > > > >
> > > > > Signed-off-by: Kurt Kartaltepe <kkartaltepe@gmail.com>
> > > >
> > > > This will likely break multi-GPU functionality.  The BDF values are
> > > > how the sbios/driver differentiates between the VFCT images.  If you
> > > > have multiple GPUs in the system, the driver won't be able to figure
> > > > out which one goes with which GPU an you may end up assigning the
> > > > wrong image to the wrong device.
> > > >
> > > > Alex
> > >
> > > The vendor and device portions must be correct in the existing
> > > kernels, so device type differentiation should already work without
> > > BDF values.
> > >
> > > So does that mean the concern is images are different for devices with
> > > the same vendor:device pairs? There are sites out there dedicated to
> > > dumping AMD's video roms which seem to suggest all discrete devices
> > > would be fine loading the same rom. Is there another platform you are
> > > thinking of where devices with the same vendor:device values would
> > > need different images?
> >
> > That is incorrect.  The vbios images are board specific.  Using the
> > wrong image can cause a lot of problems.  The vbios exists to handle
> > board specific design variations (e.g., the number and type of display
> > connectors, the i2c/aux channel mappings, board specific clock and
> > voltage settings, etc.).  The PCI DID just indicates the chip used on
> > the board.  The actual board design varies with each AIB vendor (e.g.,
> > Sapphire and XFX both make 7900XTX boards, but they can have very
> > different configurations.
>
> Thanks for the explanation, that makes sense.
>
> Is my understanding correct that IGPUs (my case) simply won't have
> vbios available in any other mechanism. If so perhaps this isnt
> feasible in amdgpu as the BDF information is lost in reassignment.

Depends on the platform, but recent ones use VFCT.  That said, there
should only ever be one IGPU in the system so I think we could just
rely on the VID and DID for APUs in this case and check everything for
dGPUs.
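
A minimal sketch of that idea, in standalone C rather than as a driver patch: the struct and function names here (vfct_hdr, gpu_dev, vfct_image_matches) and the is_apu flag are hypothetical stand-ins for illustration, and the "only one IGPU per system" rule is the assumption stated above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of one VFCT image header. */
struct vfct_hdr {
	uint32_t pci_bus, pci_device, pci_function;
	uint16_t vendor_id, device_id;
	uint32_t image_length;
};

/* Hypothetical description of the GPU being probed. */
struct gpu_dev {
	uint8_t bus, slot, func;
	uint16_t vendor, device;
	bool is_apu;	/* integrated GPU (APU) rather than a discrete board */
};

static bool vfct_image_matches(const struct vfct_hdr *h, const struct gpu_dev *g)
{
	/* Every candidate needs a non-empty image with matching VID/DID. */
	if (!h->image_length ||
	    h->vendor_id != g->vendor || h->device_id != g->device)
		return false;

	/* Assumed: at most one IGPU per system, so VID/DID are enough. */
	if (g->is_apu)
		return true;

	/* Discrete GPUs keep the full bus/device/function comparison. */
	return h->pci_bus == g->bus &&
	       h->pci_device == g->slot &&
	       h->pci_function == g->func;
}
```

With this shape, an APU whose bus number was reassigned would still find its image, while two identical discrete boards would still be told apart by their addresses.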

Alex

* Re: [PATCH] drm/amdgpu: Remove pci address checks from acpi_vfct_bios
  2024-03-18 19:52         ` Alex Deucher
@ 2024-03-18 19:57           ` Alex Deucher
  2024-03-19  1:55             ` Kurt Kartaltepe
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Deucher @ 2024-03-18 19:57 UTC (permalink / raw)
  To: Kurt Kartaltepe; +Cc: amd-gfx

On Mon, Mar 18, 2024 at 3:52 PM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Mon, Mar 18, 2024 at 12:06 PM Kurt Kartaltepe <kkartaltepe@gmail.com> wrote:
> >
> > On Mon, Mar 18, 2024 at 8:42 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> > >
> > > On Mon, Mar 18, 2024 at 10:19 AM Kurt Kartaltepe <kkartaltepe@gmail.com> wrote:
> > > >
> > > > On Mon, Mar 18, 2024 at 6:37 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> > > > >
> > > > > On Mon, Mar 18, 2024 at 4:47 AM Kurt Kartaltepe <kkartaltepe@gmail.com> wrote:
> > > > > >
> > > > > > These checks prevent using amdgpu with the pcie=assign-busses parameter
> > > > > > which will re-address devices from their acpi values.
> > > > > >
> > > > > > Signed-off-by: Kurt Kartaltepe <kkartaltepe@gmail.com>
> > > > >
> > > > > This will likely break multi-GPU functionality.  The BDF values are
> > > > > how the sbios/driver differentiates between the VFCT images.  If you
> > > > > have multiple GPUs in the system, the driver won't be able to figure
> > > > > out which one goes with which GPU an you may end up assigning the
> > > > > wrong image to the wrong device.
> > > > >
> > > > > Alex
> > > >
> > > > The vendor and device portions must be correct in the existing
> > > > kernels, so device type differentiation should already work without
> > > > BDF values.
> > > >
> > > > So does that mean the concern is images are different for devices with
> > > > the same vendor:device pairs? There are sites out there dedicated to
> > > > dumping AMD's video roms which seem to suggest all discrete devices
> > > > would be fine loading the same rom. Is there another platform you are
> > > > thinking of where devices with the same vendor:device values would
> > > > need different images?
> > >
> > > That is incorrect.  The vbios images are board specific.  Using the
> > > wrong image can cause a lot of problems.  The vbios exists to handle
> > > board specific design variations (e.g., the number and type of display
> > > connectors, the i2c/aux channel mappings, board specific clock and
> > > voltage settings, etc.).  The PCI DID just indicates the chip used on
> > > the board.  The actual board design varies with each AIB vendor (e.g.,
> > > Sapphire and XFX both make 7900XTX boards, but they can have very
> > > different configurations.
> >
> > Thanks for the explanation, that makes sense.
> >
> > Is my understanding correct that IGPUs (my case) simply won't have
> > vbios available in any other mechanism. If so perhaps this isnt
> > feasible in amdgpu as the BDF information is lost in reassignment.
>
> Depends on the platform, but recent ones use VFCT.  That said, there
> should only ever be one IGPU in the system so I think we could just
> rely on the VID and DID for APUs in this case and check everything for
> dGPUs.

Is there a reason why you need this option?  Even beyond this, I could
envision other problems related to APUs and ACPI if these changed.

Alex

* Re: [PATCH] drm/amdgpu: Remove pci address checks from acpi_vfct_bios
  2024-03-18 19:57           ` Alex Deucher
@ 2024-03-19  1:55             ` Kurt Kartaltepe
  2024-03-19  9:54               ` Christian König
  0 siblings, 1 reply; 12+ messages in thread
From: Kurt Kartaltepe @ 2024-03-19  1:55 UTC (permalink / raw)
  To: Alex Deucher; +Cc: amd-gfx

On Mon, Mar 18, 2024 at 12:57 PM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Mon, Mar 18, 2024 at 3:52 PM Alex Deucher <alexdeucher@gmail.com> wrote:
> >
...
> > Depends on the platform, but recent ones use VFCT.  That said, there
> > should only ever be one IGPU in the system so I think we could just
> > rely on the VID and DID for APUs in this case and check everything for
> > dGPUs.
>
> Is there a reason why you need this option?  Even beyond this, I could
> envision other problems related to APUs and ACPI if these changed.
>
> Alex

So there are multiple factors in play. I am trying to make use of the
lovely usb4/tb3 controllers on the 7940HS with the reportedly Intel
Tamales Module 2 pci/pci bridge over the usb4 interface. This provides
a handy way to expand the pcie bus, but configuring ACPI and pcie
topology isn't generally an option on consumer BIOS (unless you want
to enlighten me). This leaves us in a situation where the BIOS can
enumerate devices poorly, resulting in inaccessible devices due to
address conflicts. The only option I'm aware of to resolve these
conflicts is pci=assign-busses. Maybe this could also be configured at
runtime, but assign-busses seemed nice in some ways.

I haven't experienced any issues with the APU (graphics, hardware
encoders/decoders), but I do think assign-busses might be renumbering
again after suspend/resume/pci rescans. I need to debug further;
maybe suspend/resume are just broken when ACPI addresses are wrong.
Obviously the graphics user space (compositors; mesa might be working
as expected) doesn't handle the device switching addresses while in use.
I haven't inspected the amdgpu kernel side deeply yet.

I'm not sure if this is the right approach to solving the problem, and
given your input I'm considering it may be better, though not
upstreamable, to implement renumbering only for specified devices like
this pci bridge, or to investigate runtime management of the pci bus
addresses. The current assign-busses implementation is quite the big
hammer, admittedly.

--Kurt Kartaltepe

* Re: [PATCH] drm/amdgpu: Remove pci address checks from acpi_vfct_bios
  2024-03-19  1:55             ` Kurt Kartaltepe
@ 2024-03-19  9:54               ` Christian König
  2024-03-19 15:04                 ` Kurt Kartaltepe
  0 siblings, 1 reply; 12+ messages in thread
From: Christian König @ 2024-03-19  9:54 UTC (permalink / raw)
  To: Kurt Kartaltepe, Alex Deucher; +Cc: amd-gfx

On 19.03.24 at 02:55, Kurt Kartaltepe wrote:
> On Mon, Mar 18, 2024 at 12:57 PM Alex Deucher <alexdeucher@gmail.com> wrote:
>> On Mon, Mar 18, 2024 at 3:52 PM Alex Deucher <alexdeucher@gmail.com> wrote:
> ...
>>> Depends on the platform, but recent ones use VFCT.  That said, there
>>> should only ever be one IGPU in the system so I think we could just
>>> rely on the VID and DID for APUs in this case and check everything for
>>> dGPUs.
>> Is there a reason why you need this option?  Even beyond this, I could
>> envision other problems related to APUs and ACPI if these changed.
>>
>> Alex
> So there are multiple factors in play. I am trying to make use of the
> lovely usb4/tb3 controllers on the 7940HS with the reportedly Intel
> Tamales Module 2 pci/pci bridge over the usb4 interface. This provides
> a handy way to expand the pcie bus but configuring ACPI and pcie
> topology isn't generally an option on consumer BIOS (unless you want
> to enlighten me). This leaves us in the situation where the bios can
> enumerate devices poorly resulting in inaccessible devices due to
> address conflicts. To resolve address conflicts the only option I'm
> aware of is pci=assign-busses, maybe this could also be configured at
> runtime but assign-busses seemed nice in some ways.

Well what problems do you run into? The ACPI and BIOS assignments 
usually work much better than whatever the Linux PCI subsystem comes up 
with.

The PCI subsystem in the Linux kernel for example can't handle back to 
back resources behind multiple downstream bridges.

So when the BIOS fails to assign something it's extremely unlikely that 
the Linux kernel will do the right thing either.

Regards,
Christian.

>
> I havnt experienced any issues with the APU (graphics, hardware
> encoders/decoders) but I do think assign-busses might be renumbering
> again after suspend/resume/pci rescans but I need to debug further,
> maybe suspend/resume are just broken when ACPI addresses are wrong.
> Obviously the graphics user space (compositors, mesa might be working
> as expected) dont handle the device switching addresses while in use,
> for amdgpu kernel side I haven't inspected deeply yet.
>
> I'm not sure if this is the right approach to solving the problem, and
> given your input i'm considering it may be better, though not
> upstreamable, to implement renumbering only for specified devices like
> this pci bridge or investigate runtime management of the pci bus
> addresses. The current assign-busses implementation is quite the big
> hammer admittedly.
>
> --Kurt Kartaltepe


* Re: [PATCH] drm/amdgpu: Remove pci address checks from acpi_vfct_bios
  2024-03-19  9:54               ` Christian König
@ 2024-03-19 15:04                 ` Kurt Kartaltepe
  2024-03-20 13:31                   ` Christian König
  0 siblings, 1 reply; 12+ messages in thread
From: Kurt Kartaltepe @ 2024-03-19 15:04 UTC (permalink / raw)
  To: Christian König; +Cc: Alex Deucher, amd-gfx

On Tue, Mar 19, 2024 at 2:54 AM Christian König
<christian.koenig@amd.com> wrote:
>
>
> Well what problems do you run into? The ACPI and BIOS assignments
> usually work much better than whatever the Linux PCI subsystem comes up
> with.

Perhaps it's easier to show the lspci output for the BIOS assignment,
and we can agree it's far from helpful:

           +-04.1-[64-c3]----00.0-[65-68]--+-01.0-[66]----00.0-[67]----00.0  Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge DD 2018]
           |                               +-02.0-[67]--
           |                               \-04.0-[68]--

In this case the BIOS has assigned buses 65-68 to the upstream port for
its 3 downstream ports (66, 67, 68), and then assigned bus 67 again to
the device's own bridge behind the first downstream port.

So not only did the BIOS produce an invalid topology, it also left no
spare bus numbers at the first upstream or downstream ports, which the
current PCI implementation would require to assign bus numbers, if I
understand it correctly.
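
To make the conflict concrete, here is a rough standalone C sketch of the two consistency rules being violated. The struct and helper names (bus_range, child_range_ok, ranges_overlap) are made up for illustration and are not kernel APIs; the bus numbers come from the lspci tree above, which prints them in hex.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bus numbers a bridge forwards: [secondary, subordinate], inclusive. */
struct bus_range {
	uint8_t secondary;
	uint8_t subordinate;
};

/*
 * A bridge sitting on bus on_bus below parent is consistent when its own
 * range starts above the bus it sits on and ends within the parent's range.
 */
static bool child_range_ok(struct bus_range parent, uint8_t on_bus,
			   struct bus_range child)
{
	return on_bus >= parent.secondary &&
	       on_bus <= parent.subordinate &&
	       child.secondary > on_bus &&
	       child.secondary <= child.subordinate &&
	       child.subordinate <= parent.subordinate;
}

/* Sibling bridges must not claim overlapping bus ranges. */
static bool ranges_overlap(struct bus_range a, struct bus_range b)
{
	return a.secondary <= b.subordinate && b.secondary <= a.subordinate;
}
```

With the BIOS values, the downstream port 01.0 forwards only bus 0x66, so the bridge on bus 0x66 claiming bus 0x67 fails child_range_ok(), and its range also overlaps the sibling port 02.0, which claims 0x67 as well.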

>
> The PCI subsystem in the Linux kernel for example can't handle back to
> back resources behind multiple downstream bridges.
>
> So when the BIOS fails to assign something it's extremely unlikely that
> the Linux kernel will do the right thing either.

I'm not sure this is still the case. The PCI subsystem with realloc
(and assign-busses for x86) enumerates this topology, which reports
multiple bridges, just fine. The same configuration as above
produces this bus numbering (with hpbussize=20):

           +-04.1-[24-66]----00.0-[25-66]--+-01.0-[26-45]----00.0-[27-29]--+-01.0-[28]----00.0  Intel Corporation DG2 [Arc A750]
           |                               |                               \-04.0-[29]----00.0  Intel Corporation DG2 Audio Controller
           |                               +-02.0-[46]----00.0  Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge DD 2018]
           |                               \-04.0-[47-66]--

The Linux kernel doesn't do the right thing without these features, and
these are not the default. So you may be right that by default it does
not recover from the situation well.


Given the bus allocation at the root port I can imagine a more
aggressive than default but less aggressive than `assign-busses`
reallocation scheme could deal with both preserving root allocations
like the APU and renumbering things behind upstream ports. That might
be a better approach than renumbering even the root bus devices.

>
> Regards,
> Christian.

* Re: [PATCH] drm/amdgpu: Remove pci address checks from acpi_vfct_bios
  2024-03-19 15:04                 ` Kurt Kartaltepe
@ 2024-03-20 13:31                   ` Christian König
  2024-03-20 14:24                     ` Kurt Kartaltepe
  0 siblings, 1 reply; 12+ messages in thread
From: Christian König @ 2024-03-20 13:31 UTC (permalink / raw)
  To: Kurt Kartaltepe, Christian König; +Cc: Alex Deucher, amd-gfx

On 19.03.24 at 16:04, Kurt Kartaltepe wrote:
> On Tue, Mar 19, 2024 at 2:54 AM Christian König
> <christian.koenig@amd.com>  wrote:
>>
>> Well what problems do you run into? The ACPI and BIOS assignments
>> usually work much better than whatever the Linux PCI subsystem comes up
>> with.
> Perhaps its easier to show the lspci output for the BIOS assignment
> and we can agree it's far from helpful
>
>             +-04.1-[64-c3]----00.0-[65-68]--+-01.0-[66]----00.0-[67]----00.0
>   Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge
> DD 2018]
>             |                               +-02.0-[67]--
>             |                               \-04.0-[68]--
>
> In this case the bios has assigned the upstream port 65-68, for its 3
> downstreams 66,67,68, and then assigned the upstream port of the
> device's own bridge to 67.
>
> In this case not only did BIOS produce an invalid topology but it also
> does not provide any space at the first upstream or downstream ports
> which the current PCI implementation would require to assign bus
> numbers if I understand it correctly.

Can you provide the full output of lspci -vvvv? As far as I can see, that
doesn't look so invalid to me.

>> The PCI subsystem in the Linux kernel for example can't handle back to
>> back resources behind multiple downstream bridges.
>>
>> So when the BIOS fails to assign something it's extremely unlikely that
>> the Linux kernel will do the right thing either.
> I'm not sure this is still the case, the PCI subsystem with realloc
> (and assign-busses for x86) deals with enumerating this topology which
> reports multiple bridges just fine.

Well, that is just a very, very old workaround for a buggy BIOS on
20-year-old laptops. The last reference I could find for hardware which
actually needed it is this:

commit 8c4b2cf9af9b4ecc29d4f0ec4ecc8e94dc4432d7
Author: Bernhard Kaindl <bk@suse.de>
Date:   Sat Feb 18 01:36:55 2006 -0800

     [PATCH] PCI: PCI/Cardbus cards hidden, needs pci=assign-busses to fix


So as far as I know nobody has had to use that in ages, and I wouldn't
expect this option to actually work correctly on any modern hardware.

Especially not on anything PCIe-based, since it messes up the ACPI to PCIe
device mappings. That amdgpu doesn't work is just the tip of the iceberg
here.

>   The same configuration as above
> produces this bus numbering (with hpbussize=20)
>
>             +-04.1-[24-66]----00.0-[25-66]--+-01.0-[26-45]----00.0-[27-29]--+-01.0-[28]----00.0
>   Intel Corporation DG2 [Arc A750]
>             |                               |
>      \-04.0-[29]----00.0  Intel Corporation DG2 Audio Controller
>             |                               +-02.0-[46]----00.0  Intel
> Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge DD 2018]
>             |                               \-04.0-[47-66]--
>
> The Linux kernel doesnt do the right thing without these features, and
> these are not the default. So you may be right that by default it does
> not recover from the situation of well.
>
>
> Given the bus allocation at the root port I can imagine a more
> aggressive than default but less aggressive than `assign-busses`
> reallocation scheme could deal with both preserving root allocations
> like the APU and renumbering things behind upstream ports. That might
> be a better approach than renumbering even the root bus devices.

The bus assignment code in the PCI subsystem is made to support hotplug, 
not completely re-number the root hubs from scratch. That is just a hack 
somebody came up with two decades ago to get some Cardbus slots in 
laptops working.

I'm not sure yet what's going wrong with the Thunderbolt controller, but
completely re-assigning bus numbers is certainly the wrong approach.

Regards,
Christian.

>
>> Regards,
>> Christian.

* Re: [PATCH] drm/amdgpu: Remove pci address checks from acpi_vfct_bios
  2024-03-20 13:31                   ` Christian König
@ 2024-03-20 14:24                     ` Kurt Kartaltepe
  0 siblings, 0 replies; 12+ messages in thread
From: Kurt Kartaltepe @ 2024-03-20 14:24 UTC (permalink / raw)
  To: Christian König; +Cc: Christian König, Alex Deucher, amd-gfx

On Wed, Mar 20, 2024 at 6:31 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:

> Can you provide the full output of lspci -vvvv. As far as I can see that doesn't looks so invalid to me.

I've added the relevant pci probing debug output without assign-busses
and the lspci -vvvv for a boot with all devices visible.
https://gist.github.com/kkartaltepe/2f01f33c7e7af33cf0d28678e91f50fb

> Well that is just a very very old workaround for a buggy BIOS on 20 year old laptops. The last reference I could find for hardware which actually needed it is this:
>
> commit 8c4b2cf9af9b4ecc29d4f0ec4ecc8e94dc4432d7
> Author: Bernhard Kaindl <bk@suse.de>
> Date:   Sat Feb 18 01:36:55 2006 -0800
>
>     [PATCH] PCI: PCI/Cardbus cards hidden, needs pci=assign-busses to fix
>
>
> So as far as I know nobody had to use that in ages and I wouldn't expect that this option actually works correctly on any modern hardware.
>
> Especially not anything PCIe based since it messes up the ACPI to PCIe device mappings. That amdgpu doesn't work is just the tip of the iceberg here.
>
> The bus assignment code in the PCI subsystem is made to support hotplug, not completely re-number the root hubs from scratch. That is just a hack somebody came up with two decades ago to get some Cardbus slots in laptops working.
>
> I'm not sure yet what's going wrong with the Thunderbold controller, but completely re-assigning bus numbers is certainly the wrong approach.

I was referring to the work outlined in
https://ostconf.com/system/attachments/files/000/001/698/original/Sergei_Miroshnichenko_linux_piter_2019_presentation.pdf?1570136708
for nvme enclosures. That may be referencing the movable BARs more than
the renumbering that occurs with assign-busses, but it also covers power
with device trees, which may behave differently, as it mentions
assign-busses to get this same renumbering of buses. This makes me think
that at least modern non-x86 devices expect to behave this way. That may
not be relevant to ACPI/x86 systems, but at least this shared pci code
should be solid.

> I'm not sure yet what's going wrong with the Thunderbold controller, but completely re-assigning bus numbers is certainly the wrong approach.

I agree, it is just what is currently available in the kernel. A less
disruptive approach seems needed.

--Kurt Kartaltepe
