All of lore.kernel.org
 help / color / mirror / Atom feed
* Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash
@ 2022-06-26 18:54 Linus Torvalds
  2022-06-27  8:02 ` Javier Martinez Canillas
  2022-06-27  8:56 ` Thomas Zimmermann
  0 siblings, 2 replies; 8+ messages in thread
From: Linus Torvalds @ 2022-06-26 18:54 UTC (permalink / raw)
  To: Alex Deucher, Christian König, Javier Martinez Canillas,
	Thomas Zimmermann, Zack Rusin, Hans de Goede
  Cc: dri-devel, amd-gfx list

So this has been going on for a while, and it's quite annoying.

At bootup, my main desktop (Threadripper 3970X with radeon graphics)
now complains about

  resource sanity check: requesting [mem 0xd0000000-0xdfffffff], which
spans more than BOOTFB [mem 0xd0000000-0xd02fffff]

and then gives me a nasty callchain that is basically the amdgpu probe
sequence ending in amdgpu_bo_init() doing the
arch_io_reserve_memtype_wc() which is then what complains.

That "BOOTFB" resource is from sysfb_simplefb.c, and I think what
started triggering this is commit c96898342c38 ("drivers/firmware:
Don't mark as busy the simple-framebuffer IO resource").

Because it turns out that that removed the IORESOURCE_BUSY, which in
turn is what makes the resource conflict code complain about it now,
because

                /*
                 * if a resource is "BUSY", it's not a hardware resource
                 * but a driver mapping of such a resource; we don't want
                 * to warn for those; some drivers legitimately map only
                 * partial hardware resources. (example: vesafb)
                 */

so the issue is that now the resource code - correctly - says "hey,
you have *two* conflicting driver mappings".

And that commit claims it did it because "which can lead to drivers
requesting the same memory resource to fail", but - once again - the
link in the commit that might actually tell more is just one of those
useless patch submission links again.

So who knows why that commit was actually done, but it's causing annoyance.

If simplefb is actually still using that frame buffer, it's a problem.
If it isn't, then maybe that resource should have been released?

I really think that commit c96898342c38 is buggy. It talks about "let
drivers to request it as busy instead", but then it registers a
resource that isn't actually a proper real resource. It's just a
random incomplete chunk of the actual real thing, so it will still
interfere with resource allocation, and in fact now interferes even
with that "set memtype" because of this valid warning.

             Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash
  2022-06-26 18:54 Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash Linus Torvalds
@ 2022-06-27  8:02 ` Javier Martinez Canillas
  2022-06-27 17:25     ` Linus Torvalds
  2022-06-27  8:56 ` Thomas Zimmermann
  1 sibling, 1 reply; 8+ messages in thread
From: Javier Martinez Canillas @ 2022-06-27  8:02 UTC (permalink / raw)
  To: Linus Torvalds, Alex Deucher, Christian König,
	Thomas Zimmermann, Zack Rusin, Hans de Goede
  Cc: dri-devel, amd-gfx list

Hello Linus,

On 6/26/22 20:54, Linus Torvalds wrote:
> So this has been going on for a while, and it's quite annoying.
> 
> At bootup, my main desktop (Threadripper 3970X with radeon graphics)
> now complains about
> 
>   resource sanity check: requesting [mem 0xd0000000-0xdfffffff], which
> spans more than BOOTFB [mem 0xd0000000-0xd02fffff]
> 
> and then gives me a nasty callchain that is basically the amdgpu probe
> sequence ending in amdgpu_bo_init() doing the
> arch_io_reserve_memtype_wc() which is then what complains.
> 
> That "BOOTFB" resource is from sysfb_simplefb.c, and I think what
> started triggering this is commit c96898342c38 ("drivers/firmware:
> Don't mark as busy the simple-framebuffer IO resource").
> 
> Because it turns out that that removed the IORESOURCE_BUSY, which in
> turn is what makes the resource conflict code complain about it now,
> because
> 
>                 /*
>                  * if a resource is "BUSY", it's not a hardware resource
>                  * but a driver mapping of such a resource; we don't want
>                  * to warn for those; some drivers legitimately map only
>                  * partial hardware resources. (example: vesafb)
>                  */
> 
> so the issue is that now the resource code - correctly - says "hey,
> you have *two* conflicting driver mappings".
> 
> And that commit claims it did it because "which can lead to drivers
> requesting the same memory resource to fail", but - once again - the
> link in the commit that might actually tell more is just one of those
> useless patch submission links again.
> 
> So who knows why that commit was actually done, but it's causing annoyance.
>

The flag was dropped because it was causing drivers that requested their
memory resource with pci_request_region() to fail with -EBUSY (e.g: the
vmwgfx driver):

https://www.spinics.net/lists/dri-devel/msg329672.html
 
> If simplefb is actually still using that frame buffer, it's a problem.
> If it isn't, then maybe that resource should have been released?
>

It's supposed to be released once amdgpu asks for conflicting framebuffers
to be removed calling drm_aperture_remove_conflicting_pci_framebuffers().

I'm not familiar with the amdgpu driver, but maybe that call has to be done
earlier before the arch_io_reserve_memtype_wc() ?
 
> I really think that commit c96898342c38 is buggy. It talks about "let
> drivers to request it as busy instead", but then it registers a
> resource that isn't actually a proper real resource. It's just a
> random incomplete chunk of the actual real thing, so it will still
> interfere with resource allocation, and in fact now interferes even
> with that "set memtype" because of this valid warning.
> 

That registered resource is what the firmware provides for drivers to use
the system framebuffer for scan-out. It's not the real thing, that's true
since a native driver would kick it out (leading to a resource release)
and register the real aperture.

-- 
Best regards,

Javier Martinez Canillas
Linux Engineering
Red Hat


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash
  2022-06-26 18:54 Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash Linus Torvalds
  2022-06-27  8:02 ` Javier Martinez Canillas
@ 2022-06-27  8:56 ` Thomas Zimmermann
  1 sibling, 0 replies; 8+ messages in thread
From: Thomas Zimmermann @ 2022-06-27  8:56 UTC (permalink / raw)
  To: Linus Torvalds, Alex Deucher, Christian König,
	Javier Martinez Canillas, Zack Rusin, Hans de Goede
  Cc: dri-devel, amd-gfx list


[-- Attachment #1.1.1: Type: text/plain, Size: 3090 bytes --]

Hi

Am 26.06.22 um 20:54 schrieb Linus Torvalds:
> So this has been going on for a while, and it's quite annoying.
> 
> At bootup, my main desktop (Threadripper 3970X with radeon graphics)
> now complains about
> 
>    resource sanity check: requesting [mem 0xd0000000-0xdfffffff], which
> spans more than BOOTFB [mem 0xd0000000-0xd02fffff]
> 
> and then gives me a nasty callchain that is basically the amdgpu probe
> sequence ending in amdgpu_bo_init() doing the
> arch_io_reserve_memtype_wc() which is then what complains.
> 
> That "BOOTFB" resource is from sysfb_simplefb.c, and I think what
> started triggering this is commit c96898342c38 ("drivers/firmware:
> Don't mark as busy the simple-framebuffer IO resource").
> 
> Because it turns out that that removed the IORESOURCE_BUSY, which in
> turn is what makes the resource conflict code complain about it now,
> because
> 
>                  /*
>                   * if a resource is "BUSY", it's not a hardware resource
>                   * but a driver mapping of such a resource; we don't want
>                   * to warn for those; some drivers legitimately map only
>                   * partial hardware resources. (example: vesafb)
>                   */
> 
> so the issue is that now the resource code - correctly - says "hey,
> you have *two* conflicting driver mappings".
> 
> And that commit claims it did it because "which can lead to drivers
> requesting the same memory resource to fail", but - once again - the
> link in the commit that might actually tell more is just one of those
> useless patch submission links again.
> 
> So who knows why that commit was actually done, but it's causing annoyance.
> 
> If simplefb is actually still using that frame buffer, it's a problem.
> If it isn't, then maybe that resource should have been released?

As Javier said, that resource is the framebuffer that's set up by the 
firmware. It should be gone after the call to 
drm_aperture_remove_conflicting_pci_framebuffers(). [1] The call to 
amdgpu_bo_init() runs afterwards, so that removal apparently failed.

Is the BOOTFB entry still listed in /proc/iomem after the system 
finished booting?

Attached is a (totally untested) patch to manually point amdgpu to the 
right location. Does it fix the problem?

Best regards
Thomas

[1] 
https://elixir.bootlin.com/linux/v5.18.7/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2077

> 
> I really think that commit c96898342c38 is buggy. It talks about "let
> drivers to request it as busy instead", but then it registers a
> resource that isn't actually a proper real resource. It's just a
> random incomplete chunk of the actual real thing, so it will still
> interfere with resource allocation, and in fact now interferes even
> with that "set memtype" because of this valid warning.
> 
>               Linus

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #1.1.2: 0001-drm-amdgpu-Remove-firmware-framebuffer-without-PCI-h.patch --]
[-- Type: text/x-patch, Size: 1192 bytes --]

From c37f0fa8e763c471ddaccc08da9ec9bb1a715451 Mon Sep 17 00:00:00 2001
From: Thomas Zimmermann <tzimmermann@suse.de>
Date: Mon, 27 Jun 2022 10:51:44 +0200
Subject: [PATCH] drm/amdgpu: Remove firmware framebuffer without PCI helper

The DRM aperture helper for PCI devices fails to remove the firmware
framebuffer's device. Manually tell it where to look.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 46ef57b07c15..e00318ff66ff 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2073,7 +2073,8 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
 	is_fw_fb = amdgpu_is_fw_framebuffer(base, size);
 
 	/* Get rid of things like offb */
-	ret = drm_aperture_remove_conflicting_pci_framebuffers(pdev, &amdgpu_kms_driver);
+	ret = drm_aperture_remove_conflicting_framebuffers(base, size, is_fw_fb,
+							   &amdgpu_kms_driver);
 	if (ret)
 		return ret;
 
-- 
2.36.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash
  2022-06-27  8:02 ` Javier Martinez Canillas
@ 2022-06-27 17:25     ` Linus Torvalds
  0 siblings, 0 replies; 8+ messages in thread
From: Linus Torvalds @ 2022-06-27 17:25 UTC (permalink / raw)
  To: Javier Martinez Canillas
  Cc: amd-gfx list, Hans de Goede, dri-devel, Thomas Zimmermann,
	Alex Deucher, Christian König

On Mon, Jun 27, 2022 at 1:02 AM Javier Martinez Canillas
<javierm@redhat.com> wrote:
>
> The flag was dropped because it was causing drivers that requested their
> memory resource with pci_request_region() to fail with -EBUSY (e.g: the
> vmwgfx driver):
>
> https://www.spinics.net/lists/dri-devel/msg329672.html

See, *that* link would have been useful in the commit.

Rather than the useless link it has.

Anyway, removing the busy bit just made things worse.

> > If simplefb is actually still using that frame buffer, it's a problem.
> > If it isn't, then maybe that resource should have been released?
>
> It's supposed to be released once amdgpu asks for conflicting framebuffers
> to be removed calling drm_aperture_remove_conflicting_pci_framebuffers().

That most definitely doesn't happen. This is on a running system:

  [torvalds@ryzen linux]$ cat /proc/iomem | grep BOOTFB
        00000000-00000000 : BOOTFB

so I suspect that the BUSY bit was never the problem - even for
vmwgfx). The problem was that simplefb doesn't remove its resource.

Guys, the *reason* for resource management is to catch people that
trample over each other's resources.

You literally basically disabled the code that checked for it by
removing the BUSY flag, and just continued to have conflicting
resources.

That isn't a "fix", that is literally "we are ignoring and breaking
the whole reason that the resource tree exists, but we'll still use it
for no good reason".

Yeah, yeah, most modern drivers ignore the IO resource tree, because
they end up working on another resource level entirely: they work on
not the IO resources, but on the "driver level" instead, and just
attach to PCI devices.

So these days, few enough drivers even care about the IO resource
tree, and it's mostly used for (a) legacy devices (think ISA) and (b)
the actual bus resource handling (so the PCI code itself uses it to
sort out resource use and avoid conflicts, but PCI drivers themselves
generally then don't care, because the bus has "taken care of it".

So that's why the amdgpu driver itself doesn't care about resource
allocations, and we only get a warning for that memory type case, not
for any deeper resource case.

And apparently the vmwgfx driver still uses that legacy "let's claim
all PCI resources in the resource tree" instead of just claiming the
device itself. Which is why it hit this whole BOOTFB resource thing
even harder.

But the real bug is that BOOTFB seems to claim this resource even
after it is done with it and other drivers want to take over.

Not the BUSY bit.

                     Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash
@ 2022-06-27 17:25     ` Linus Torvalds
  0 siblings, 0 replies; 8+ messages in thread
From: Linus Torvalds @ 2022-06-27 17:25 UTC (permalink / raw)
  To: Javier Martinez Canillas
  Cc: amd-gfx list, Hans de Goede, dri-devel, Thomas Zimmermann,
	Alex Deucher, Christian König, Zack Rusin

On Mon, Jun 27, 2022 at 1:02 AM Javier Martinez Canillas
<javierm@redhat.com> wrote:
>
> The flag was dropped because it was causing drivers that requested their
> memory resource with pci_request_region() to fail with -EBUSY (e.g: the
> vmwgfx driver):
>
> https://www.spinics.net/lists/dri-devel/msg329672.html

See, *that* link would have been useful in the commit.

Rather than the useless link it has.

Anyway, removing the busy bit just made things worse.

> > If simplefb is actually still using that frame buffer, it's a problem.
> > If it isn't, then maybe that resource should have been released?
>
> It's supposed to be released once amdgpu asks for conflicting framebuffers
> to be removed calling drm_aperture_remove_conflicting_pci_framebuffers().

That most definitely doesn't happen. This is on a running system:

  [torvalds@ryzen linux]$ cat /proc/iomem | grep BOOTFB
        00000000-00000000 : BOOTFB

so I suspect that the BUSY bit was never the problem - even for
vmwgfx). The problem was that simplefb doesn't remove its resource.

Guys, the *reason* for resource management is to catch people that
trample over each other's resources.

You literally basically disabled the code that checked for it by
removing the BUSY flag, and just continued to have conflicting
resources.

That isn't a "fix", that is literally "we are ignoring and breaking
the whole reason that the resource tree exists, but we'll still use it
for no good reason".

Yeah, yeah, most modern drivers ignore the IO resource tree, because
they end up working on another resource level entirely: they work on
not the IO resources, but on the "driver level" instead, and just
attach to PCI devices.

So these days, few enough drivers even care about the IO resource
tree, and it's mostly used for (a) legacy devices (think ISA) and (b)
the actual bus resource handling (so the PCI code itself uses it to
sort out resource use and avoid conflicts, but PCI drivers themselves
generally then don't care, because the bus has "taken care of it".

So that's why the amdgpu driver itself doesn't care about resource
allocations, and we only get a warning for that memory type case, not
for any deeper resource case.

And apparently the vmwgfx driver still uses that legacy "let's claim
all PCI resources in the resource tree" instead of just claiming the
device itself. Which is why it hit this whole BOOTFB resource thing
even harder.

But the real bug is that BOOTFB seems to claim this resource even
after it is done with it and other drivers want to take over.

Not the BUSY bit.

                     Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash
  2022-06-27 17:25     ` Linus Torvalds
  (?)
@ 2022-06-28  8:43     ` Thomas Zimmermann
  2022-06-28 12:41       ` Jocelyn Falempe
  -1 siblings, 1 reply; 8+ messages in thread
From: Thomas Zimmermann @ 2022-06-28  8:43 UTC (permalink / raw)
  To: Linus Torvalds, Javier Martinez Canillas
  Cc: Alex Deucher, Hans de Goede, dri-devel, amd-gfx list,
	Christian König


[-- Attachment #1.1: Type: text/plain, Size: 5046 bytes --]

Hi

Am 27.06.22 um 19:25 schrieb Linus Torvalds:
> On Mon, Jun 27, 2022 at 1:02 AM Javier Martinez Canillas
> <javierm@redhat.com> wrote:
>>
>> The flag was dropped because it was causing drivers that requested their
>> memory resource with pci_request_region() to fail with -EBUSY (e.g: the
>> vmwgfx driver):
>>
>> https://www.spinics.net/lists/dri-devel/msg329672.html
> 
> See, *that* link would have been useful in the commit.
> 
> Rather than the useless link it has.
> 
> Anyway, removing the busy bit just made things worse.
> 
>>> If simplefb is actually still using that frame buffer, it's a problem.
>>> If it isn't, then maybe that resource should have been released?
>>
>> It's supposed to be released once amdgpu asks for conflicting framebuffers
>> to be removed calling drm_aperture_remove_conflicting_pci_framebuffers().
> 
> That most definitely doesn't happen. This is on a running system:
> 
>    [torvalds@ryzen linux]$ cat /proc/iomem | grep BOOTFB
>          00000000-00000000 : BOOTFB
> 
> so I suspect that the BUSY bit was never the problem - even for
> vmwgfx). The problem was that simplefb doesn't remove its resource.
> 
> Guys, the *reason* for resource management is to catch people that
> trample over each other's resources.
> 
> You literally basically disabled the code that checked for it by
> removing the BUSY flag, and just continued to have conflicting
> resources.
> 
> That isn't a "fix", that is literally "we are ignoring and breaking
> the whole reason that the resource tree exists, but we'll still use it
> for no good reason".

The EFI/VESA framebuffer is represented by a platform device. The BUSY 
flag we removed is in the 'sysfb' code that creates this device. The 
BOOTFB resource you see in your /proc/iomem is the framebuffer memory. 
The code is in sysfb_create_simplefb() [1]

Later during boot a device driver, 'simplefb' or 'simpledrm', binds to 
the device and reserves the framebuffer memory for rendering into it. 
For example in simpledrm. [2] At that point a BUSY flag is set for that 
reservation.

> 
> Yeah, yeah, most modern drivers ignore the IO resource tree, because
> they end up working on another resource level entirely: they work on
> not the IO resources, but on the "driver level" instead, and just
> attach to PCI devices.
> 
> So these days, few enough drivers even care about the IO resource
> tree, and it's mostly used for (a) legacy devices (think ISA) and (b)
> the actual bus resource handling (so the PCI code itself uses it to
> sort out resource use and avoid conflicts, but PCI drivers themselves
> generally then don't care, because the bus has "taken care of it".
> 
> So that's why the amdgpu driver itself doesn't care about resource
> allocations, and we only get a warning for that memory type case, not
> for any deeper resource case.
> 
> And apparently the vmwgfx driver still uses that legacy "let's claim
> all PCI resources in the resource tree" instead of just claiming the
> device itself. Which is why it hit this whole BOOTFB resource thing
> even harder.
> 
> But the real bug is that BOOTFB seems to claim this resource even
> after it is done with it and other drivers want to take over.

Once amdgpu wants to take over, it has to remove the the platform device 
that represents the EFI framebuffer. It does so by calling the 
drm_aperture_ function, which in turn calls 
platform_device_unregister(). Afterwards, the platform device, driver 
and BOOTFB range are supposed to be entirely gone.

Unfortunately, this currently only works if a driver is bound to the 
platform device. Without simpledrm or simplefb, amdgpu won't find the 
platform device to remove.

I guess, what happens on your system is that sysfb create a device for 
the EFI framebuffer and then amdgpu comes and doesn't find it for 
removal. And later you see these warnings because BOOTFB is still around.

Javier already provided patches for this scenario, which are in the DRM 
tree. From drm-next, please cherry-pick

   0949ee75da6c ("firmware: sysfb: Make sysfb_create_simplefb() return a 
pdev pointer")

   bc824922b264 ("firmware: sysfb: Add sysfb_disable() helper function")

   873eb3b11860 ("fbdev: Disable sysfb device registration when removing 
conflicting FBs")

for testing. With these patches, amdgpu will find the sysfb device and 
unregister it.

The patches are queued up for the next merge window. If they resolve the 
issue, we'll already send with the next round of fixes.

Best regards
Thomas

[1] 
https://elixir.bootlin.com/linux/latest/source/drivers/firmware/sysfb_simplefb.c#L115
[2] 
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/tiny/simpledrm.c#L544

> 
> Not the BUSY bit.
> 
>                       Linus

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash
  2022-06-28  8:43     ` Thomas Zimmermann
@ 2022-06-28 12:41       ` Jocelyn Falempe
  2022-06-29  8:20         ` Javier Martinez Canillas
  0 siblings, 1 reply; 8+ messages in thread
From: Jocelyn Falempe @ 2022-06-28 12:41 UTC (permalink / raw)
  To: Thomas Zimmermann, Linus Torvalds, Javier Martinez Canillas
  Cc: Alex Deucher, Hans de Goede, amd-gfx list, dri-devel,
	Christian König

On 28/06/2022 10:43, Thomas Zimmermann wrote:
> Hi
> 
> Am 27.06.22 um 19:25 schrieb Linus Torvalds:
>> On Mon, Jun 27, 2022 at 1:02 AM Javier Martinez Canillas
>> <javierm@redhat.com> wrote:
>>>
>>> The flag was dropped because it was causing drivers that requested their
>>> memory resource with pci_request_region() to fail with -EBUSY (e.g: the
>>> vmwgfx driver):
>>>
>>> https://www.spinics.net/lists/dri-devel/msg329672.html
>>
>> See, *that* link would have been useful in the commit.
>>
>> Rather than the useless link it has.
>>
>> Anyway, removing the busy bit just made things worse.
>>
>>>> If simplefb is actually still using that frame buffer, it's a problem.
>>>> If it isn't, then maybe that resource should have been released?
>>>
>>> It's supposed to be released once amdgpu asks for conflicting 
>>> framebuffers
>>> to be removed calling 
>>> drm_aperture_remove_conflicting_pci_framebuffers().
>>
>> That most definitely doesn't happen. This is on a running system:
>>
>>    [torvalds@ryzen linux]$ cat /proc/iomem | grep BOOTFB
>>          00000000-00000000 : BOOTFB
>>
>> so I suspect that the BUSY bit was never the problem - even for
>> vmwgfx). The problem was that simplefb doesn't remove its resource.
>>
>> Guys, the *reason* for resource management is to catch people that
>> trample over each other's resources.
>>
>> You literally basically disabled the code that checked for it by
>> removing the BUSY flag, and just continued to have conflicting
>> resources.
>>
>> That isn't a "fix", that is literally "we are ignoring and breaking
>> the whole reason that the resource tree exists, but we'll still use it
>> for no good reason".
> 
> The EFI/VESA framebuffer is represented by a platform device. The BUSY 
> flag we removed is in the 'sysfb' code that creates this device. The 
> BOOTFB resource you see in your /proc/iomem is the framebuffer memory. 
> The code is in sysfb_create_simplefb() [1]
> 
> Later during boot a device driver, 'simplefb' or 'simpledrm', binds to 
> the device and reserves the framebuffer memory for rendering into it. 
> For example in simpledrm. [2] At that point a BUSY flag is set for that 
> reservation.
> 
>>
>> Yeah, yeah, most modern drivers ignore the IO resource tree, because
>> they end up working on another resource level entirely: they work on
>> not the IO resources, but on the "driver level" instead, and just
>> attach to PCI devices.
>>
>> So these days, few enough drivers even care about the IO resource
>> tree, and it's mostly used for (a) legacy devices (think ISA) and (b)
>> the actual bus resource handling (so the PCI code itself uses it to
>> sort out resource use and avoid conflicts, but PCI drivers themselves
>> generally then don't care, because the bus has "taken care of it".
>>
>> So that's why the amdgpu driver itself doesn't care about resource
>> allocations, and we only get a warning for that memory type case, not
>> for any deeper resource case.
>>
>> And apparently the vmwgfx driver still uses that legacy "let's claim
>> all PCI resources in the resource tree" instead of just claiming the
>> device itself. Which is why it hit this whole BOOTFB resource thing
>> even harder.
>>
>> But the real bug is that BOOTFB seems to claim this resource even
>> after it is done with it and other drivers want to take over.
> 
> Once amdgpu wants to take over, it has to remove the the platform device 
> that represents the EFI framebuffer. It does so by calling the 
> drm_aperture_ function, which in turn calls 
> platform_device_unregister(). Afterwards, the platform device, driver 
> and BOOTFB range are supposed to be entirely gone.
> 
> Unfortunately, this currently only works if a driver is bound to the 
> platform device. Without simpledrm or simplefb, amdgpu won't find the 
> platform device to remove.
> 
> I guess, what happens on your system is that sysfb create a device for 
> the EFI framebuffer and then amdgpu comes and doesn't find it for 
> removal. And later you see these warnings because BOOTFB is still around.
> 
> Javier already provided patches for this scenario, which are in the DRM 
> tree. From drm-next, please cherry-pick
> 
>    0949ee75da6c ("firmware: sysfb: Make sysfb_create_simplefb() return a 
> pdev pointer")
> 
>    bc824922b264 ("firmware: sysfb: Add sysfb_disable() helper function")
> 
>    873eb3b11860 ("fbdev: Disable sysfb device registration when removing 
> conflicting FBs")
> 
> for testing. With these patches, amdgpu will find the sysfb device and 
> unregister it.
> 
> The patches are queued up for the next merge window. If they resolve the 
> issue, we'll already send with the next round of fixes.

I was able to reproduce the warning with kernel v5.19-rc4, a radeon GPU 
and the following config:

CONFIG_SYSFB=y
CONFIG_SYSFB_SIMPLEFB=y
# CONFIG_DRM_SIMPLEDRM is not set
# CONFIG_FB_SIMPLE is not set

After applying the 3 patches you mentioned, the issue is resolved. (at 
least on my setup).

Best regards,

-- 

Jocelyn

> 
> Best regards
> Thomas
> 
> [1] 
> https://elixir.bootlin.com/linux/latest/source/drivers/firmware/sysfb_simplefb.c#L115 
> 
> [2] 
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/tiny/simpledrm.c#L544 
> 
> 
>>
>> Not the BUSY bit.
>>
>>                       Linus
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash
  2022-06-28 12:41       ` Jocelyn Falempe
@ 2022-06-29  8:20         ` Javier Martinez Canillas
  0 siblings, 0 replies; 8+ messages in thread
From: Javier Martinez Canillas @ 2022-06-29  8:20 UTC (permalink / raw)
  To: Jocelyn Falempe, Thomas Zimmermann, Linus Torvalds
  Cc: Alex Deucher, Hans de Goede, amd-gfx list, dri-devel,
	Christian König

On 6/28/22 14:41, Jocelyn Falempe wrote:
> On 28/06/2022 10:43, Thomas Zimmermann wrote:

[snip]

>>
>> Unfortunately, this currently only works if a driver is bound to the 
>> platform device. Without simpledrm or simplefb, amdgpu won't find the 
>> platform device to remove.
>>
>> I guess, what happens on your system is that sysfb create a device for 
>> the EFI framebuffer and then amdgpu comes and doesn't find it for 
>> removal. And later you see these warnings because BOOTFB is still around.
>>
>> Javier already provided patches for this scenario, which are in the DRM 
>> tree. From drm-next, please cherry-pick
>>
>>    0949ee75da6c ("firmware: sysfb: Make sysfb_create_simplefb() return a 
>> pdev pointer")
>>
>>    bc824922b264 ("firmware: sysfb: Add sysfb_disable() helper function")
>>
>>    873eb3b11860 ("fbdev: Disable sysfb device registration when removing 
>> conflicting FBs")
>>
>> for testing. With these patches, amdgpu will find the sysfb device and 
>> unregister it.
>>
>> The patches are queued up for the next merge window. If they resolve the 
>> issue, we'll already send with the next round of fixes.
> 
> I was able to reproduce the warning with kernel v5.19-rc4, a radeon GPU 
> and the following config:
>
> CONFIG_SYSFB=y
> CONFIG_SYSFB_SIMPLEFB=y
> # CONFIG_DRM_SIMPLEDRM is not set
> # CONFIG_FB_SIMPLE is not set
>

Yes, and this could happen even if both drivers (e.g: simplefb and amdgpu)
are enabled but built-in. For example, if amdgpu probes before simplefb.

> After applying the 3 patches you mentioned, the issue is resolved. (at 
> least on my setup).
>

Thanks for testing Jocelyn! I've pushed the patches to drm-misc-fixes now.

-- 
Best regards,

Javier Martinez Canillas
Linux Engineering
Red Hat


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-06-29  8:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-26 18:54 Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash Linus Torvalds
2022-06-27  8:02 ` Javier Martinez Canillas
2022-06-27 17:25   ` Linus Torvalds
2022-06-27 17:25     ` Linus Torvalds
2022-06-28  8:43     ` Thomas Zimmermann
2022-06-28 12:41       ` Jocelyn Falempe
2022-06-29  8:20         ` Javier Martinez Canillas
2022-06-27  8:56 ` Thomas Zimmermann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.