All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Zimmermann <tzimmermann@suse.de>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Javier Martinez Canillas <javierm@redhat.com>
Cc: "Alex Deucher" <alexander.deucher@amd.com>,
	"Hans de Goede" <hdegoede@redhat.com>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	"amd-gfx list" <amd-gfx@lists.freedesktop.org>,
	"Christian König" <christian.koenig@amd.com>
Subject: Re: Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash
Date: Tue, 28 Jun 2022 10:43:04 +0200	[thread overview]
Message-ID: <561af3c0-c7cf-3580-ce35-320cb13a037c@suse.de> (raw)
In-Reply-To: <CAHk-=wjxzafG-=J8oT30s7upn4RhBs6TX-uVFZ5rME+L5_DoJA@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 5046 bytes --]

Hi

Am 27.06.22 um 19:25 schrieb Linus Torvalds:
> On Mon, Jun 27, 2022 at 1:02 AM Javier Martinez Canillas
> <javierm@redhat.com> wrote:
>>
>> The flag was dropped because it was causing drivers that requested their
>> memory resource with pci_request_region() to fail with -EBUSY (e.g: the
>> vmwgfx driver):
>>
>> https://www.spinics.net/lists/dri-devel/msg329672.html
> 
> See, *that* link would have been useful in the commit.
> 
> Rather than the useless link it has.
> 
> Anyway, removing the busy bit just made things worse.
> 
>>> If simplefb is actually still using that frame buffer, it's a problem.
>>> If it isn't, then maybe that resource should have been released?
>>
>> It's supposed to be released once amdgpu asks for conflicting framebuffers
>> to be removed calling drm_aperture_remove_conflicting_pci_framebuffers().
> 
> That most definitely doesn't happen. This is on a running system:
> 
>    [torvalds@ryzen linux]$ cat /proc/iomem | grep BOOTFB
>          00000000-00000000 : BOOTFB
> 
> so I suspect that the BUSY bit was never the problem - even for
> vmwgfx). The problem was that simplefb doesn't remove its resource.
> 
> Guys, the *reason* for resource management is to catch people that
> trample over each other's resources.
> 
> You literally basically disabled the code that checked for it by
> removing the BUSY flag, and just continued to have conflicting
> resources.
> 
> That isn't a "fix", that is literally "we are ignoring and breaking
> the whole reason that the resource tree exists, but we'll still use it
> for no good reason".

The EFI/VESA framebuffer is represented by a platform device. The BUSY 
flag we removed is in the 'sysfb' code that creates this device. The 
BOOTFB resource you see in your /proc/iomem is the framebuffer memory. 
The code is in sysfb_create_simplefb() [1]

Later during boot a device driver, 'simplefb' or 'simpledrm', binds to 
the device and reserves the framebuffer memory for rendering into it. 
For example in simpledrm. [2] At that point a BUSY flag is set for that 
reservation.

> 
> Yeah, yeah, most modern drivers ignore the IO resource tree, because
> they end up working on another resource level entirely: they work on
> not the IO resources, but on the "driver level" instead, and just
> attach to PCI devices.
> 
> So these days, few enough drivers even care about the IO resource
> tree, and it's mostly used for (a) legacy devices (think ISA) and (b)
> the actual bus resource handling (so the PCI code itself uses it to
> sort out resource use and avoid conflicts, but PCI drivers themselves
> generally then don't care, because the bus has "taken care of it".
> 
> So that's why the amdgpu driver itself doesn't care about resource
> allocations, and we only get a warning for that memory type case, not
> for any deeper resource case.
> 
> And apparently the vmwgfx driver still uses that legacy "let's claim
> all PCI resources in the resource tree" instead of just claiming the
> device itself. Which is why it hit this whole BOOTFB resource thing
> even harder.
> 
> But the real bug is that BOOTFB seems to claim this resource even
> after it is done with it and other drivers want to take over.

Once amdgpu wants to take over, it has to remove the the platform device 
that represents the EFI framebuffer. It does so by calling the 
drm_aperture_ function, which in turn calls 
platform_device_unregister(). Afterwards, the platform device, driver 
and BOOTFB range are supposed to be entirely gone.

Unfortunately, this currently only works if a driver is bound to the 
platform device. Without simpledrm or simplefb, amdgpu won't find the 
platform device to remove.

I guess, what happens on your system is that sysfb create a device for 
the EFI framebuffer and then amdgpu comes and doesn't find it for 
removal. And later you see these warnings because BOOTFB is still around.

Javier already provided patches for this scenario, which are in the DRM 
tree. From drm-next, please cherry-pick

   0949ee75da6c ("firmware: sysfb: Make sysfb_create_simplefb() return a 
pdev pointer")

   bc824922b264 ("firmware: sysfb: Add sysfb_disable() helper function")

   873eb3b11860 ("fbdev: Disable sysfb device registration when removing 
conflicting FBs")

for testing. With these patches, amdgpu will find the sysfb device and 
unregister it.

The patches are queued up for the next merge window. If they resolve the 
issue, we'll already send with the next round of fixes.

Best regards
Thomas

[1] 
https://elixir.bootlin.com/linux/latest/source/drivers/firmware/sysfb_simplefb.c#L115
[2] 
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/tiny/simpledrm.c#L544

> 
> Not the BUSY bit.
> 
>                       Linus

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

  reply	other threads:[~2022-06-28  8:43 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-26 18:54 Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash Linus Torvalds
2022-06-27  8:02 ` Javier Martinez Canillas
2022-06-27 17:25   ` Linus Torvalds
2022-06-27 17:25     ` Linus Torvalds
2022-06-28  8:43     ` Thomas Zimmermann [this message]
2022-06-28 12:41       ` Jocelyn Falempe
2022-06-29  8:20         ` Javier Martinez Canillas
2022-06-27  8:56 ` Thomas Zimmermann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=561af3c0-c7cf-3580-ce35-320cb13a037c@suse.de \
    --to=tzimmermann@suse.de \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hdegoede@redhat.com \
    --cc=javierm@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.