Re: Couple of issues with amdgpu on my WX4100

From: "Christian König" <christian.koenig@amd.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: David Airlie <airlied@linux.ie>,
	Alex Deucher <alexander.deucher@amd.com>,
	dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
	Maxim Levitsky <mlevitsk@redhat.com>
Subject: Re: Couple of issues with amdgpu on my WX4100
Date: Mon, 4 Jan 2021 21:13:53 +0100
Message-ID: <158aa1bf-cff5-d3ce-758f-3afcd4a15cae@amd.com>
In-Reply-To: <20210104114335.3f87ff27@omen.home>

Am 04.01.21 um 19:43 schrieb Alex Williamson:
> On Mon, 4 Jan 2021 18:39:33 +0100
> Christian König <christian.koenig@amd.com> wrote:
>
>> Am 04.01.21 um 17:45 schrieb Alex Williamson:
>>> On Mon, 4 Jan 2021 12:34:34 +0100
>>> Christian König <christian.koenig@amd.com> wrote:
>>>   
>>> [SNIP]
>> That's a rather bad idea. Our GPUs, for example, advertise far more than
>> they actually need.
>>
>> E.g. a Polaris usually returns 4GiB even when only 2GiB are installed,
>> because 4GiB is just the maximum amount of RAM you can put together with
>> the ASIC on a board.
> Would the driver fail or misbehave if the BAR is sized larger than the
> amount of memory on the card or is memory size determined independently
> of BAR size?

Uff, good question. I have no idea.

The Linux driver at least should behave well, but I have no idea about
the Windows driver stack.
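To make the Polaris example quoted above a bit more concrete, here is a minimal sketch of picking the smallest advertised BAR size that still covers the installed VRAM. It assumes the Linux-style encoding of the supported-sizes mask, where bit i means a BAR of 2^(i+20) bytes (bit 0 = 1MiB). The helper names and the example mask are made up for illustration and are not taken from amdgpu or the PCI core.

```python
from typing import List, Optional

MIB = 1 << 20
GIB = 1 << 30


def rebar_sizes(mask: int) -> List[int]:
    """BAR sizes in bytes advertised by a supported-sizes mask, ascending.

    Assumption: bit i set means a 2**(i + 20) byte BAR (bit 0 = 1MiB).
    """
    return [1 << (i + 20) for i in range(mask.bit_length()) if (mask >> i) & 1]


def smallest_covering(mask: int, needed: int) -> Optional[int]:
    """Smallest advertised size of at least `needed` bytes, or None."""
    for size in rebar_sizes(mask):
        if size >= needed:
            return size
    return None


# Hypothetical board: 256MiB..4GiB advertised (bits 8..12), 2GiB of VRAM.
mask = 0x1F00
print(max(rebar_sizes(mask)) // GIB)            # 4
print(smallest_covering(mask, 2 * GIB) // GIB)  # 2
```

Sizing the BAR to the largest advertised value would map 4GiB on such a board, twice what the installed memory can back.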

>> Some devices even return a mask of all 1s when they need only 2MiB,
>> resulting in nearly 1TiB of wasted address space with this approach.
> Ugh.  I'm afraid to ask why a device with a 2MiB BAR would implement a
> REBAR capability, but I guess we really can't make any assumptions
> about the breadth of SKUs that ASIC might support (or sanity of the
> designers).

It's a standard feature for FPGAs these days: how much BAR space you need
depends on the design you load, and that usually only happens after the
OS has booted and you fire up your development environment.
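Once the design is loaded and the required size is known, the resize itself comes down to writing an encoded size into the BAR Size field of the Resizable BAR Control register. A minimal sketch of that encoding, assuming the field layout behind Linux's `PCI_REBAR_CTRL_BAR_SIZE` / `PCI_REBAR_CTRL_BAR_SHIFT` (bits 13:8, size encoded as log2(bytes) - 20). The function names and the starting register value are hypothetical, and in the kernel the BAR's resources also have to be released and reassigned around the write (see `pci_resize_resource()`), which the sketch ignores.

```python
BAR_SIZE_SHIFT = 8
BAR_SIZE_MASK = 0x3F << BAR_SIZE_SHIFT  # bits 13:8, cf. PCI_REBAR_CTRL_BAR_SIZE


def bytes_to_size_code(nbytes: int) -> int:
    """Encode a power-of-two BAR size (>= 1MiB) as log2(nbytes) - 20."""
    if nbytes < (1 << 20) or nbytes & (nbytes - 1):
        raise ValueError("BAR size must be a power of two of at least 1MiB")
    code = nbytes.bit_length() - 1 - 20
    if code > 0x3F:
        raise ValueError("BAR size does not fit the 6-bit field")
    return code


def set_bar_size(ctrl: int, nbytes: int) -> int:
    """Return `ctrl` with only the BAR Size field replaced."""
    return (ctrl & ~BAR_SIZE_MASK) | (bytes_to_size_code(nbytes) << BAR_SIZE_SHIFT)


def get_bar_size(ctrl: int) -> int:
    """Decode the BAR Size field of `ctrl` back into bytes."""
    return 1 << (((ctrl & BAR_SIZE_MASK) >> BAR_SIZE_SHIFT) + 20)


# Hypothetical control value: BAR index 0, one resizable BAR, size 1MiB.
ctrl = 0x20
# After loading a design that needs 128GiB on that BAR:
ctrl = set_bar_size(ctrl, 128 << 30)
print(hex(ctrl), get_bar_size(ctrl) >> 30)  # 0x1120 128
```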

> We could probe to determine the maximum size the host can support and
> potentially emulate the capability to remove sizes that we can't
> allocate, but without any ability for the device to reject a size
> advertised as supported via the capability protocol it makes me nervous
> how we can guarantee the resources are available when the user
> re-configures the device.  That might mean we'd need to reserve the
> resources, up to what the host can support, regardless of what the
> device can actually use.  I'm not sure how else to know how much to
> reserve without device specific code in vfio-pci.  Thanks,

Well, in the FPGA case I outlined above you don't really know how much
BAR space you need until the setup is complete.

E.g. you could need one BAR with just 2MiB and another with 128GiB, or
two with 64GiB, or... That's the reason somebody came up with the
REBAR standard in the first place.
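The per-BAR clamping Alex describes, and why it alone doesn't give the guarantee he's after once several BARs are involved, can be sketched like this. It assumes the Linux-style encoding where bit i of a BAR's supported-sizes mask means 2^(i+20) bytes. The 192GiB host window, the masks and the helper names are hypothetical, chosen only to show the effect.

```python
from typing import Dict


def clamp_mask(mask: int, host_max: int) -> int:
    """Drop every advertised size larger than `host_max` bytes.

    Assumption: bit i of `mask` means a 2**(i + 20) byte BAR.
    """
    keep = 0
    for i in range(mask.bit_length()):
        if (mask >> i) & 1 and (1 << (i + 20)) <= host_max:
            keep |= 1 << i
    return keep


def worst_case_reservation(masks: Dict[int, int]) -> int:
    """Bytes needed to honour any combination the masks still allow,
    i.e. the sum of each BAR's largest remaining size."""
    return sum(1 << (m.bit_length() - 1 + 20) for m in masks.values() if m)


host_window = 192 << 30  # hypothetical MMIO window the host can give the device
print(hex(clamp_mask(0xFFFFF, host_window)))  # 0x3ffff: 256GiB and 512GiB dropped

# Two BARs (index 0 and 2), each advertising 2MiB..128GiB (bits 1..17).
masks = {0: 0x3FFFE, 2: 0x3FFFE}
clamped = {bar: clamp_mask(m, host_window) for bar, m in masks.items()}
print(worst_case_reservation(clamped) >> 30)  # 256
```

Each BAR fits the 192GiB window on its own, yet honouring every combination the clamped capability still advertises would take 256GiB, which is exactly the guarantee problem Alex raises.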

To summarize: static resizing might work for some devices like our
GPUs, but it doesn't solve the problem in general.

Regards,
Christian.

>
> Alex
>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel