* [PATCH] drm/radeon: fix asic initialization for virtualized environments
@ 2016-06-13 19:45 ` Alex Deucher
0 siblings, 0 replies; 8+ messages in thread
From: Alex Deucher @ 2016-06-13 19:45 UTC (permalink / raw)
To: dri-devel; +Cc: Alex Deucher, Andres Rodriguez, Alex Williamson, stable
When executing in a PCI passthrough based virtuzliation environment, the
hypervisor will usually attempt to send a PCIe bus reset signal to the
ASIC when the VM reboots. In this scenario, the card is not correctly
initialized, but we still consider it to be posted. Therefore, in a
passthrough based environemnt we should always post the card to guarantee
it is in a good state for driver initialization.
Ported from amdgpu commit:
amdgpu: fix asic initialization for virtualized environments
Cc: Andres Rodriguez <andres.rodriguez@amd.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
---
drivers/gpu/drm/radeon/radeon_device.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index e61c763..21c44b2 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -630,6 +630,23 @@ void radeon_gtt_location(struct radeon_device *rdev, struct radeon_mc *mc)
/*
* GPU helpers function.
*/
+
+/**
+ * radeon_device_is_virtual - check if we are running is a virtual environment
+ *
+ * Check if the asic has been passed through to a VM (all asics).
+ * Used at driver startup.
+ * Returns true if virtual or false if not.
+ */
+static bool radeon_device_is_virtual(void)
+{
+#ifdef CONFIG_X86
+ return boot_cpu_has(X86_FEATURE_HYPERVISOR);
+#else
+ return false;
+#endif
+}
+
/**
* radeon_card_posted - check if the hw has already been initialized
*
@@ -643,6 +660,10 @@ bool radeon_card_posted(struct radeon_device *rdev)
{
uint32_t reg;
+ /* for pass through, always force asic_init */
+ if (radeon_device_is_virtual())
+ return false;
+
/* required for EFI mode on macbook2,1 which uses an r5xx asic */
if (efi_enabled(EFI_BOOT) &&
(rdev->pdev->subsystem_vendor == PCI_VENDOR_ID_APPLE) &&
--
2.5.5
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH] drm/radeon: fix asic initialization for virtualized environments
@ 2016-06-13 19:45 ` Alex Deucher
0 siblings, 0 replies; 8+ messages in thread
From: Alex Deucher @ 2016-06-13 19:45 UTC (permalink / raw)
To: dri-devel; +Cc: Alex Deucher, Alex Williamson, Andres Rodriguez, stable
When executing in a PCI passthrough based virtuzliation environment, the
hypervisor will usually attempt to send a PCIe bus reset signal to the
ASIC when the VM reboots. In this scenario, the card is not correctly
initialized, but we still consider it to be posted. Therefore, in a
passthrough based environemnt we should always post the card to guarantee
it is in a good state for driver initialization.
Ported from amdgpu commit:
amdgpu: fix asic initialization for virtualized environments
Cc: Andres Rodriguez <andres.rodriguez@amd.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
---
drivers/gpu/drm/radeon/radeon_device.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index e61c763..21c44b2 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -630,6 +630,23 @@ void radeon_gtt_location(struct radeon_device *rdev, struct radeon_mc *mc)
/*
* GPU helpers function.
*/
+
+/**
+ * radeon_device_is_virtual - check if we are running is a virtual environment
+ *
+ * Check if the asic has been passed through to a VM (all asics).
+ * Used at driver startup.
+ * Returns true if virtual or false if not.
+ */
+static bool radeon_device_is_virtual(void)
+{
+#ifdef CONFIG_X86
+ return boot_cpu_has(X86_FEATURE_HYPERVISOR);
+#else
+ return false;
+#endif
+}
+
/**
* radeon_card_posted - check if the hw has already been initialized
*
@@ -643,6 +660,10 @@ bool radeon_card_posted(struct radeon_device *rdev)
{
uint32_t reg;
+ /* for pass through, always force asic_init */
+ if (radeon_device_is_virtual())
+ return false;
+
/* required for EFI mode on macbook2,1 which uses an r5xx asic */
if (efi_enabled(EFI_BOOT) &&
(rdev->pdev->subsystem_vendor == PCI_VENDOR_ID_APPLE) &&
--
2.5.5
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] drm/radeon: fix asic initialization for virtualized environments
2016-06-13 19:45 ` Alex Deucher
(?)
@ 2016-06-13 20:10 ` Alex Williamson
2016-06-15 6:23 ` Alex Deucher
-1 siblings, 1 reply; 8+ messages in thread
From: Alex Williamson @ 2016-06-13 20:10 UTC (permalink / raw)
To: Alex Deucher; +Cc: dri-devel, Alex Deucher, Andres Rodriguez, stable
On Mon, 13 Jun 2016 15:45:20 -0400
Alex Deucher <alexdeucher@gmail.com> wrote:
> When executing in a PCI passthrough based virtuzliation environment, the
> hypervisor will usually attempt to send a PCIe bus reset signal to the
> ASIC when the VM reboots. In this scenario, the card is not correctly
> initialized, but we still consider it to be posted. Therefore, in a
> passthrough based environemnt we should always post the card to guarantee
> it is in a good state for driver initialization.
>
> Ported from amdgpu commit:
> amdgpu: fix asic initialization for virtualized environments
>
> Cc: Andres Rodriguez <andres.rodriguez@amd.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> Cc: stable@vger.kernel.org
> ---
> drivers/gpu/drm/radeon/radeon_device.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
Thanks, I expect it's an improvement, though it's always a bit
disappointing when a driver starts modifying its behavior based on
what might be a transient feature of the platform, in this case a
hypervisor platform. For instance, why does our bus reset and video
ROM execution result in a different state than a physical BIOS doing
the same? Can't this condition occur regardless of a hypervisor,
perhaps a rare hot-add of a GPU, a bare metal kexec reboot, or perhaps
simply a system BIOS optimized to post a limited set of devices.
Detection based on some state of the device rather than an expectation
based on what the device is running on seems preferable. I suspect
Andres' patch for amdgpu only affects newer devices, which pretty much
all suffer reset issues, at least under QEMU/VFIO, but I wonder how this
patch affects existing working devices, like 6, 7, and some 8-series.
Anyway, if this is the solution to the poor behavior we've seen with
assigned AMD cards, maybe someone could request the same for the closed
drivers, including Windows. Thanks,
Alex
> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> index e61c763..21c44b2 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -630,6 +630,23 @@ void radeon_gtt_location(struct radeon_device *rdev, struct radeon_mc *mc)
> /*
> * GPU helpers function.
> */
> +
> +/**
> + * radeon_device_is_virtual - check if we are running is a virtual environment
> + *
> + * Check if the asic has been passed through to a VM (all asics).
> + * Used at driver startup.
> + * Returns true if virtual or false if not.
> + */
> +static bool radeon_device_is_virtual(void)
> +{
> +#ifdef CONFIG_X86
> + return boot_cpu_has(X86_FEATURE_HYPERVISOR);
> +#else
> + return false;
> +#endif
> +}
> +
> /**
> * radeon_card_posted - check if the hw has already been initialized
> *
> @@ -643,6 +660,10 @@ bool radeon_card_posted(struct radeon_device *rdev)
> {
> uint32_t reg;
>
> + /* for pass through, always force asic_init */
> + if (radeon_device_is_virtual())
> + return false;
> +
> /* required for EFI mode on macbook2,1 which uses an r5xx asic */
> if (efi_enabled(EFI_BOOT) &&
> (rdev->pdev->subsystem_vendor == PCI_VENDOR_ID_APPLE) &&
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] drm/radeon: fix asic initialization for virtualized environments
2016-06-13 20:10 ` Alex Williamson
@ 2016-06-15 6:23 ` Alex Deucher
2016-06-15 16:45 ` Alex Williamson
0 siblings, 1 reply; 8+ messages in thread
From: Alex Deucher @ 2016-06-15 6:23 UTC (permalink / raw)
To: Alex Williamson
Cc: Maling list - DRI developers, Alex Deucher, Andres Rodriguez, for 3.8
On Mon, Jun 13, 2016 at 4:10 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Mon, 13 Jun 2016 15:45:20 -0400
> Alex Deucher <alexdeucher@gmail.com> wrote:
>
>> When executing in a PCI passthrough based virtuzliation environment, the
>> hypervisor will usually attempt to send a PCIe bus reset signal to the
>> ASIC when the VM reboots. In this scenario, the card is not correctly
>> initialized, but we still consider it to be posted. Therefore, in a
>> passthrough based environemnt we should always post the card to guarantee
>> it is in a good state for driver initialization.
>>
>> Ported from amdgpu commit:
>> amdgpu: fix asic initialization for virtualized environments
>>
>> Cc: Andres Rodriguez <andres.rodriguez@amd.com>
>> Cc: Alex Williamson <alex.williamson@redhat.com>
>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>> Cc: stable@vger.kernel.org
>> ---
>> drivers/gpu/drm/radeon/radeon_device.c | 21 +++++++++++++++++++++
>> 1 file changed, 21 insertions(+)
>
> Thanks, I expect it's an improvement, though it's always a bit
> disappointing when a driver starts modifying its behavior based on
> what might be a transient feature of the platform, in this case a
> hypervisor platform. For instance, why does our bus reset and video
> ROM execution result in a different state than a physical BIOS doing
> the same? Can't this condition occur regardless of a hypervisor,
Just doing a pci reset is not enough on newer cards. The hw handling
pci resets changed in CI and more of the logic moved to the driver.
That does a limited reset, but not the registers that the driver
checks to determine whether or not the asic has been posted so the
driver skips posting and leaves the hw in a bad reset state.
> perhaps a rare hot-add of a GPU, a bare metal kexec reboot, or perhaps
> simply a system BIOS optimized to post a limited set of devices.
We can tell if a card has never been posted and properly post it.
Where it's tricky is when a card has been posted and has subsequently
been pci reset on CI and newer hw. I'm not sure of a good way to
detect this particular scenario. Generally this is mainly done for
qemu/kvm.
> Detection based on some state of the device rather than an expectation
> based on what the device is running on seems preferable. I suspect
> Andres' patch for amdgpu only affects newer devices, which pretty much
> all suffer reset issues, at least under QEMU/VFIO, but I wonder how this
> patch affects existing working devices, like 6, 7, and some 8-series.
Posting the asic at init time should be safe on all asics.
> Anyway, if this is the solution to the poor behavior we've seen with
> assigned AMD cards, maybe someone could request the same for the closed
> drivers, including Windows. Thanks,
The closed drivers already do this.
Alex
>
> Alex
>
>> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
>> index e61c763..21c44b2 100644
>> --- a/drivers/gpu/drm/radeon/radeon_device.c
>> +++ b/drivers/gpu/drm/radeon/radeon_device.c
>> @@ -630,6 +630,23 @@ void radeon_gtt_location(struct radeon_device *rdev, struct radeon_mc *mc)
>> /*
>> * GPU helpers function.
>> */
>> +
>> +/**
>> + * radeon_device_is_virtual - check if we are running is a virtual environment
>> + *
>> + * Check if the asic has been passed through to a VM (all asics).
>> + * Used at driver startup.
>> + * Returns true if virtual or false if not.
>> + */
>> +static bool radeon_device_is_virtual(void)
>> +{
>> +#ifdef CONFIG_X86
>> + return boot_cpu_has(X86_FEATURE_HYPERVISOR);
>> +#else
>> + return false;
>> +#endif
>> +}
>> +
>> /**
>> * radeon_card_posted - check if the hw has already been initialized
>> *
>> @@ -643,6 +660,10 @@ bool radeon_card_posted(struct radeon_device *rdev)
>> {
>> uint32_t reg;
>>
>> + /* for pass through, always force asic_init */
>> + if (radeon_device_is_virtual())
>> + return false;
>> +
>> /* required for EFI mode on macbook2,1 which uses an r5xx asic */
>> if (efi_enabled(EFI_BOOT) &&
>> (rdev->pdev->subsystem_vendor == PCI_VENDOR_ID_APPLE) &&
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] drm/radeon: fix asic initialization for virtualized environments
2016-06-15 6:23 ` Alex Deucher
@ 2016-06-15 16:45 ` Alex Williamson
0 siblings, 0 replies; 8+ messages in thread
From: Alex Williamson @ 2016-06-15 16:45 UTC (permalink / raw)
To: Alex Deucher
Cc: Maling list - DRI developers, Alex Deucher, Andres Rodriguez, for 3.8
On Wed, 15 Jun 2016 02:23:37 -0400
Alex Deucher <alexdeucher@gmail.com> wrote:
> On Mon, Jun 13, 2016 at 4:10 PM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Mon, 13 Jun 2016 15:45:20 -0400
> > Alex Deucher <alexdeucher@gmail.com> wrote:
> >
> >> When executing in a PCI passthrough based virtuzliation environment, the
> >> hypervisor will usually attempt to send a PCIe bus reset signal to the
> >> ASIC when the VM reboots. In this scenario, the card is not correctly
> >> initialized, but we still consider it to be posted. Therefore, in a
> >> passthrough based environemnt we should always post the card to guarantee
> >> it is in a good state for driver initialization.
> >>
> >> Ported from amdgpu commit:
> >> amdgpu: fix asic initialization for virtualized environments
> >>
> >> Cc: Andres Rodriguez <andres.rodriguez@amd.com>
> >> Cc: Alex Williamson <alex.williamson@redhat.com>
> >> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >> Cc: stable@vger.kernel.org
> >> ---
> >> drivers/gpu/drm/radeon/radeon_device.c | 21 +++++++++++++++++++++
> >> 1 file changed, 21 insertions(+)
> >
> > Thanks, I expect it's an improvement, though it's always a bit
> > disappointing when a driver starts modifying its behavior based on
> > what might be a transient feature of the platform, in this case a
> > hypervisor platform. For instance, why does our bus reset and video
> > ROM execution result in a different state than a physical BIOS doing
> > the same? Can't this condition occur regardless of a hypervisor,
>
> Just doing a pci reset is not enough on newer cards. The hw handling
> pci resets changed in CI and more of the logic moved to the driver.
Gag, please relay my disapproval to your hardware folks.
> That does a limited reset, but not the registers that the driver
> checks to determine whether or not the asic has been posted so the
> driver skips posting and leaves the hw in a bad reset state.
>
> > perhaps a rare hot-add of a GPU, a bare metal kexec reboot, or perhaps
> > simply a system BIOS optimized to post a limited set of devices.
>
> We can tell if a card has never been posted and properly post it.
> Where it's tricky is when a card has been posted and has subsequently
> been pci reset on CI and newer hw. I'm not sure of a good way to
> detect this particular scenario. Generally this is mainly done for
> qemu/kvm.
How do you tell if a card has never been posted? Is it something we
could easily toggle after a bus reset?
> > Detection based on some state of the device rather than an expectation
> > based on what the device is running on seems preferable. I suspect
> > Andres' patch for amdgpu only affects newer devices, which pretty much
> > all suffer reset issues, at least under QEMU/VFIO, but I wonder how this
> > patch affects existing working devices, like 6, 7, and some 8-series.
>
> Posting the asic at init time should be safe on all asics.
>
> > Anyway, if this is the solution to the poor behavior we've seen with
> > assigned AMD cards, maybe someone could request the same for the closed
> > drivers, including Windows. Thanks,
>
> The closed drivers already do this.
Hmm, that's not terribly encouraging then since the majority of users
are running Windows guests for the purpose of creating a gaming VM and
still experiencing reset issues with the closed drivers there. Thanks,
Alex
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] drm/radeon: fix asic initialization for virtualized environments
@ 2016-06-15 16:45 ` Alex Williamson
0 siblings, 0 replies; 8+ messages in thread
From: Alex Williamson @ 2016-06-15 16:45 UTC (permalink / raw)
To: Alex Deucher
Cc: Alex Deucher, Andres Rodriguez, for 3.8, Maling list - DRI developers
On Wed, 15 Jun 2016 02:23:37 -0400
Alex Deucher <alexdeucher@gmail.com> wrote:
> On Mon, Jun 13, 2016 at 4:10 PM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Mon, 13 Jun 2016 15:45:20 -0400
> > Alex Deucher <alexdeucher@gmail.com> wrote:
> >
> >> When executing in a PCI passthrough based virtuzliation environment, the
> >> hypervisor will usually attempt to send a PCIe bus reset signal to the
> >> ASIC when the VM reboots. In this scenario, the card is not correctly
> >> initialized, but we still consider it to be posted. Therefore, in a
> >> passthrough based environemnt we should always post the card to guarantee
> >> it is in a good state for driver initialization.
> >>
> >> Ported from amdgpu commit:
> >> amdgpu: fix asic initialization for virtualized environments
> >>
> >> Cc: Andres Rodriguez <andres.rodriguez@amd.com>
> >> Cc: Alex Williamson <alex.williamson@redhat.com>
> >> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >> Cc: stable@vger.kernel.org
> >> ---
> >> drivers/gpu/drm/radeon/radeon_device.c | 21 +++++++++++++++++++++
> >> 1 file changed, 21 insertions(+)
> >
> > Thanks, I expect it's an improvement, though it's always a bit
> > disappointing when a driver starts modifying its behavior based on
> > what might be a transient feature of the platform, in this case a
> > hypervisor platform. For instance, why does our bus reset and video
> > ROM execution result in a different state than a physical BIOS doing
> > the same? Can't this condition occur regardless of a hypervisor,
>
> Just doing a pci reset is not enough on newer cards. The hw handling
> pci resets changed in CI and more of the logic moved to the driver.
Gag, please relay my disapproval to your hardware folks.
> That does a limited reset, but not the registers that the driver
> checks to determine whether or not the asic has been posted so the
> driver skips posting and leaves the hw in a bad reset state.
>
> > perhaps a rare hot-add of a GPU, a bare metal kexec reboot, or perhaps
> > simply a system BIOS optimized to post a limited set of devices.
>
> We can tell if a card has never been posted and properly post it.
> Where it's tricky is when a card has been posted and has subsequently
> been pci reset on CI and newer hw. I'm not sure of a good way to
> detect this particular scenario. Generally this is mainly done for
> qemu/kvm.
How do you tell if a card has never been posted? Is it something we
could easily toggle after a bus reset?
> > Detection based on some state of the device rather than an expectation
> > based on what the device is running on seems preferable. I suspect
> > Andres' patch for amdgpu only affects newer devices, which pretty much
> > all suffer reset issues, at least under QEMU/VFIO, but I wonder how this
> > patch affects existing working devices, like 6, 7, and some 8-series.
>
> Posting the asic at init time should be safe on all asics.
>
> > Anyway, if this is the solution to the poor behavior we've seen with
> > assigned AMD cards, maybe someone could request the same for the closed
> > drivers, including Windows. Thanks,
>
> The closed drivers already do this.
Hmm, that's not terribly encouraging then since the majority of users
are running Windows guests for the purpose of creating a gaming VM and
still experiencing reset issues with the closed drivers there. Thanks,
Alex
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] drm/radeon: fix asic initialization for virtualized environments
2016-06-15 16:45 ` Alex Williamson
(?)
@ 2016-06-15 17:00 ` Alex Deucher
2016-06-16 15:43 ` Rodriguez, Andres
-1 siblings, 1 reply; 8+ messages in thread
From: Alex Deucher @ 2016-06-15 17:00 UTC (permalink / raw)
To: Alex Williamson
Cc: Maling list - DRI developers, Alex Deucher, Andres Rodriguez, for 3.8
On Wed, Jun 15, 2016 at 12:45 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Wed, 15 Jun 2016 02:23:37 -0400
> Alex Deucher <alexdeucher@gmail.com> wrote:
>
>> On Mon, Jun 13, 2016 at 4:10 PM, Alex Williamson
>> <alex.williamson@redhat.com> wrote:
>> > On Mon, 13 Jun 2016 15:45:20 -0400
>> > Alex Deucher <alexdeucher@gmail.com> wrote:
>> >
>> >> When executing in a PCI passthrough based virtuzliation environment, the
>> >> hypervisor will usually attempt to send a PCIe bus reset signal to the
>> >> ASIC when the VM reboots. In this scenario, the card is not correctly
>> >> initialized, but we still consider it to be posted. Therefore, in a
>> >> passthrough based environemnt we should always post the card to guarantee
>> >> it is in a good state for driver initialization.
>> >>
>> >> Ported from amdgpu commit:
>> >> amdgpu: fix asic initialization for virtualized environments
>> >>
>> >> Cc: Andres Rodriguez <andres.rodriguez@amd.com>
>> >> Cc: Alex Williamson <alex.williamson@redhat.com>
>> >> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>> >> Cc: stable@vger.kernel.org
>> >> ---
>> >> drivers/gpu/drm/radeon/radeon_device.c | 21 +++++++++++++++++++++
>> >> 1 file changed, 21 insertions(+)
>> >
>> > Thanks, I expect it's an improvement, though it's always a bit
>> > disappointing when a driver starts modifying its behavior based on
>> > what might be a transient feature of the platform, in this case a
>> > hypervisor platform. For instance, why does our bus reset and video
>> > ROM execution result in a different state than a physical BIOS doing
>> > the same? Can't this condition occur regardless of a hypervisor,
>>
>> Just doing a pci reset is not enough on newer cards. The hw handling
>> pci resets changed in CI and more of the logic moved to the driver.
>
> Gag, please relay my disapproval to your hardware folks.
>
>> That does a limited reset, but not the registers that the driver
>> checks to determine whether or not the asic has been posted so the
>> driver skips posting and leaves the hw in a bad reset state.
>>
>> > perhaps a rare hot-add of a GPU, a bare metal kexec reboot, or perhaps
>> > simply a system BIOS optimized to post a limited set of devices.
>>
>> We can tell if a card has never been posted and properly post it.
>> Where it's tricky is when a card has been posted and has subsequently
>> been pci reset on CI and newer hw. I'm not sure of a good way to
>> detect this particular scenario. Generally this is mainly done for
>> qemu/kvm.
>
> How do you tell if a card has never been posted? Is it something we
> could easily toggle after a bus reset?
We check CONFIG_MEMSIZE which is a scratch register set by the
asic_init command table to tell the driver how much vram is on the
board.
>
>> > Detection based on some state of the device rather than an expectation
>> > based on what the device is running on seems preferable. I suspect
>> > Andres' patch for amdgpu only affects newer devices, which pretty much
>> > all suffer reset issues, at least under QEMU/VFIO, but I wonder how this
>> > patch affects existing working devices, like 6, 7, and some 8-series.
>>
>> Posting the asic at init time should be safe on all asics.
>>
>> > Anyway, if this is the solution to the poor behavior we've seen with
>> > assigned AMD cards, maybe someone could request the same for the closed
>> > drivers, including Windows. Thanks,
>>
>> The closed drivers already do this.
>
> Hmm, that's not terribly encouraging then since the majority of users
> are running Windows guests for the purpose of creating a gaming VM and
> still experiencing reset issues with the closed drivers there. Thanks,
I'll have to check with the windows team to see how much validation
they do with the windows driver as a qemu/kvm guest. It could be that
they don't properly detect that as a virtual case.
Alex
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH] drm/radeon: fix asic initialization for virtualized environments
2016-06-15 17:00 ` Alex Deucher
@ 2016-06-16 15:43 ` Rodriguez, Andres
0 siblings, 0 replies; 8+ messages in thread
From: Rodriguez, Andres @ 2016-06-16 15:43 UTC (permalink / raw)
To: Alex Deucher, Alex Williamson
Cc: Deucher, Alexander, for 3.8, Maling list - DRI developers
> -----Original Message-----
> From: Alex Deucher [mailto:alexdeucher@gmail.com]
> Sent: June-15-16 1:00 PM
> To: Alex Williamson
> Cc: Maling list - DRI developers; Deucher, Alexander; Rodriguez, Andres; for
> 3.8
> Subject: Re: [PATCH] drm/radeon: fix asic initialization for virtualized
> environments
>
> On Wed, Jun 15, 2016 at 12:45 PM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Wed, 15 Jun 2016 02:23:37 -0400
> > Alex Deucher <alexdeucher@gmail.com> wrote:
> >
> >> On Mon, Jun 13, 2016 at 4:10 PM, Alex Williamson
> >> <alex.williamson@redhat.com> wrote:
> >> > On Mon, 13 Jun 2016 15:45:20 -0400
> >> > Alex Deucher <alexdeucher@gmail.com> wrote:
> >> >
> >> >> When executing in a PCI passthrough based virtuzliation
> >> >> environment, the hypervisor will usually attempt to send a PCIe
> >> >> bus reset signal to the ASIC when the VM reboots. In this
> >> >> scenario, the card is not correctly initialized, but we still
> >> >> consider it to be posted. Therefore, in a passthrough based
> >> >> environemnt we should always post the card to guarantee it is in a
> good state for driver initialization.
> >> >>
> >> >> Ported from amdgpu commit:
> >> >> amdgpu: fix asic initialization for virtualized environments
> >> >>
> >> >> Cc: Andres Rodriguez <andres.rodriguez@amd.com>
> >> >> Cc: Alex Williamson <alex.williamson@redhat.com>
> >> >> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >> >> Cc: stable@vger.kernel.org
> >> >> ---
> >> >> drivers/gpu/drm/radeon/radeon_device.c | 21
> +++++++++++++++++++++
> >> >> 1 file changed, 21 insertions(+)
> >> >
> >> > Thanks, I expect it's an improvement, though it's always a bit
> >> > disappointing when a driver starts modifying its behavior based on
> >> > what might be a transient feature of the platform, in this case a
> >> > hypervisor platform. For instance, why does our bus reset and
> >> > video ROM execution result in a different state than a physical
> >> > BIOS doing the same? Can't this condition occur regardless of a
> >> > hypervisor,
> >>
> >> Just doing a pci reset is not enough on newer cards. The hw handling
> >> pci resets changed in CI and more of the logic moved to the driver.
> >
> > Gag, please relay my disapproval to your hardware folks.
> >
> >> That does a limited reset, but not the registers that the driver
> >> checks to determine whether or not the asic has been posted so the
> >> driver skips posting and leaves the hw in a bad reset state.
> >>
> >> > perhaps a rare hot-add of a GPU, a bare metal kexec reboot, or
> >> > perhaps simply a system BIOS optimized to post a limited set of devices.
> >>
> >> We can tell if a card has never been posted and properly post it.
> >> Where it's tricky is when a card has been posted and has subsequently
> >> been pci reset on CI and newer hw. I'm not sure of a good way to
> >> detect this particular scenario. Generally this is mainly done for
> >> qemu/kvm.
> >
> > How do you tell if a card has never been posted? Is it something we
> > could easily toggle after a bus reset?
>
> We check CONFIG_MEMSIZE which is a scratch register set by the asic_init
> command table to tell the driver how much vram is on the board.
>
Yeah, detecting a specific virtualization environment is something that I don't
really like. Specially since there isn't a nice generic way to do this for all
environments (i.e. in this patch I used x86 specific functionality). If there is a
generic approach that works on multiple host CPU architectures do let me know.
Another approach I considered was adding an option, amdgpu.always_post
that would shift the responsibility onto the user. But I don't think it's really fair
to have that expectation, and at some point you start falling into config hell.
As Alex mentioned, we don't really have a good mechanism for detecting
when we need to post due to how the HW handles PCIe bus reset. Hopefully
we can get this fixed for future ASICs.
> >
> >> > Detection based on some state of the device rather than an
> >> > expectation based on what the device is running on seems
> >> > preferable. I suspect Andres' patch for amdgpu only affects newer
> >> > devices, which pretty much all suffer reset issues, at least under
> >> > QEMU/VFIO, but I wonder how this patch affects existing working
> devices, like 6, 7, and some 8-series.
> >>
> >> Posting the asic at init time should be safe on all asics.
> >>
> >> > Anyway, if this is the solution to the poor behavior we've seen
> >> > with assigned AMD cards, maybe someone could request the same for
> >> > the closed drivers, including Windows. Thanks,
> >>
> >> The closed drivers already do this.
> >
> > Hmm, that's not terribly encouraging then since the majority of users
> > are running Windows guests for the purpose of creating a gaming VM and
> > still experiencing reset issues with the closed drivers there.
> > Thanks,
>
> I'll have to check with the windows team to see how much validation they do
> with the windows driver as a qemu/kvm guest. It could be that they don't
> properly detect that as a virtual case.
>
> Alex
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-06-16 15:43 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-13 19:45 [PATCH] drm/radeon: fix asic initialization for virtualized environments Alex Deucher
2016-06-13 19:45 ` Alex Deucher
2016-06-13 20:10 ` Alex Williamson
2016-06-15 6:23 ` Alex Deucher
2016-06-15 16:45 ` Alex Williamson
2016-06-15 16:45 ` Alex Williamson
2016-06-15 17:00 ` Alex Deucher
2016-06-16 15:43 ` Rodriguez, Andres
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.