All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/radeon: fix asic initialization for virtualized environments
@ 2016-06-13 19:45 ` Alex Deucher
  0 siblings, 0 replies; 8+ messages in thread
From: Alex Deucher @ 2016-06-13 19:45 UTC (permalink / raw)
  To: dri-devel; +Cc: Alex Deucher, Andres Rodriguez, Alex Williamson, stable

When executing in a PCI passthrough based virtuzliation environment, the
hypervisor will usually attempt to send a PCIe bus reset signal to the
ASIC when the VM reboots. In this scenario, the card is not correctly
initialized, but we still consider it to be posted. Therefore, in a
passthrough based environemnt we should always post the card to guarantee
it is in a good state for driver initialization.

Ported from amdgpu commit:
amdgpu: fix asic initialization for virtualized environments

Cc: Andres Rodriguez <andres.rodriguez@amd.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/radeon/radeon_device.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index e61c763..21c44b2 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -630,6 +630,23 @@ void radeon_gtt_location(struct radeon_device *rdev, struct radeon_mc *mc)
 /*
  * GPU helpers function.
  */
+
+/**
+ * radeon_device_is_virtual - check if we are running is a virtual environment
+ *
+ * Check if the asic has been passed through to a VM (all asics).
+ * Used at driver startup.
+ * Returns true if virtual or false if not.
+ */
+static bool radeon_device_is_virtual(void)
+{
+#ifdef CONFIG_X86
+	return boot_cpu_has(X86_FEATURE_HYPERVISOR);
+#else
+	return false;
+#endif
+}
+
 /**
  * radeon_card_posted - check if the hw has already been initialized
  *
@@ -643,6 +660,10 @@ bool radeon_card_posted(struct radeon_device *rdev)
 {
 	uint32_t reg;
 
+	/* for pass through, always force asic_init */
+	if (radeon_device_is_virtual())
+		return false;
+
 	/* required for EFI mode on macbook2,1 which uses an r5xx asic */
 	if (efi_enabled(EFI_BOOT) &&
 	    (rdev->pdev->subsystem_vendor == PCI_VENDOR_ID_APPLE) &&
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH] drm/radeon: fix asic initialization for virtualized environments
@ 2016-06-13 19:45 ` Alex Deucher
  0 siblings, 0 replies; 8+ messages in thread
From: Alex Deucher @ 2016-06-13 19:45 UTC (permalink / raw)
  To: dri-devel; +Cc: Alex Deucher, Alex Williamson, Andres Rodriguez, stable

When executing in a PCI passthrough based virtuzliation environment, the
hypervisor will usually attempt to send a PCIe bus reset signal to the
ASIC when the VM reboots. In this scenario, the card is not correctly
initialized, but we still consider it to be posted. Therefore, in a
passthrough based environemnt we should always post the card to guarantee
it is in a good state for driver initialization.

Ported from amdgpu commit:
amdgpu: fix asic initialization for virtualized environments

Cc: Andres Rodriguez <andres.rodriguez@amd.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/radeon/radeon_device.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index e61c763..21c44b2 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -630,6 +630,23 @@ void radeon_gtt_location(struct radeon_device *rdev, struct radeon_mc *mc)
 /*
  * GPU helpers function.
  */
+
+/**
+ * radeon_device_is_virtual - check if we are running is a virtual environment
+ *
+ * Check if the asic has been passed through to a VM (all asics).
+ * Used at driver startup.
+ * Returns true if virtual or false if not.
+ */
+static bool radeon_device_is_virtual(void)
+{
+#ifdef CONFIG_X86
+	return boot_cpu_has(X86_FEATURE_HYPERVISOR);
+#else
+	return false;
+#endif
+}
+
 /**
  * radeon_card_posted - check if the hw has already been initialized
  *
@@ -643,6 +660,10 @@ bool radeon_card_posted(struct radeon_device *rdev)
 {
 	uint32_t reg;
 
+	/* for pass through, always force asic_init */
+	if (radeon_device_is_virtual())
+		return false;
+
 	/* required for EFI mode on macbook2,1 which uses an r5xx asic */
 	if (efi_enabled(EFI_BOOT) &&
 	    (rdev->pdev->subsystem_vendor == PCI_VENDOR_ID_APPLE) &&
-- 
2.5.5

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] drm/radeon: fix asic initialization for virtualized environments
  2016-06-13 19:45 ` Alex Deucher
  (?)
@ 2016-06-13 20:10 ` Alex Williamson
  2016-06-15  6:23   ` Alex Deucher
  -1 siblings, 1 reply; 8+ messages in thread
From: Alex Williamson @ 2016-06-13 20:10 UTC (permalink / raw)
  To: Alex Deucher; +Cc: dri-devel, Alex Deucher, Andres Rodriguez, stable

On Mon, 13 Jun 2016 15:45:20 -0400
Alex Deucher <alexdeucher@gmail.com> wrote:

> When executing in a PCI passthrough based virtuzliation environment, the
> hypervisor will usually attempt to send a PCIe bus reset signal to the
> ASIC when the VM reboots. In this scenario, the card is not correctly
> initialized, but we still consider it to be posted. Therefore, in a
> passthrough based environemnt we should always post the card to guarantee
> it is in a good state for driver initialization.
> 
> Ported from amdgpu commit:
> amdgpu: fix asic initialization for virtualized environments
> 
> Cc: Andres Rodriguez <andres.rodriguez@amd.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> Cc: stable@vger.kernel.org
> ---
>  drivers/gpu/drm/radeon/radeon_device.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)

Thanks, I expect it's an improvement, though it's always a bit
disappointing when a driver starts modifying its behavior based on
what might be a transient feature of the platform, in this case a
hypervisor platform.  For instance, why does our bus reset and video
ROM execution result in a different state than a physical BIOS doing
the same?  Can't this condition occur regardless of a hypervisor,
perhaps a rare hot-add of a GPU, a bare metal kexec reboot, or perhaps
simply a system BIOS optimized to post a limited set of devices.
Detection based on some state of the device rather than an expectation
based on what the device is running on seems preferable.  I suspect
Andres' patch for amdgpu only affects newer devices, which pretty much
all suffer reset issues, at least under QEMU/VFIO, but I wonder how this
patch affects existing working devices, like 6, 7, and some 8-series.
Anyway, if this is the solution to the poor behavior we've seen with
assigned AMD cards, maybe someone could request the same for the closed
drivers, including Windows.  Thanks,

Alex

> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> index e61c763..21c44b2 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -630,6 +630,23 @@ void radeon_gtt_location(struct radeon_device *rdev, struct radeon_mc *mc)
>  /*
>   * GPU helpers function.
>   */
> +
> +/**
> + * radeon_device_is_virtual - check if we are running is a virtual environment
> + *
> + * Check if the asic has been passed through to a VM (all asics).
> + * Used at driver startup.
> + * Returns true if virtual or false if not.
> + */
> +static bool radeon_device_is_virtual(void)
> +{
> +#ifdef CONFIG_X86
> +	return boot_cpu_has(X86_FEATURE_HYPERVISOR);
> +#else
> +	return false;
> +#endif
> +}
> +
>  /**
>   * radeon_card_posted - check if the hw has already been initialized
>   *
> @@ -643,6 +660,10 @@ bool radeon_card_posted(struct radeon_device *rdev)
>  {
>  	uint32_t reg;
>  
> +	/* for pass through, always force asic_init */
> +	if (radeon_device_is_virtual())
> +		return false;
> +
>  	/* required for EFI mode on macbook2,1 which uses an r5xx asic */
>  	if (efi_enabled(EFI_BOOT) &&
>  	    (rdev->pdev->subsystem_vendor == PCI_VENDOR_ID_APPLE) &&


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] drm/radeon: fix asic initialization for virtualized environments
  2016-06-13 20:10 ` Alex Williamson
@ 2016-06-15  6:23   ` Alex Deucher
  2016-06-15 16:45       ` Alex Williamson
  0 siblings, 1 reply; 8+ messages in thread
From: Alex Deucher @ 2016-06-15  6:23 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Maling list - DRI developers, Alex Deucher, Andres Rodriguez, for 3.8

On Mon, Jun 13, 2016 at 4:10 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Mon, 13 Jun 2016 15:45:20 -0400
> Alex Deucher <alexdeucher@gmail.com> wrote:
>
>> When executing in a PCI passthrough based virtuzliation environment, the
>> hypervisor will usually attempt to send a PCIe bus reset signal to the
>> ASIC when the VM reboots. In this scenario, the card is not correctly
>> initialized, but we still consider it to be posted. Therefore, in a
>> passthrough based environemnt we should always post the card to guarantee
>> it is in a good state for driver initialization.
>>
>> Ported from amdgpu commit:
>> amdgpu: fix asic initialization for virtualized environments
>>
>> Cc: Andres Rodriguez <andres.rodriguez@amd.com>
>> Cc: Alex Williamson <alex.williamson@redhat.com>
>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>> Cc: stable@vger.kernel.org
>> ---
>>  drivers/gpu/drm/radeon/radeon_device.c | 21 +++++++++++++++++++++
>>  1 file changed, 21 insertions(+)
>
> Thanks, I expect it's an improvement, though it's always a bit
> disappointing when a driver starts modifying its behavior based on
> what might be a transient feature of the platform, in this case a
> hypervisor platform.  For instance, why does our bus reset and video
> ROM execution result in a different state than a physical BIOS doing
> the same?  Can't this condition occur regardless of a hypervisor,

Just doing a pci reset is not enough on newer cards.  The hw handling
pci resets changed in CI and more of the logic moved to the driver.
That does a limited reset, but not the registers that the driver
checks to determine whether or not the asic has been posted so the
driver skips posting and leaves the hw in a bad reset state.

> perhaps a rare hot-add of a GPU, a bare metal kexec reboot, or perhaps
> simply a system BIOS optimized to post a limited set of devices.

We can tell if a card has never been posted and properly post it.
Where it's tricky is when a card has been posted and has subsequently
been pci reset on CI and newer hw.  I'm not sure of a good way to
detect this particular scenario.  Generally this is mainly done for
qemu/kvm.

> Detection based on some state of the device rather than an expectation
> based on what the device is running on seems preferable.  I suspect
> Andres' patch for amdgpu only affects newer devices, which pretty much
> all suffer reset issues, at least under QEMU/VFIO, but I wonder how this
> patch affects existing working devices, like 6, 7, and some 8-series.

Posting the asic at init time should be safe on all asics.

> Anyway, if this is the solution to the poor behavior we've seen with
> assigned AMD cards, maybe someone could request the same for the closed
> drivers, including Windows.  Thanks,

The closed drivers already do this.

Alex

>
> Alex
>
>> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
>> index e61c763..21c44b2 100644
>> --- a/drivers/gpu/drm/radeon/radeon_device.c
>> +++ b/drivers/gpu/drm/radeon/radeon_device.c
>> @@ -630,6 +630,23 @@ void radeon_gtt_location(struct radeon_device *rdev, struct radeon_mc *mc)
>>  /*
>>   * GPU helpers function.
>>   */
>> +
>> +/**
>> + * radeon_device_is_virtual - check if we are running is a virtual environment
>> + *
>> + * Check if the asic has been passed through to a VM (all asics).
>> + * Used at driver startup.
>> + * Returns true if virtual or false if not.
>> + */
>> +static bool radeon_device_is_virtual(void)
>> +{
>> +#ifdef CONFIG_X86
>> +     return boot_cpu_has(X86_FEATURE_HYPERVISOR);
>> +#else
>> +     return false;
>> +#endif
>> +}
>> +
>>  /**
>>   * radeon_card_posted - check if the hw has already been initialized
>>   *
>> @@ -643,6 +660,10 @@ bool radeon_card_posted(struct radeon_device *rdev)
>>  {
>>       uint32_t reg;
>>
>> +     /* for pass through, always force asic_init */
>> +     if (radeon_device_is_virtual())
>> +             return false;
>> +
>>       /* required for EFI mode on macbook2,1 which uses an r5xx asic */
>>       if (efi_enabled(EFI_BOOT) &&
>>           (rdev->pdev->subsystem_vendor == PCI_VENDOR_ID_APPLE) &&
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] drm/radeon: fix asic initialization for virtualized environments
  2016-06-15  6:23   ` Alex Deucher
@ 2016-06-15 16:45       ` Alex Williamson
  0 siblings, 0 replies; 8+ messages in thread
From: Alex Williamson @ 2016-06-15 16:45 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Maling list - DRI developers, Alex Deucher, Andres Rodriguez, for 3.8

On Wed, 15 Jun 2016 02:23:37 -0400
Alex Deucher <alexdeucher@gmail.com> wrote:

> On Mon, Jun 13, 2016 at 4:10 PM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Mon, 13 Jun 2016 15:45:20 -0400
> > Alex Deucher <alexdeucher@gmail.com> wrote:
> >  
> >> When executing in a PCI passthrough based virtuzliation environment, the
> >> hypervisor will usually attempt to send a PCIe bus reset signal to the
> >> ASIC when the VM reboots. In this scenario, the card is not correctly
> >> initialized, but we still consider it to be posted. Therefore, in a
> >> passthrough based environemnt we should always post the card to guarantee
> >> it is in a good state for driver initialization.
> >>
> >> Ported from amdgpu commit:
> >> amdgpu: fix asic initialization for virtualized environments
> >>
> >> Cc: Andres Rodriguez <andres.rodriguez@amd.com>
> >> Cc: Alex Williamson <alex.williamson@redhat.com>
> >> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >> Cc: stable@vger.kernel.org
> >> ---
> >>  drivers/gpu/drm/radeon/radeon_device.c | 21 +++++++++++++++++++++
> >>  1 file changed, 21 insertions(+)  
> >
> > Thanks, I expect it's an improvement, though it's always a bit
> > disappointing when a driver starts modifying its behavior based on
> > what might be a transient feature of the platform, in this case a
> > hypervisor platform.  For instance, why does our bus reset and video
> > ROM execution result in a different state than a physical BIOS doing
> > the same?  Can't this condition occur regardless of a hypervisor,  
> 
> Just doing a pci reset is not enough on newer cards.  The hw handling
> pci resets changed in CI and more of the logic moved to the driver.

Gag, please relay my disapproval to your hardware folks.

> That does a limited reset, but not the registers that the driver
> checks to determine whether or not the asic has been posted so the
> driver skips posting and leaves the hw in a bad reset state.
> 
> > perhaps a rare hot-add of a GPU, a bare metal kexec reboot, or perhaps
> > simply a system BIOS optimized to post a limited set of devices.  
> 
> We can tell if a card has never been posted and properly post it.
> Where it's tricky is when a card has been posted and has subsequently
> been pci reset on CI and newer hw.  I'm not sure of a good way to
> detect this particular scenario.  Generally this is mainly done for
> qemu/kvm.

How do you tell if a card has never been posted?  Is it something we
could easily toggle after a bus reset?
 
> > Detection based on some state of the device rather than an expectation
> > based on what the device is running on seems preferable.  I suspect
> > Andres' patch for amdgpu only affects newer devices, which pretty much
> > all suffer reset issues, at least under QEMU/VFIO, but I wonder how this
> > patch affects existing working devices, like 6, 7, and some 8-series.  
> 
> Posting the asic at init time should be safe on all asics.
> 
> > Anyway, if this is the solution to the poor behavior we've seen with
> > assigned AMD cards, maybe someone could request the same for the closed
> > drivers, including Windows.  Thanks,  
> 
> The closed drivers already do this.

Hmm, that's not terribly encouraging then since the majority of users
are running Windows guests for the purpose of creating a gaming VM and
still experiencing reset issues with the closed drivers there.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] drm/radeon: fix asic initialization for virtualized environments
@ 2016-06-15 16:45       ` Alex Williamson
  0 siblings, 0 replies; 8+ messages in thread
From: Alex Williamson @ 2016-06-15 16:45 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Alex Deucher, Andres Rodriguez, for 3.8, Maling list - DRI developers

On Wed, 15 Jun 2016 02:23:37 -0400
Alex Deucher <alexdeucher@gmail.com> wrote:

> On Mon, Jun 13, 2016 at 4:10 PM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Mon, 13 Jun 2016 15:45:20 -0400
> > Alex Deucher <alexdeucher@gmail.com> wrote:
> >  
> >> When executing in a PCI passthrough based virtuzliation environment, the
> >> hypervisor will usually attempt to send a PCIe bus reset signal to the
> >> ASIC when the VM reboots. In this scenario, the card is not correctly
> >> initialized, but we still consider it to be posted. Therefore, in a
> >> passthrough based environemnt we should always post the card to guarantee
> >> it is in a good state for driver initialization.
> >>
> >> Ported from amdgpu commit:
> >> amdgpu: fix asic initialization for virtualized environments
> >>
> >> Cc: Andres Rodriguez <andres.rodriguez@amd.com>
> >> Cc: Alex Williamson <alex.williamson@redhat.com>
> >> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >> Cc: stable@vger.kernel.org
> >> ---
> >>  drivers/gpu/drm/radeon/radeon_device.c | 21 +++++++++++++++++++++
> >>  1 file changed, 21 insertions(+)  
> >
> > Thanks, I expect it's an improvement, though it's always a bit
> > disappointing when a driver starts modifying its behavior based on
> > what might be a transient feature of the platform, in this case a
> > hypervisor platform.  For instance, why does our bus reset and video
> > ROM execution result in a different state than a physical BIOS doing
> > the same?  Can't this condition occur regardless of a hypervisor,  
> 
> Just doing a pci reset is not enough on newer cards.  The hw handling
> pci resets changed in CI and more of the logic moved to the driver.

Gag, please relay my disapproval to your hardware folks.

> That does a limited reset, but not the registers that the driver
> checks to determine whether or not the asic has been posted so the
> driver skips posting and leaves the hw in a bad reset state.
> 
> > perhaps a rare hot-add of a GPU, a bare metal kexec reboot, or perhaps
> > simply a system BIOS optimized to post a limited set of devices.  
> 
> We can tell if a card has never been posted and properly post it.
> Where it's tricky is when a card has been posted and has subsequently
> been pci reset on CI and newer hw.  I'm not sure of a good way to
> detect this particular scenario.  Generally this is mainly done for
> qemu/kvm.

How do you tell if a card has never been posted?  Is it something we
could easily toggle after a bus reset?
 
> > Detection based on some state of the device rather than an expectation
> > based on what the device is running on seems preferable.  I suspect
> > Andres' patch for amdgpu only affects newer devices, which pretty much
> > all suffer reset issues, at least under QEMU/VFIO, but I wonder how this
> > patch affects existing working devices, like 6, 7, and some 8-series.  
> 
> Posting the asic at init time should be safe on all asics.
> 
> > Anyway, if this is the solution to the poor behavior we've seen with
> > assigned AMD cards, maybe someone could request the same for the closed
> > drivers, including Windows.  Thanks,  
> 
> The closed drivers already do this.

Hmm, that's not terribly encouraging then since the majority of users
are running Windows guests for the purpose of creating a gaming VM and
still experiencing reset issues with the closed drivers there.  Thanks,

Alex
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] drm/radeon: fix asic initialization for virtualized environments
  2016-06-15 16:45       ` Alex Williamson
  (?)
@ 2016-06-15 17:00       ` Alex Deucher
  2016-06-16 15:43         ` Rodriguez, Andres
  -1 siblings, 1 reply; 8+ messages in thread
From: Alex Deucher @ 2016-06-15 17:00 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Maling list - DRI developers, Alex Deucher, Andres Rodriguez, for 3.8

On Wed, Jun 15, 2016 at 12:45 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Wed, 15 Jun 2016 02:23:37 -0400
> Alex Deucher <alexdeucher@gmail.com> wrote:
>
>> On Mon, Jun 13, 2016 at 4:10 PM, Alex Williamson
>> <alex.williamson@redhat.com> wrote:
>> > On Mon, 13 Jun 2016 15:45:20 -0400
>> > Alex Deucher <alexdeucher@gmail.com> wrote:
>> >
>> >> When executing in a PCI passthrough based virtuzliation environment, the
>> >> hypervisor will usually attempt to send a PCIe bus reset signal to the
>> >> ASIC when the VM reboots. In this scenario, the card is not correctly
>> >> initialized, but we still consider it to be posted. Therefore, in a
>> >> passthrough based environemnt we should always post the card to guarantee
>> >> it is in a good state for driver initialization.
>> >>
>> >> Ported from amdgpu commit:
>> >> amdgpu: fix asic initialization for virtualized environments
>> >>
>> >> Cc: Andres Rodriguez <andres.rodriguez@amd.com>
>> >> Cc: Alex Williamson <alex.williamson@redhat.com>
>> >> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>> >> Cc: stable@vger.kernel.org
>> >> ---
>> >>  drivers/gpu/drm/radeon/radeon_device.c | 21 +++++++++++++++++++++
>> >>  1 file changed, 21 insertions(+)
>> >
>> > Thanks, I expect it's an improvement, though it's always a bit
>> > disappointing when a driver starts modifying its behavior based on
>> > what might be a transient feature of the platform, in this case a
>> > hypervisor platform.  For instance, why does our bus reset and video
>> > ROM execution result in a different state than a physical BIOS doing
>> > the same?  Can't this condition occur regardless of a hypervisor,
>>
>> Just doing a pci reset is not enough on newer cards.  The hw handling
>> pci resets changed in CI and more of the logic moved to the driver.
>
> Gag, please relay my disapproval to your hardware folks.
>
>> That does a limited reset, but not the registers that the driver
>> checks to determine whether or not the asic has been posted so the
>> driver skips posting and leaves the hw in a bad reset state.
>>
>> > perhaps a rare hot-add of a GPU, a bare metal kexec reboot, or perhaps
>> > simply a system BIOS optimized to post a limited set of devices.
>>
>> We can tell if a card has never been posted and properly post it.
>> Where it's tricky is when a card has been posted and has subsequently
>> been pci reset on CI and newer hw.  I'm not sure of a good way to
>> detect this particular scenario.  Generally this is mainly done for
>> qemu/kvm.
>
> How do you tell if a card has never been posted?  Is it something we
> could easily toggle after a bus reset?

We check CONFIG_MEMSIZE which is a scratch register set by the
asic_init command table to tell the driver how much vram is on the
board.

>
>> > Detection based on some state of the device rather than an expectation
>> > based on what the device is running on seems preferable.  I suspect
>> > Andres' patch for amdgpu only affects newer devices, which pretty much
>> > all suffer reset issues, at least under QEMU/VFIO, but I wonder how this
>> > patch affects existing working devices, like 6, 7, and some 8-series.
>>
>> Posting the asic at init time should be safe on all asics.
>>
>> > Anyway, if this is the solution to the poor behavior we've seen with
>> > assigned AMD cards, maybe someone could request the same for the closed
>> > drivers, including Windows.  Thanks,
>>
>> The closed drivers already do this.
>
> Hmm, that's not terribly encouraging then since the majority of users
> are running Windows guests for the purpose of creating a gaming VM and
> still experiencing reset issues with the closed drivers there.  Thanks,

I'll have to check with the windows team to see how much validation
they do with the windows driver as a qemu/kvm guest.  It could be that
they don't properly detect that as a virtual case.

Alex

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH] drm/radeon: fix asic initialization for virtualized environments
  2016-06-15 17:00       ` Alex Deucher
@ 2016-06-16 15:43         ` Rodriguez, Andres
  0 siblings, 0 replies; 8+ messages in thread
From: Rodriguez, Andres @ 2016-06-16 15:43 UTC (permalink / raw)
  To: Alex Deucher, Alex Williamson
  Cc: Deucher, Alexander, for 3.8, Maling list - DRI developers



> -----Original Message-----
> From: Alex Deucher [mailto:alexdeucher@gmail.com]
> Sent: June-15-16 1:00 PM
> To: Alex Williamson
> Cc: Maling list - DRI developers; Deucher, Alexander; Rodriguez, Andres; for
> 3.8
> Subject: Re: [PATCH] drm/radeon: fix asic initialization for virtualized
> environments
> 
> On Wed, Jun 15, 2016 at 12:45 PM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Wed, 15 Jun 2016 02:23:37 -0400
> > Alex Deucher <alexdeucher@gmail.com> wrote:
> >
> >> On Mon, Jun 13, 2016 at 4:10 PM, Alex Williamson
> >> <alex.williamson@redhat.com> wrote:
> >> > On Mon, 13 Jun 2016 15:45:20 -0400
> >> > Alex Deucher <alexdeucher@gmail.com> wrote:
> >> >
> >> >> When executing in a PCI passthrough based virtuzliation
> >> >> environment, the hypervisor will usually attempt to send a PCIe
> >> >> bus reset signal to the ASIC when the VM reboots. In this
> >> >> scenario, the card is not correctly initialized, but we still
> >> >> consider it to be posted. Therefore, in a passthrough based
> >> >> environemnt we should always post the card to guarantee it is in a
> good state for driver initialization.
> >> >>
> >> >> Ported from amdgpu commit:
> >> >> amdgpu: fix asic initialization for virtualized environments
> >> >>
> >> >> Cc: Andres Rodriguez <andres.rodriguez@amd.com>
> >> >> Cc: Alex Williamson <alex.williamson@redhat.com>
> >> >> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >> >> Cc: stable@vger.kernel.org
> >> >> ---
> >> >>  drivers/gpu/drm/radeon/radeon_device.c | 21
> +++++++++++++++++++++
> >> >>  1 file changed, 21 insertions(+)
> >> >
> >> > Thanks, I expect it's an improvement, though it's always a bit
> >> > disappointing when a driver starts modifying its behavior based on
> >> > what might be a transient feature of the platform, in this case a
> >> > hypervisor platform.  For instance, why does our bus reset and
> >> > video ROM execution result in a different state than a physical
> >> > BIOS doing the same?  Can't this condition occur regardless of a
> >> > hypervisor,
> >>
> >> Just doing a pci reset is not enough on newer cards.  The hw handling
> >> pci resets changed in CI and more of the logic moved to the driver.
> >
> > Gag, please relay my disapproval to your hardware folks.
> >
> >> That does a limited reset, but not the registers that the driver
> >> checks to determine whether or not the asic has been posted so the
> >> driver skips posting and leaves the hw in a bad reset state.
> >>
> >> > perhaps a rare hot-add of a GPU, a bare metal kexec reboot, or
> >> > perhaps simply a system BIOS optimized to post a limited set of devices.
> >>
> >> We can tell if a card has never been posted and properly post it.
> >> Where it's tricky is when a card has been posted and has subsequently
> >> been pci reset on CI and newer hw.  I'm not sure of a good way to
> >> detect this particular scenario.  Generally this is mainly done for
> >> qemu/kvm.
> >
> > How do you tell if a card has never been posted?  Is it something we
> > could easily toggle after a bus reset?
> 
> We check CONFIG_MEMSIZE which is a scratch register set by the asic_init
> command table to tell the driver how much vram is on the board.
> 

Yeah, detecting a specific virtualization environment is something that I don't
really like. Specially since there isn't a nice generic way to do this for all
environments (i.e. in this patch I used x86 specific functionality). If there is a 
generic approach that works on multiple host CPU architectures do let me know.

Another approach I considered was adding an option, amdgpu.always_post
that would shift the responsibility onto the user. But I don't think it's really fair
to have that expectation, and at some point you start falling into config hell.

As Alex mentioned, we don't really have a good mechanism for detecting
when we need to post due to how the HW handles PCIe bus reset. Hopefully
we can get this fixed for future ASICs.

> >
> >> > Detection based on some state of the device rather than an
> >> > expectation based on what the device is running on seems
> >> > preferable.  I suspect Andres' patch for amdgpu only affects newer
> >> > devices, which pretty much all suffer reset issues, at least under
> >> > QEMU/VFIO, but I wonder how this patch affects existing working
> devices, like 6, 7, and some 8-series.
> >>
> >> Posting the asic at init time should be safe on all asics.
> >>
> >> > Anyway, if this is the solution to the poor behavior we've seen
> >> > with assigned AMD cards, maybe someone could request the same for
> >> > the closed drivers, including Windows.  Thanks,
> >>
> >> The closed drivers already do this.
> >
> > Hmm, that's not terribly encouraging then since the majority of users
> > are running Windows guests for the purpose of creating a gaming VM and
> > still experiencing reset issues with the closed drivers there.
> > Thanks,
> 
> I'll have to check with the windows team to see how much validation they do
> with the windows driver as a qemu/kvm guest.  It could be that they don't
> properly detect that as a virtual case.
> 
> Alex
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-06-16 15:43 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-13 19:45 [PATCH] drm/radeon: fix asic initialization for virtualized environments Alex Deucher
2016-06-13 19:45 ` Alex Deucher
2016-06-13 20:10 ` Alex Williamson
2016-06-15  6:23   ` Alex Deucher
2016-06-15 16:45     ` Alex Williamson
2016-06-15 16:45       ` Alex Williamson
2016-06-15 17:00       ` Alex Deucher
2016-06-16 15:43         ` Rodriguez, Andres

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.