linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2
@ 2018-10-02  3:20 Alexey Kardashevskiy
  2018-10-15  7:17 ` Alexey Kardashevskiy
  0 siblings, 1 reply; 13+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-02  3:20 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Reza Arbab, David Gibson

The skiboot firmware has a hot reset handler which fences the NVIDIA V100
GPU RAM on Witherspoons and makes accesses no-op instead of throwing HMIs:
https://github.com/open-power/skiboot/commit/fca2b2b839a67

Now we are going to pass V100 via VFIO which most certainly involves
KVM guests which are often terminated without getting a chance to offline
GPU RAM so we end up with a running machine with misconfigured memory.
Accessing this memory produces hardware management interrupts (HMI)
which bring the host down.

To suppress HMIs, this wires up this hot reset hook to vfio_pci_disable()
via pci_disable_device() which switches NPU2 to a safe mode and prevents
HMIs.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v2:
* updated the commit log
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index cde7102..e37b9cc 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3688,6 +3688,15 @@ static void pnv_pci_release_device(struct pci_dev *pdev)
 		pnv_ioda_release_pe(pe);
 }
 
+static void pnv_npu_disable_device(struct pci_dev *pdev)
+{
+	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
+	struct eeh_pe *eehpe = edev ? edev->pe : NULL;
+
+	if (eehpe && eeh_ops && eeh_ops->reset)
+		eeh_ops->reset(eehpe, EEH_RESET_HOT);
+}
+
 static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
 {
 	struct pnv_phb *phb = hose->private_data;
@@ -3732,6 +3741,7 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
 	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
 	.dma_set_mask		= pnv_npu_dma_set_mask,
 	.shutdown		= pnv_pci_ioda_shutdown,
+	.disable_device		= pnv_npu_disable_device,
 };
 
 static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2
  2018-10-02  3:20 [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2 Alexey Kardashevskiy
@ 2018-10-15  7:17 ` Alexey Kardashevskiy
  2018-10-16  0:38   ` Alistair Popple
  0 siblings, 1 reply; 13+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-15  7:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alistair Popple, Reza Arbab, David Gibson

Ping?


On 02/10/2018 13:20, Alexey Kardashevskiy wrote:
> The skiboot firmware has a hot reset handler which fences the NVIDIA V100
> GPU RAM on Witherspoons and makes accesses no-op instead of throwing HMIs:
> https://github.com/open-power/skiboot/commit/fca2b2b839a67
> 
> Now we are going to pass V100 via VFIO which most certainly involves
> KVM guests which are often terminated without getting a chance to offline
> GPU RAM so we end up with a running machine with misconfigured memory.
> Accessing this memory produces hardware management interrupts (HMI)
> which bring the host down.
> 
> To suppress HMIs, this wires up this hot reset hook to vfio_pci_disable()
> via pci_disable_device() which switches NPU2 to a safe mode and prevents
> HMIs.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> Changes:
> v2:
> * updated the commit log
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index cde7102..e37b9cc 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -3688,6 +3688,15 @@ static void pnv_pci_release_device(struct pci_dev *pdev)
>  		pnv_ioda_release_pe(pe);
>  }
>  
> +static void pnv_npu_disable_device(struct pci_dev *pdev)
> +{
> +	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
> +	struct eeh_pe *eehpe = edev ? edev->pe : NULL;
> +
> +	if (eehpe && eeh_ops && eeh_ops->reset)
> +		eeh_ops->reset(eehpe, EEH_RESET_HOT);
> +}
> +
>  static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
>  {
>  	struct pnv_phb *phb = hose->private_data;
> @@ -3732,6 +3741,7 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
>  	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
>  	.dma_set_mask		= pnv_npu_dma_set_mask,
>  	.shutdown		= pnv_pci_ioda_shutdown,
> +	.disable_device		= pnv_npu_disable_device,
>  };
>  
>  static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2
  2018-10-15  7:17 ` Alexey Kardashevskiy
@ 2018-10-16  0:38   ` Alistair Popple
  2018-10-16  1:37     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 13+ messages in thread
From: Alistair Popple @ 2018-10-16  0:38 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Reza Arbab, linuxppc-dev, David Gibson

Hi Alexey,

Looking at the skiboot side I think we only fence the NVLink bricks as part of a
PCIe function level reset (FLR) rather than a PCI Hot or Fundamental reset which
I believe is what the code here does. So to fence the bricks you would need to
do either a FLR on the given link or alter Skiboot to fence a given link as part
of a hot reset.

- Alistair

On Monday, 15 October 2018 6:17:51 PM AEDT Alexey Kardashevskiy wrote:
> Ping?
> 
> 
> On 02/10/2018 13:20, Alexey Kardashevskiy wrote:
> > The skiboot firmware has a hot reset handler which fences the NVIDIA V100
> > GPU RAM on Witherspoons and makes accesses no-op instead of throwing HMIs:
> > https://github.com/open-power/skiboot/commit/fca2b2b839a67
> > 
> > Now we are going to pass V100 via VFIO which most certainly involves
> > KVM guests which are often terminated without getting a chance to offline
> > GPU RAM so we end up with a running machine with misconfigured memory.
> > Accessing this memory produces hardware management interrupts (HMI)
> > which bring the host down.
> > 
> > To suppress HMIs, this wires up this hot reset hook to vfio_pci_disable()
> > via pci_disable_device() which switches NPU2 to a safe mode and prevents
> > HMIs.
> > 
> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > ---
> > Changes:
> > v2:
> > * updated the commit log
> > ---
> >  arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> > index cde7102..e37b9cc 100644
> > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > @@ -3688,6 +3688,15 @@ static void pnv_pci_release_device(struct pci_dev *pdev)
> >  		pnv_ioda_release_pe(pe);
> >  }
> >  
> > +static void pnv_npu_disable_device(struct pci_dev *pdev)
> > +{
> > +	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
> > +	struct eeh_pe *eehpe = edev ? edev->pe : NULL;
> > +
> > +	if (eehpe && eeh_ops && eeh_ops->reset)
> > +		eeh_ops->reset(eehpe, EEH_RESET_HOT);
> > +}
> > +
> >  static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
> >  {
> >  	struct pnv_phb *phb = hose->private_data;
> > @@ -3732,6 +3741,7 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
> >  	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
> >  	.dma_set_mask		= pnv_npu_dma_set_mask,
> >  	.shutdown		= pnv_pci_ioda_shutdown,
> > +	.disable_device		= pnv_npu_disable_device,
> >  };
> >  
> >  static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
> > 
> 
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2
  2018-10-16  0:38   ` Alistair Popple
@ 2018-10-16  1:37     ` Alexey Kardashevskiy
  2018-10-16  1:44       ` Alistair Popple
  0 siblings, 1 reply; 13+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-16  1:37 UTC (permalink / raw)
  To: Alistair Popple; +Cc: Reza Arbab, linuxppc-dev, David Gibson



On 16/10/2018 11:38, Alistair Popple wrote:
> Hi Alexey,
> 
> Looking at the skiboot side I think we only fence the NVLink bricks as part of a
> PCIe function level reset (FLR) rather than a PCI Hot or Fundamental reset which
> I believe is what the code here does. So to fence the bricks you would need to
> do either a FLR on the given link or alter Skiboot to fence a given link as part
> of a hot reset.

The code here calls OPAL to execute this code:

https://github.com/open-power/skiboot/commit/fca2b2b839a673a1e52fc6b19ee6d33b2dfbc003

This resets all links on an NPU which is fine for now as we pass GPUs in
groups only. Or I missed something?



> 
> - Alistair
> 
> On Monday, 15 October 2018 6:17:51 PM AEDT Alexey Kardashevskiy wrote:
>> Ping?
>>
>>
>> On 02/10/2018 13:20, Alexey Kardashevskiy wrote:
>>> The skiboot firmware has a hot reset handler which fences the NVIDIA V100
>>> GPU RAM on Witherspoons and makes accesses no-op instead of throwing HMIs:
>>> https://github.com/open-power/skiboot/commit/fca2b2b839a67
>>>
>>> Now we are going to pass V100 via VFIO which most certainly involves
>>> KVM guests which are often terminated without getting a chance to offline
>>> GPU RAM so we end up with a running machine with misconfigured memory.
>>> Accessing this memory produces hardware management interrupts (HMI)
>>> which bring the host down.
>>>
>>> To suppress HMIs, this wires up this hot reset hook to vfio_pci_disable()
>>> via pci_disable_device() which switches NPU2 to a safe mode and prevents
>>> HMIs.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>> Changes:
>>> v2:
>>> * updated the commit log
>>> ---
>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++++
>>>  1 file changed, 10 insertions(+)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index cde7102..e37b9cc 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -3688,6 +3688,15 @@ static void pnv_pci_release_device(struct pci_dev *pdev)
>>>  		pnv_ioda_release_pe(pe);
>>>  }
>>>  
>>> +static void pnv_npu_disable_device(struct pci_dev *pdev)
>>> +{
>>> +	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
>>> +	struct eeh_pe *eehpe = edev ? edev->pe : NULL;
>>> +
>>> +	if (eehpe && eeh_ops && eeh_ops->reset)
>>> +		eeh_ops->reset(eehpe, EEH_RESET_HOT);
>>> +}
>>> +
>>>  static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
>>>  {
>>>  	struct pnv_phb *phb = hose->private_data;
>>> @@ -3732,6 +3741,7 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
>>>  	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
>>>  	.dma_set_mask		= pnv_npu_dma_set_mask,
>>>  	.shutdown		= pnv_pci_ioda_shutdown,
>>> +	.disable_device		= pnv_npu_disable_device,
>>>  };
>>>  
>>>  static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
>>>
>>
>>
> 
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2
  2018-10-16  1:37     ` Alexey Kardashevskiy
@ 2018-10-16  1:44       ` Alistair Popple
  2018-10-16  2:02         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 13+ messages in thread
From: Alistair Popple @ 2018-10-16  1:44 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Reza Arbab, linuxppc-dev, David Gibson

Hi Alexey,

On Tuesday, 16 October 2018 12:37:49 PM AEDT Alexey Kardashevskiy wrote:
> 
> On 16/10/2018 11:38, Alistair Popple wrote:
> > Hi Alexey,
> > 
> > Looking at the skiboot side I think we only fence the NVLink bricks as part of a
> > PCIe function level reset (FLR) rather than a PCI Hot or Fundamental reset which
> > I believe is what the code here does. So to fence the bricks you would need to
> > do either a FLR on the given link or alter Skiboot to fence a given link as part
> > of a hot reset.
> 
> The code here calls OPAL to execute this code:
> 
> https://github.com/open-power/skiboot/commit/fca2b2b839a673a1e52fc6b19ee6d33b2dfbc003
> 
> This resets all links on an NPU which is fine for now as we pass GPUs in
> groups only. Or I missed something?

From what I can see in
https://github.com/open-power/skiboot/commit/2947eaa14e771e572d4e84bf003318c590c1c7d4
we only fence the bricks in npu2_dev_procedure_reset()
(https://github.com/open-power/skiboot/blob/master/hw/npu2-hw-procedures.c#L937)
which itself is only called from the FLR path and not the code path you point
out above.

The code path above only resets the NTL which I don't believe is sufficient to
prevent HMIs.

- Alistair

>
> 
> 
> > 
> > - Alistair
> > 
> > On Monday, 15 October 2018 6:17:51 PM AEDT Alexey Kardashevskiy wrote:
> >> Ping?
> >>
> >>
> >> On 02/10/2018 13:20, Alexey Kardashevskiy wrote:
> >>> The skiboot firmware has a hot reset handler which fences the NVIDIA V100
> >>> GPU RAM on Witherspoons and makes accesses no-op instead of throwing HMIs:
> >>> https://github.com/open-power/skiboot/commit/fca2b2b839a67
> >>>
> >>> Now we are going to pass V100 via VFIO which most certainly involves
> >>> KVM guests which are often terminated without getting a chance to offline
> >>> GPU RAM so we end up with a running machine with misconfigured memory.
> >>> Accessing this memory produces hardware management interrupts (HMI)
> >>> which bring the host down.
> >>>
> >>> To suppress HMIs, this wires up this hot reset hook to vfio_pci_disable()
> >>> via pci_disable_device() which switches NPU2 to a safe mode and prevents
> >>> HMIs.
> >>>
> >>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>> ---
> >>> Changes:
> >>> v2:
> >>> * updated the commit log
> >>> ---
> >>>  arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++++
> >>>  1 file changed, 10 insertions(+)
> >>>
> >>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>> index cde7102..e37b9cc 100644
> >>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>> @@ -3688,6 +3688,15 @@ static void pnv_pci_release_device(struct pci_dev *pdev)
> >>>  		pnv_ioda_release_pe(pe);
> >>>  }
> >>>  
> >>> +static void pnv_npu_disable_device(struct pci_dev *pdev)
> >>> +{
> >>> +	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
> >>> +	struct eeh_pe *eehpe = edev ? edev->pe : NULL;
> >>> +
> >>> +	if (eehpe && eeh_ops && eeh_ops->reset)
> >>> +		eeh_ops->reset(eehpe, EEH_RESET_HOT);
> >>> +}
> >>> +
> >>>  static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
> >>>  {
> >>>  	struct pnv_phb *phb = hose->private_data;
> >>> @@ -3732,6 +3741,7 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
> >>>  	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
> >>>  	.dma_set_mask		= pnv_npu_dma_set_mask,
> >>>  	.shutdown		= pnv_pci_ioda_shutdown,
> >>> +	.disable_device		= pnv_npu_disable_device,
> >>>  };
> >>>  
> >>>  static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
> >>>
> >>
> >>
> > 
> > 
> 
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2
  2018-10-16  1:44       ` Alistair Popple
@ 2018-10-16  2:02         ` Alexey Kardashevskiy
  2018-10-16  2:19           ` Alistair Popple
  0 siblings, 1 reply; 13+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-16  2:02 UTC (permalink / raw)
  To: Alistair Popple; +Cc: Reza Arbab, linuxppc-dev, David Gibson



On 16/10/2018 12:44, Alistair Popple wrote:
> Hi Alexey,
> 
> On Tuesday, 16 October 2018 12:37:49 PM AEDT Alexey Kardashevskiy wrote:
>>
>> On 16/10/2018 11:38, Alistair Popple wrote:
>>> Hi Alexey,
>>>
>>> Looking at the skiboot side I think we only fence the NVLink bricks as part of a
>>> PCIe function level reset (FLR) rather than a PCI Hot or Fundamental reset which
>>> I believe is what the code here does. So to fence the bricks you would need to
>>> do either a FLR on the given link or alter Skiboot to fence a given link as part
>>> of a hot reset.
>>
>> The code here calls OPAL to execute this code:
>>
>> https://github.com/open-power/skiboot/commit/fca2b2b839a673a1e52fc6b19ee6d33b2dfbc003
>>
>> This resets all links on an NPU which is fine for now as we pass GPUs in
>> groups only. Or I missed something?
> 
> From what I can see in
> https://github.com/open-power/skiboot/commit/2947eaa14e771e572d4e84bf003318c590c1c7d4
> we only fence the bricks in npu2_dev_procedure_reset()
> (https://github.com/open-power/skiboot/blob/master/hw/npu2-hw-procedures.c#L937)
> which itself is only called from the FLR path and not the code path you point
> out above.
> 
> The code path above only resets the NTL which I don't believe is sufficient to
> prevent HMIs.

reset_ntl() does what npu2_dev_procedure_reset() does plus more stuff,
there nothing really in npu2_dev_procedure_reset() which reset_ntl()
does not do already from the hardware standpoint. And it did stop HMIs
for me though.

but ok, what will be sufficient then if not reset_ntl()?



> 
> - Alistair
> 
>>
>>
>>
>>>
>>> - Alistair
>>>
>>> On Monday, 15 October 2018 6:17:51 PM AEDT Alexey Kardashevskiy wrote:
>>>> Ping?
>>>>
>>>>
>>>> On 02/10/2018 13:20, Alexey Kardashevskiy wrote:
>>>>> The skiboot firmware has a hot reset handler which fences the NVIDIA V100
>>>>> GPU RAM on Witherspoons and makes accesses no-op instead of throwing HMIs:
>>>>> https://github.com/open-power/skiboot/commit/fca2b2b839a67
>>>>>
>>>>> Now we are going to pass V100 via VFIO which most certainly involves
>>>>> KVM guests which are often terminated without getting a chance to offline
>>>>> GPU RAM so we end up with a running machine with misconfigured memory.
>>>>> Accessing this memory produces hardware management interrupts (HMI)
>>>>> which bring the host down.
>>>>>
>>>>> To suppress HMIs, this wires up this hot reset hook to vfio_pci_disable()
>>>>> via pci_disable_device() which switches NPU2 to a safe mode and prevents
>>>>> HMIs.
>>>>>
>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>> ---
>>>>> Changes:
>>>>> v2:
>>>>> * updated the commit log
>>>>> ---
>>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++++
>>>>>  1 file changed, 10 insertions(+)
>>>>>
>>>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>> index cde7102..e37b9cc 100644
>>>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>> @@ -3688,6 +3688,15 @@ static void pnv_pci_release_device(struct pci_dev *pdev)
>>>>>  		pnv_ioda_release_pe(pe);
>>>>>  }
>>>>>  
>>>>> +static void pnv_npu_disable_device(struct pci_dev *pdev)
>>>>> +{
>>>>> +	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
>>>>> +	struct eeh_pe *eehpe = edev ? edev->pe : NULL;
>>>>> +
>>>>> +	if (eehpe && eeh_ops && eeh_ops->reset)
>>>>> +		eeh_ops->reset(eehpe, EEH_RESET_HOT);
>>>>> +}
>>>>> +
>>>>>  static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
>>>>>  {
>>>>>  	struct pnv_phb *phb = hose->private_data;
>>>>> @@ -3732,6 +3741,7 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
>>>>>  	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
>>>>>  	.dma_set_mask		= pnv_npu_dma_set_mask,
>>>>>  	.shutdown		= pnv_pci_ioda_shutdown,
>>>>> +	.disable_device		= pnv_npu_disable_device,
>>>>>  };
>>>>>  
>>>>>  static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
> 
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2
  2018-10-16  2:02         ` Alexey Kardashevskiy
@ 2018-10-16  2:19           ` Alistair Popple
  2018-10-16  2:22             ` Alexey Kardashevskiy
  0 siblings, 1 reply; 13+ messages in thread
From: Alistair Popple @ 2018-10-16  2:19 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Reza Arbab, linuxppc-dev, David Gibson

> reset_ntl() does what npu2_dev_procedure_reset() does plus more stuff,
> there nothing really in npu2_dev_procedure_reset() which reset_ntl()
> does not do already from the hardware standpoint. And it did stop HMIs
> for me though.
> 
> but ok, what will be sufficient then if not reset_ntl()?

Argh, yes you are correct. Specifically both npu2_dev_procedure_reset() and
reset_ntl() contain:

	/* NTL Reset */
	val = npu2_read(ndev->npu, NPU2_NTL_MISC_CFG1(ndev));
	val |= PPC_BIT(8) | PPC_BIT(9);
	npu2_write(ndev->npu, NPU2_NTL_MISC_CFG1(ndev), val);

Which should fence the brick. However from what I recall there was more to
reliably preventing HMIs than merely fencing the brick. It invovled a sequence
of fencing and flushing the cache with dcbf instructions at the right time which
is why we also have the FLR. Unfortunately I don't know the precise details,
perhaps if we send enough coffee Balbir's way he might be able remind us?

- Alistair

> 
> 
> > 
> > - Alistair
> > 
> >>
> >>
> >>
> >>>
> >>> - Alistair
> >>>
> >>> On Monday, 15 October 2018 6:17:51 PM AEDT Alexey Kardashevskiy wrote:
> >>>> Ping?
> >>>>
> >>>>
> >>>> On 02/10/2018 13:20, Alexey Kardashevskiy wrote:
> >>>>> The skiboot firmware has a hot reset handler which fences the NVIDIA V100
> >>>>> GPU RAM on Witherspoons and makes accesses no-op instead of throwing HMIs:
> >>>>> https://github.com/open-power/skiboot/commit/fca2b2b839a67
> >>>>>
> >>>>> Now we are going to pass V100 via VFIO which most certainly involves
> >>>>> KVM guests which are often terminated without getting a chance to offline
> >>>>> GPU RAM so we end up with a running machine with misconfigured memory.
> >>>>> Accessing this memory produces hardware management interrupts (HMI)
> >>>>> which bring the host down.
> >>>>>
> >>>>> To suppress HMIs, this wires up this hot reset hook to vfio_pci_disable()
> >>>>> via pci_disable_device() which switches NPU2 to a safe mode and prevents
> >>>>> HMIs.
> >>>>>
> >>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>>>> ---
> >>>>> Changes:
> >>>>> v2:
> >>>>> * updated the commit log
> >>>>> ---
> >>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++++
> >>>>>  1 file changed, 10 insertions(+)
> >>>>>
> >>>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>> index cde7102..e37b9cc 100644
> >>>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>> @@ -3688,6 +3688,15 @@ static void pnv_pci_release_device(struct pci_dev *pdev)
> >>>>>  		pnv_ioda_release_pe(pe);
> >>>>>  }
> >>>>>  
> >>>>> +static void pnv_npu_disable_device(struct pci_dev *pdev)
> >>>>> +{
> >>>>> +	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
> >>>>> +	struct eeh_pe *eehpe = edev ? edev->pe : NULL;
> >>>>> +
> >>>>> +	if (eehpe && eeh_ops && eeh_ops->reset)
> >>>>> +		eeh_ops->reset(eehpe, EEH_RESET_HOT);
> >>>>> +}
> >>>>> +
> >>>>>  static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
> >>>>>  {
> >>>>>  	struct pnv_phb *phb = hose->private_data;
> >>>>> @@ -3732,6 +3741,7 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
> >>>>>  	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
> >>>>>  	.dma_set_mask		= pnv_npu_dma_set_mask,
> >>>>>  	.shutdown		= pnv_pci_ioda_shutdown,
> >>>>> +	.disable_device		= pnv_npu_disable_device,
> >>>>>  };
> >>>>>  
> >>>>>  static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> > 
> > 
> 
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2
  2018-10-16  2:19           ` Alistair Popple
@ 2018-10-16  2:22             ` Alexey Kardashevskiy
  2018-10-16  7:32               ` Alistair Popple
  0 siblings, 1 reply; 13+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-16  2:22 UTC (permalink / raw)
  To: Alistair Popple; +Cc: Reza Arbab, linuxppc-dev, David Gibson



On 16/10/2018 13:19, Alistair Popple wrote:
>> reset_ntl() does what npu2_dev_procedure_reset() does plus more stuff,
>> there nothing really in npu2_dev_procedure_reset() which reset_ntl()
>> does not do already from the hardware standpoint. And it did stop HMIs
>> for me though.
>>
>> but ok, what will be sufficient then if not reset_ntl()?
> 
> Argh, yes you are correct. Specifically both npu2_dev_procedure_reset() and
> reset_ntl() contain:
> 
> 	/* NTL Reset */
> 	val = npu2_read(ndev->npu, NPU2_NTL_MISC_CFG1(ndev));
> 	val |= PPC_BIT(8) | PPC_BIT(9);
> 	npu2_write(ndev->npu, NPU2_NTL_MISC_CFG1(ndev), val);
> 
> Which should fence the brick. However from what I recall there was more to
> reliably preventing HMIs than merely fencing the brick. It invovled a sequence
> of fencing and flushing the cache with dcbf instructions at the right time which
> is why we also have the FLR. Unfortunately I don't know the precise details,
> perhaps if we send enough coffee Balbir's way he might be able remind us?


He suggested and ack'ed that skiboot patch, I can repeat beers^wcoffee
but it won't change much ;)



> 
> - Alistair
> 
>>
>>
>>>
>>> - Alistair
>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> - Alistair
>>>>>
>>>>> On Monday, 15 October 2018 6:17:51 PM AEDT Alexey Kardashevskiy wrote:
>>>>>> Ping?
>>>>>>
>>>>>>
>>>>>> On 02/10/2018 13:20, Alexey Kardashevskiy wrote:
>>>>>>> The skiboot firmware has a hot reset handler which fences the NVIDIA V100
>>>>>>> GPU RAM on Witherspoons and makes accesses no-op instead of throwing HMIs:
>>>>>>> https://github.com/open-power/skiboot/commit/fca2b2b839a67
>>>>>>>
>>>>>>> Now we are going to pass V100 via VFIO which most certainly involves
>>>>>>> KVM guests which are often terminated without getting a chance to offline
>>>>>>> GPU RAM so we end up with a running machine with misconfigured memory.
>>>>>>> Accessing this memory produces hardware management interrupts (HMI)
>>>>>>> which bring the host down.
>>>>>>>
>>>>>>> To suppress HMIs, this wires up this hot reset hook to vfio_pci_disable()
>>>>>>> via pci_disable_device() which switches NPU2 to a safe mode and prevents
>>>>>>> HMIs.
>>>>>>>
>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>> ---
>>>>>>> Changes:
>>>>>>> v2:
>>>>>>> * updated the commit log
>>>>>>> ---
>>>>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++++
>>>>>>>  1 file changed, 10 insertions(+)
>>>>>>>
>>>>>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>>> index cde7102..e37b9cc 100644
>>>>>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>>> @@ -3688,6 +3688,15 @@ static void pnv_pci_release_device(struct pci_dev *pdev)
>>>>>>>  		pnv_ioda_release_pe(pe);
>>>>>>>  }
>>>>>>>  
>>>>>>> +static void pnv_npu_disable_device(struct pci_dev *pdev)
>>>>>>> +{
>>>>>>> +	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
>>>>>>> +	struct eeh_pe *eehpe = edev ? edev->pe : NULL;
>>>>>>> +
>>>>>>> +	if (eehpe && eeh_ops && eeh_ops->reset)
>>>>>>> +		eeh_ops->reset(eehpe, EEH_RESET_HOT);
>>>>>>> +}
>>>>>>> +
>>>>>>>  static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
>>>>>>>  {
>>>>>>>  	struct pnv_phb *phb = hose->private_data;
>>>>>>> @@ -3732,6 +3741,7 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
>>>>>>>  	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
>>>>>>>  	.dma_set_mask		= pnv_npu_dma_set_mask,
>>>>>>>  	.shutdown		= pnv_pci_ioda_shutdown,
>>>>>>> +	.disable_device		= pnv_npu_disable_device,
>>>>>>>  };
>>>>>>>  
>>>>>>>  static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
> 
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2
  2018-10-16  2:22             ` Alexey Kardashevskiy
@ 2018-10-16  7:32               ` Alistair Popple
  2018-10-16  7:55                 ` Alexey Kardashevskiy
  0 siblings, 1 reply; 13+ messages in thread
From: Alistair Popple @ 2018-10-16  7:32 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Reza Arbab, linuxppc-dev, David Gibson

On Tuesday, 16 October 2018 1:22:53 PM AEDT Alexey Kardashevskiy wrote:
> 
> On 16/10/2018 13:19, Alistair Popple wrote:
> >> reset_ntl() does what npu2_dev_procedure_reset() does plus more stuff,
> >> there nothing really in npu2_dev_procedure_reset() which reset_ntl()
> >> does not do already from the hardware standpoint. And it did stop HMIs
> >> for me though.
> >>
> >> but ok, what will be sufficient then if not reset_ntl()?
> > 
> > Argh, yes you are correct. Specifically both npu2_dev_procedure_reset() and
> > reset_ntl() contain:
> > 
> > 	/* NTL Reset */
> > 	val = npu2_read(ndev->npu, NPU2_NTL_MISC_CFG1(ndev));
> > 	val |= PPC_BIT(8) | PPC_BIT(9);
> > 	npu2_write(ndev->npu, NPU2_NTL_MISC_CFG1(ndev), val);
> > 
> > Which should fence the brick. However from what I recall there was more to
> > reliably preventing HMIs than merely fencing the brick. It invovled a sequence
> > of fencing and flushing the cache with dcbf instructions at the right time which
> > is why we also have the FLR. Unfortunately I don't know the precise details,
> > perhaps if we send enough coffee Balbir's way he might be able remind us?
> 
> 
> He suggested and ack'ed that skiboot patch, I can repeat beers^wcoffee
> but it won't change much ;)

Ha. Couldn't hurt ;)

I was pretty sure flushing the caches was an important part of the sequence to
avoid HMI's. I believe you are trying to deal with unexpected guest terminations
which means the driver won't have a chance to flush the caches prior to
termination so wouldn't you also need to do that somewhere? Unless the driver
does it at startup?

- Alistair

> >
> > - Alistair
> > 
> >>
> >>
> >>>
> >>> - Alistair
> >>>
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>> - Alistair
> >>>>>
> >>>>> On Monday, 15 October 2018 6:17:51 PM AEDT Alexey Kardashevskiy wrote:
> >>>>>> Ping?
> >>>>>>
> >>>>>>
> >>>>>> On 02/10/2018 13:20, Alexey Kardashevskiy wrote:
> >>>>>>> The skiboot firmware has a hot reset handler which fences the NVIDIA V100
> >>>>>>> GPU RAM on Witherspoons and makes accesses no-op instead of throwing HMIs:
> >>>>>>> https://github.com/open-power/skiboot/commit/fca2b2b839a67
> >>>>>>>
> >>>>>>> Now we are going to pass V100 via VFIO which most certainly involves
> >>>>>>> KVM guests which are often terminated without getting a chance to offline
> >>>>>>> GPU RAM so we end up with a running machine with misconfigured memory.
> >>>>>>> Accessing this memory produces hardware management interrupts (HMI)
> >>>>>>> which bring the host down.
> >>>>>>>
> >>>>>>> To suppress HMIs, this wires up this hot reset hook to vfio_pci_disable()
> >>>>>>> via pci_disable_device() which switches NPU2 to a safe mode and prevents
> >>>>>>> HMIs.
> >>>>>>>
> >>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>>>>>> ---
> >>>>>>> Changes:
> >>>>>>> v2:
> >>>>>>> * updated the commit log
> >>>>>>> ---
> >>>>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++++
> >>>>>>>  1 file changed, 10 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>>>> index cde7102..e37b9cc 100644
> >>>>>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>>>> @@ -3688,6 +3688,15 @@ static void pnv_pci_release_device(struct pci_dev *pdev)
> >>>>>>>  		pnv_ioda_release_pe(pe);
> >>>>>>>  }
> >>>>>>>  
> >>>>>>> +static void pnv_npu_disable_device(struct pci_dev *pdev)
> >>>>>>> +{
> >>>>>>> +	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
> >>>>>>> +	struct eeh_pe *eehpe = edev ? edev->pe : NULL;
> >>>>>>> +
> >>>>>>> +	if (eehpe && eeh_ops && eeh_ops->reset)
> >>>>>>> +		eeh_ops->reset(eehpe, EEH_RESET_HOT);
> >>>>>>> +}
> >>>>>>> +
> >>>>>>>  static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
> >>>>>>>  {
> >>>>>>>  	struct pnv_phb *phb = hose->private_data;
> >>>>>>> @@ -3732,6 +3741,7 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
> >>>>>>>  	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
> >>>>>>>  	.dma_set_mask		= pnv_npu_dma_set_mask,
> >>>>>>>  	.shutdown		= pnv_pci_ioda_shutdown,
> >>>>>>> +	.disable_device		= pnv_npu_disable_device,
> >>>>>>>  };
> >>>>>>>  
> >>>>>>>  static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> > 
> > 
> 
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2
  2018-10-16  7:32               ` Alistair Popple
@ 2018-10-16  7:55                 ` Alexey Kardashevskiy
  2018-10-18  1:05                   ` Alistair Popple
  0 siblings, 1 reply; 13+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-16  7:55 UTC (permalink / raw)
  To: Alistair Popple; +Cc: Reza Arbab, linuxppc-dev, David Gibson



On 16/10/2018 18:32, Alistair Popple wrote:
> On Tuesday, 16 October 2018 1:22:53 PM AEDT Alexey Kardashevskiy wrote:
>>
>> On 16/10/2018 13:19, Alistair Popple wrote:
>>>> reset_ntl() does what npu2_dev_procedure_reset() does plus more stuff,
>>>> there nothing really in npu2_dev_procedure_reset() which reset_ntl()
>>>> does not do already from the hardware standpoint. And it did stop HMIs
>>>> for me though.
>>>>
>>>> but ok, what will be sufficient then if not reset_ntl()?
>>>
>>> Argh, yes you are correct. Specifically both npu2_dev_procedure_reset() and
>>> reset_ntl() contain:
>>>
>>> 	/* NTL Reset */
>>> 	val = npu2_read(ndev->npu, NPU2_NTL_MISC_CFG1(ndev));
>>> 	val |= PPC_BIT(8) | PPC_BIT(9);
>>> 	npu2_write(ndev->npu, NPU2_NTL_MISC_CFG1(ndev), val);
>>>
>>> Which should fence the brick. However from what I recall there was more to
>>> reliably preventing HMIs than merely fencing the brick. It invovled a sequence
>>> of fencing and flushing the cache with dcbf instructions at the right time which
>>> is why we also have the FLR. Unfortunately I don't know the precise details,
>>> perhaps if we send enough coffee Balbir's way he might be able remind us?
>>
>>
>> He suggested and ack'ed that skiboot patch, I can repeat beers^wcoffee
>> but it won't change much ;)
> 
> Ha. Couldn't hurt ;)
> 
> I was pretty sure flushing the caches was an important part of the sequence to
> avoid HMI's. I believe you are trying to deal with unexpected guest terminations
> which means the driver won't have a chance to flush the caches prior to
> termination so

Correct.

> wouldn't you also need to do that somewhere? Unless the driver
> does it at startup?


VFIO performs GPU reset so I'd expect the GPUs to flush its caches
without any software interactions. Am I hoping for too much here?




> 
> - Alistair
> 
>>>
>>> - Alistair
>>>
>>>>
>>>>
>>>>>
>>>>> - Alistair
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> - Alistair
>>>>>>>
>>>>>>> On Monday, 15 October 2018 6:17:51 PM AEDT Alexey Kardashevskiy wrote:
>>>>>>>> Ping?
>>>>>>>>
>>>>>>>>
>>>>>>>> On 02/10/2018 13:20, Alexey Kardashevskiy wrote:
>>>>>>>>> The skiboot firmware has a hot reset handler which fences the NVIDIA V100
>>>>>>>>> GPU RAM on Witherspoons and makes accesses no-op instead of throwing HMIs:
>>>>>>>>> https://github.com/open-power/skiboot/commit/fca2b2b839a67
>>>>>>>>>
>>>>>>>>> Now we are going to pass V100 via VFIO which most certainly involves
>>>>>>>>> KVM guests which are often terminated without getting a chance to offline
>>>>>>>>> GPU RAM so we end up with a running machine with misconfigured memory.
>>>>>>>>> Accessing this memory produces hardware management interrupts (HMI)
>>>>>>>>> which bring the host down.
>>>>>>>>>
>>>>>>>>> To suppress HMIs, this wires up this hot reset hook to vfio_pci_disable()
>>>>>>>>> via pci_disable_device() which switches NPU2 to a safe mode and prevents
>>>>>>>>> HMIs.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>> ---
>>>>>>>>> Changes:
>>>>>>>>> v2:
>>>>>>>>> * updated the commit log
>>>>>>>>> ---
>>>>>>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++++
>>>>>>>>>  1 file changed, 10 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>>>>> index cde7102..e37b9cc 100644
>>>>>>>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>>>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>>>>> @@ -3688,6 +3688,15 @@ static void pnv_pci_release_device(struct pci_dev *pdev)
>>>>>>>>>  		pnv_ioda_release_pe(pe);
>>>>>>>>>  }
>>>>>>>>>  
>>>>>>>>> +static void pnv_npu_disable_device(struct pci_dev *pdev)
>>>>>>>>> +{
>>>>>>>>> +	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
>>>>>>>>> +	struct eeh_pe *eehpe = edev ? edev->pe : NULL;
>>>>>>>>> +
>>>>>>>>> +	if (eehpe && eeh_ops && eeh_ops->reset)
>>>>>>>>> +		eeh_ops->reset(eehpe, EEH_RESET_HOT);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>  static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
>>>>>>>>>  {
>>>>>>>>>  	struct pnv_phb *phb = hose->private_data;
>>>>>>>>> @@ -3732,6 +3741,7 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
>>>>>>>>>  	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
>>>>>>>>>  	.dma_set_mask		= pnv_npu_dma_set_mask,
>>>>>>>>>  	.shutdown		= pnv_pci_ioda_shutdown,
>>>>>>>>> +	.disable_device		= pnv_npu_disable_device,
>>>>>>>>>  };
>>>>>>>>>  
>>>>>>>>>  static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
> 
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2
  2018-10-16  7:55                 ` Alexey Kardashevskiy
@ 2018-10-18  1:05                   ` Alistair Popple
  2018-10-19  1:20                     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 13+ messages in thread
From: Alistair Popple @ 2018-10-18  1:05 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Reza Arbab, linuxppc-dev, David Gibson

Hi Alexey,

> > wouldn't you also need to do that somewhere? Unless the driver
> > does it at startup?
> 
> VFIO performs GPU reset so I'd expect the GPUs to flush its caches
> without any software interactions. Am I hoping for too much here?

Sadly you are. It's not the GPU caches that need flushing, it's the CPU caches. 
This needs to happen as part of the reset sequence, so I guess you would need 
to add it to the VFIO driver.

- Alistair

> 
> > - Alistair
> > 
> >>> - Alistair
> >>> 
> >>>>> - Alistair
> >>>>> 
> >>>>>>> - Alistair
> >>>>>>> 
> >>>>>>> On Monday, 15 October 2018 6:17:51 PM AEDT Alexey Kardashevskiy 
wrote:
> >>>>>>>> Ping?
> >>>>>>>> 
> >>>>>>>> On 02/10/2018 13:20, Alexey Kardashevskiy wrote:
> >>>>>>>>> The skiboot firmware has a hot reset handler which fences the
> >>>>>>>>> NVIDIA V100
> >>>>>>>>> GPU RAM on Witherspoons and makes accesses no-op instead of
> >>>>>>>>> throwing HMIs:
> >>>>>>>>> https://github.com/open-power/skiboot/commit/fca2b2b839a67
> >>>>>>>>> 
> >>>>>>>>> Now we are going to pass V100 via VFIO which most certainly
> >>>>>>>>> involves
> >>>>>>>>> KVM guests which are often terminated without getting a chance to
> >>>>>>>>> offline
> >>>>>>>>> GPU RAM so we end up with a running machine with misconfigured
> >>>>>>>>> memory.
> >>>>>>>>> Accessing this memory produces hardware management interrupts
> >>>>>>>>> (HMI)
> >>>>>>>>> which bring the host down.
> >>>>>>>>> 
> >>>>>>>>> To suppress HMIs, this wires up this hot reset hook to
> >>>>>>>>> vfio_pci_disable()
> >>>>>>>>> via pci_disable_device() which switches NPU2 to a safe mode and
> >>>>>>>>> prevents
> >>>>>>>>> HMIs.
> >>>>>>>>> 
> >>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>>>>>>>> ---
> >>>>>>>>> Changes:
> >>>>>>>>> v2:
> >>>>>>>>> * updated the commit log
> >>>>>>>>> ---
> >>>>>>>>> 
> >>>>>>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++++
> >>>>>>>>>  1 file changed, 10 insertions(+)
> >>>>>>>>> 
> >>>>>>>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>>>>>> b/arch/powerpc/platforms/powernv/pci-ioda.c index
> >>>>>>>>> cde7102..e37b9cc 100644
> >>>>>>>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>>>>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>>>>>> @@ -3688,6 +3688,15 @@ static void pnv_pci_release_device(struct
> >>>>>>>>> pci_dev *pdev)>>>>>>>>> 
> >>>>>>>>>  		pnv_ioda_release_pe(pe);
> >>>>>>>>>  
> >>>>>>>>>  }
> >>>>>>>>> 
> >>>>>>>>> +static void pnv_npu_disable_device(struct pci_dev *pdev)
> >>>>>>>>> +{
> >>>>>>>>> +	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
> >>>>>>>>> +	struct eeh_pe *eehpe = edev ? edev->pe : NULL;
> >>>>>>>>> +
> >>>>>>>>> +	if (eehpe && eeh_ops && eeh_ops->reset)
> >>>>>>>>> +		eeh_ops->reset(eehpe, EEH_RESET_HOT);
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> 
> >>>>>>>>>  static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
> >>>>>>>>>  {
> >>>>>>>>>  
> >>>>>>>>>  	struct pnv_phb *phb = hose->private_data;
> >>>>>>>>> 
> >>>>>>>>> @@ -3732,6 +3741,7 @@ static const struct pci_controller_ops
> >>>>>>>>> pnv_npu_ioda_controller_ops = {>>>>>>>>> 
> >>>>>>>>>  	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
> >>>>>>>>>  	.dma_set_mask		= pnv_npu_dma_set_mask,
> >>>>>>>>>  	.shutdown		= pnv_pci_ioda_shutdown,
> >>>>>>>>> 
> >>>>>>>>> +	.disable_device		= pnv_npu_disable_device,
> >>>>>>>>> 
> >>>>>>>>>  };
> >>>>>>>>>  
> >>>>>>>>>  static const struct pci_controller_ops
> >>>>>>>>>  pnv_npu_ocapi_ioda_controller_ops = {



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2
  2018-10-18  1:05                   ` Alistair Popple
@ 2018-10-19  1:20                     ` Alexey Kardashevskiy
  2018-10-19  1:47                       ` Alistair Popple
  0 siblings, 1 reply; 13+ messages in thread
From: Alexey Kardashevskiy @ 2018-10-19  1:20 UTC (permalink / raw)
  To: Alistair Popple; +Cc: Reza Arbab, linuxppc-dev, David Gibson



On 18/10/2018 12:05, Alistair Popple wrote:
> Hi Alexey,
> 
>>> wouldn't you also need to do that somewhere? Unless the driver
>>> does it at startup?
>>
>> VFIO performs GPU reset so I'd expect the GPUs to flush its caches
>> without any software interactions. Am I hoping for too much here?
> 
> Sadly you are. It's not the GPU caches that need flushing, it's the CPU caches. 
> This needs to happen as part of the reset sequence, so I guess you would need 
> to add it to the VFIO driver.

Well, ok. Caches need flushing, will look into this but this fencing is
still needed, is not it?



> 
> - Alistair
> 
>>
>>> - Alistair
>>>
>>>>> - Alistair
>>>>>
>>>>>>> - Alistair
>>>>>>>
>>>>>>>>> - Alistair
>>>>>>>>>
>>>>>>>>> On Monday, 15 October 2018 6:17:51 PM AEDT Alexey Kardashevskiy 
> wrote:
>>>>>>>>>> Ping?
>>>>>>>>>>
>>>>>>>>>> On 02/10/2018 13:20, Alexey Kardashevskiy wrote:
>>>>>>>>>>> The skiboot firmware has a hot reset handler which fences the
>>>>>>>>>>> NVIDIA V100
>>>>>>>>>>> GPU RAM on Witherspoons and makes accesses no-op instead of
>>>>>>>>>>> throwing HMIs:
>>>>>>>>>>> https://github.com/open-power/skiboot/commit/fca2b2b839a67
>>>>>>>>>>>
>>>>>>>>>>> Now we are going to pass V100 via VFIO which most certainly
>>>>>>>>>>> involves
>>>>>>>>>>> KVM guests which are often terminated without getting a chance to
>>>>>>>>>>> offline
>>>>>>>>>>> GPU RAM so we end up with a running machine with misconfigured
>>>>>>>>>>> memory.
>>>>>>>>>>> Accessing this memory produces hardware management interrupts
>>>>>>>>>>> (HMI)
>>>>>>>>>>> which bring the host down.
>>>>>>>>>>>
>>>>>>>>>>> To suppress HMIs, this wires up this hot reset hook to
>>>>>>>>>>> vfio_pci_disable()
>>>>>>>>>>> via pci_disable_device() which switches NPU2 to a safe mode and
>>>>>>>>>>> prevents
>>>>>>>>>>> HMIs.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>>>> ---
>>>>>>>>>>> Changes:
>>>>>>>>>>> v2:
>>>>>>>>>>> * updated the commit log
>>>>>>>>>>> ---
>>>>>>>>>>>
>>>>>>>>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++++
>>>>>>>>>>>  1 file changed, 10 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>>>>>>> b/arch/powerpc/platforms/powernv/pci-ioda.c index
>>>>>>>>>>> cde7102..e37b9cc 100644
>>>>>>>>>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>>>>>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>>>>>>>>> @@ -3688,6 +3688,15 @@ static void pnv_pci_release_device(struct
>>>>>>>>>>> pci_dev *pdev)>>>>>>>>> 
>>>>>>>>>>>  		pnv_ioda_release_pe(pe);
>>>>>>>>>>>  
>>>>>>>>>>>  }
>>>>>>>>>>>
>>>>>>>>>>> +static void pnv_npu_disable_device(struct pci_dev *pdev)
>>>>>>>>>>> +{
>>>>>>>>>>> +	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
>>>>>>>>>>> +	struct eeh_pe *eehpe = edev ? edev->pe : NULL;
>>>>>>>>>>> +
>>>>>>>>>>> +	if (eehpe && eeh_ops && eeh_ops->reset)
>>>>>>>>>>> +		eeh_ops->reset(eehpe, EEH_RESET_HOT);
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>>
>>>>>>>>>>>  static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
>>>>>>>>>>>  {
>>>>>>>>>>>  
>>>>>>>>>>>  	struct pnv_phb *phb = hose->private_data;
>>>>>>>>>>>
>>>>>>>>>>> @@ -3732,6 +3741,7 @@ static const struct pci_controller_ops
>>>>>>>>>>> pnv_npu_ioda_controller_ops = {>>>>>>>>> 
>>>>>>>>>>>  	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
>>>>>>>>>>>  	.dma_set_mask		= pnv_npu_dma_set_mask,
>>>>>>>>>>>  	.shutdown		= pnv_pci_ioda_shutdown,
>>>>>>>>>>>
>>>>>>>>>>> +	.disable_device		= pnv_npu_disable_device,
>>>>>>>>>>>
>>>>>>>>>>>  };
>>>>>>>>>>>  
>>>>>>>>>>>  static const struct pci_controller_ops
>>>>>>>>>>>  pnv_npu_ocapi_ioda_controller_ops = {
> 
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2
  2018-10-19  1:20                     ` Alexey Kardashevskiy
@ 2018-10-19  1:47                       ` Alistair Popple
  0 siblings, 0 replies; 13+ messages in thread
From: Alistair Popple @ 2018-10-19  1:47 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Reza Arbab, linuxppc-dev, Sam Bobroff, David Gibson

> >>> wouldn't you also need to do that somewhere? Unless the driver
> >>> does it at startup?
> >> 
> >> VFIO performs GPU reset so I'd expect the GPUs to flush its caches
> >> without any software interactions. Am I hoping for too much here?
> > 
> > Sadly you are. It's not the GPU caches that need flushing, it's the CPU
> > caches. This needs to happen as part of the reset sequence, so I guess
> > you would need to add it to the VFIO driver.
> 
> Well, ok. Caches need flushing, will look into this but this fencing is
> still needed, is not it?

Yes. Although without the flushing I think you may get HMI's on any subsequent 
driver loads.

So from the point of view of what happens on the Skiboot/HW side this looks ok 
so long as all links on an NPU are assigned to the same guest (as this call 
resets every link on a given NPU).

Acked-by: Alistair Popple <alistair@popple.id.au>
 
> > - Alistair
> > 
> >>> - Alistair
> >>> 
> >>>>> - Alistair
> >>>>> 
> >>>>>>> - Alistair
> >>>>>>> 
> >>>>>>>>> - Alistair
> >>>>>>>>> 
> >>>>>>>>> On Monday, 15 October 2018 6:17:51 PM AEDT Alexey Kardashevskiy
> > 
> > wrote:
> >>>>>>>>>> Ping?
> >>>>>>>>>> 
> >>>>>>>>>> On 02/10/2018 13:20, Alexey Kardashevskiy wrote:
> >>>>>>>>>>> The skiboot firmware has a hot reset handler which fences the
> >>>>>>>>>>> NVIDIA V100
> >>>>>>>>>>> GPU RAM on Witherspoons and makes accesses no-op instead of
> >>>>>>>>>>> throwing HMIs:
> >>>>>>>>>>> https://github.com/open-power/skiboot/commit/fca2b2b839a67
> >>>>>>>>>>> 
> >>>>>>>>>>> Now we are going to pass V100 via VFIO which most certainly
> >>>>>>>>>>> involves
> >>>>>>>>>>> KVM guests which are often terminated without getting a chance
> >>>>>>>>>>> to
> >>>>>>>>>>> offline
> >>>>>>>>>>> GPU RAM so we end up with a running machine with misconfigured
> >>>>>>>>>>> memory.
> >>>>>>>>>>> Accessing this memory produces hardware management interrupts
> >>>>>>>>>>> (HMI)
> >>>>>>>>>>> which bring the host down.
> >>>>>>>>>>> 
> >>>>>>>>>>> To suppress HMIs, this wires up this hot reset hook to
> >>>>>>>>>>> vfio_pci_disable()
> >>>>>>>>>>> via pci_disable_device() which switches NPU2 to a safe mode and
> >>>>>>>>>>> prevents
> >>>>>>>>>>> HMIs.
> >>>>>>>>>>> 
> >>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>>>>>>>>>> ---
> >>>>>>>>>>> Changes:
> >>>>>>>>>>> v2:
> >>>>>>>>>>> * updated the commit log
> >>>>>>>>>>> ---
> >>>>>>>>>>> 
> >>>>>>>>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++++
> >>>>>>>>>>>  1 file changed, 10 insertions(+)
> >>>>>>>>>>> 
> >>>>>>>>>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>>>>>>>> b/arch/powerpc/platforms/powernv/pci-ioda.c index
> >>>>>>>>>>> cde7102..e37b9cc 100644
> >>>>>>>>>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>>>>>>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>>>>>>>> @@ -3688,6 +3688,15 @@ static void pnv_pci_release_device(struct
> >>>>>>>>>>> pci_dev *pdev)>>>>>>>>>
> >>>>>>>>>>> 
> >>>>>>>>>>>  		pnv_ioda_release_pe(pe);
> >>>>>>>>>>>  
> >>>>>>>>>>>  }
> >>>>>>>>>>> 
> >>>>>>>>>>> +static void pnv_npu_disable_device(struct pci_dev *pdev)
> >>>>>>>>>>> +{
> >>>>>>>>>>> +	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
> >>>>>>>>>>> +	struct eeh_pe *eehpe = edev ? edev->pe : NULL;
> >>>>>>>>>>> +
> >>>>>>>>>>> +	if (eehpe && eeh_ops && eeh_ops->reset)
> >>>>>>>>>>> +		eeh_ops->reset(eehpe, EEH_RESET_HOT);
> >>>>>>>>>>> +}
> >>>>>>>>>>> +
> >>>>>>>>>>> 
> >>>>>>>>>>>  static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
> >>>>>>>>>>>  {
> >>>>>>>>>>>  
> >>>>>>>>>>>  	struct pnv_phb *phb = hose->private_data;
> >>>>>>>>>>> 
> >>>>>>>>>>> @@ -3732,6 +3741,7 @@ static const struct pci_controller_ops
> >>>>>>>>>>> pnv_npu_ioda_controller_ops = {>>>>>>>>>
> >>>>>>>>>>> 
> >>>>>>>>>>>  	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
> >>>>>>>>>>>  	.dma_set_mask		= pnv_npu_dma_set_mask,
> >>>>>>>>>>>  	.shutdown		= pnv_pci_ioda_shutdown,
> >>>>>>>>>>> 
> >>>>>>>>>>> +	.disable_device		= pnv_npu_disable_device,
> >>>>>>>>>>> 
> >>>>>>>>>>>  };
> >>>>>>>>>>>  
> >>>>>>>>>>>  static const struct pci_controller_ops
> >>>>>>>>>>>  pnv_npu_ocapi_ioda_controller_ops = {



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-10-19  1:49 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-02  3:20 [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2 Alexey Kardashevskiy
2018-10-15  7:17 ` Alexey Kardashevskiy
2018-10-16  0:38   ` Alistair Popple
2018-10-16  1:37     ` Alexey Kardashevskiy
2018-10-16  1:44       ` Alistair Popple
2018-10-16  2:02         ` Alexey Kardashevskiy
2018-10-16  2:19           ` Alistair Popple
2018-10-16  2:22             ` Alexey Kardashevskiy
2018-10-16  7:32               ` Alistair Popple
2018-10-16  7:55                 ` Alexey Kardashevskiy
2018-10-18  1:05                   ` Alistair Popple
2018-10-19  1:20                     ` Alexey Kardashevskiy
2018-10-19  1:47                       ` Alistair Popple

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).