qemu-devel.nongnu.org archive mirror
* [question] VFIO Device Migration: The vCPU may be paused during vfio device DMA in iommu nested stage mode && vSVA
@ 2021-09-24  6:18 Kunkun Jiang
  2021-09-24  6:47 ` Tian, Kevin
  0 siblings, 1 reply; 6+ messages in thread
From: Kunkun Jiang @ 2021-09-24  6:18 UTC (permalink / raw)
  To: Tarun Gupta, Alex Williamson, Kirti Wankhede, Eric Auger,
	Shameer Kolothum, open list:All patches CC here, kevin.tian
  Cc: Zenghui Yu, wanghaibin.wang, liulongfang, Keqian Zhu, tangnianyao

Hi all,

I encountered a problem in a vfio device migration test. The
vCPU may be paused during vfio-pci DMA in iommu nested
stage mode with vSVA. This may lead to migration failure and
other problems related to device hardware and driver
implementation.

It may be a bit early to discuss this issue, since the iommu
nested stage mode and vSVA are not yet mature. But judging
from the current implementation, we will definitely encounter
this problem in the future.

This is how vSVA currently handles a translation fault
in iommu nested stage mode (taking SMMU as an example):

guest os         4. handle translation fault      5. send CMD_RESUME to vSMMU

qemu             3. inject fault into guest os    6. deliver response to host os
(vfio/vsmmu)

host os          2. notify qemu                   7. send CMD_RESUME to SMMU
(vfio/smmu)

SMMU             1. address translation fault     8. retry or terminate

The order is 1--->8.

Currently, qemu may pause the vCPU at any step. It is possible to
pause the vCPU during steps 1-5, that is, in the middle of a DMA.
This may lead to migration failure and other problems related to
device hardware and driver implementation. For example, the device
state cannot be changed from RUNNING && SAVING to SAVING,
because the device DMA has not finished.
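
To make the window concrete, here is a minimal sketch of the eight steps
above. It is only a toy model, not qemu or kernel code: the names Step,
GUEST_STEPS and can_finish_without_vcpu are made up for illustration, and
the only assumption encoded is that steps 4 and 5 run inside the guest and
therefore need a running vCPU.

```python
from enum import IntEnum


class Step(IntEnum):
    """The eight steps of the fault flow, numbered as in the diagram."""
    SMMU_FAULT = 1         # SMMU: address translation fault
    HOST_NOTIFY = 2        # host os: notify qemu
    QEMU_INJECT = 3        # qemu: inject fault into guest os
    GUEST_HANDLE = 4       # guest os: handle translation fault
    GUEST_CMD_RESUME = 5   # guest os: send CMD_RESUME to vSMMU
    QEMU_RESPONSE = 6      # qemu: deliver response to host os
    HOST_CMD_RESUME = 7    # host os: send CMD_RESUME to SMMU
    SMMU_RETRY = 8         # SMMU: retry or terminate


# Assumption: these are the steps that execute inside the guest and
# therefore cannot make progress once the vCPU is paused.
GUEST_STEPS = {Step.GUEST_HANDLE, Step.GUEST_CMD_RESUME}


def can_finish_without_vcpu(next_step: Step) -> bool:
    """True if a fault whose next pending step is `next_step` can still
    reach step 8 after the vCPU has been paused."""
    remaining = [s for s in Step if s >= next_step]
    return not any(s in GUEST_STEPS for s in remaining)


# Steps at which a vCPU pause leaves the faulting DMA unfinished.
stuck = [int(s) for s in Step if not can_finish_without_vcpu(s)]
print(stuck)
```

In this model a fault that has not yet passed step 5 can never reach
step 8 once the vCPU is stopped, which is the window described above.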

As far as I can see, the vCPU should not be paused during a device
I/O operation such as DMA. However, live migration currently
does not take the state of the vfio device into account when pausing
the vCPU. And if the vCPU is not paused, the vfio device keeps
running. This looks like a *deadlock*.

Do you have any ideas on how to solve this problem?
Looking forward to your reply.

Thanks,
Kunkun Jiang






* RE: [question] VFIO Device Migration: The vCPU may be paused during vfio device DMA in iommu nested stage mode && vSVA
  2021-09-24  6:18 [question] VFIO Device Migration: The vCPU may be paused during vfio device DMA in iommu nested stage mode && vSVA Kunkun Jiang
@ 2021-09-24  6:47 ` Tian, Kevin
  2021-09-24  9:29   ` Kirti Wankhede
  2021-09-27 12:30   ` Kunkun Jiang
  0 siblings, 2 replies; 6+ messages in thread
From: Tian, Kevin @ 2021-09-24  6:47 UTC (permalink / raw)
  To: Kunkun Jiang, Tarun Gupta, Alex Williamson, Kirti Wankhede,
	Eric Auger, Shameer Kolothum, open list:All patches CC here
  Cc: Liu, Yi L, Zhao, Yan Y, tangnianyao, Zenghui Yu, wanghaibin.wang,
	liulongfang, Keqian Zhu

> From: Kunkun Jiang <jiangkunkun@huawei.com>
> Sent: Friday, September 24, 2021 2:19 PM
> 
> Hi all,
> 
> I encountered a problem in vfio device migration test. The
> vCPU may be paused during vfio-pci DMA in iommu nested
> stage mode && vSVA. This may lead to migration fail and
> other problems related to device hardware and driver
> implementation.
> 
> It may be a bit early to discuss this issue, after all, the iommu
> nested stage mode and vSVA are not yet mature. But judging
> from the current implementation, we will definitely encounter
> this problem in the future.

Yes, this is a known limitation in supporting migration with vSVA.

> 
> This is the current process of vSVA processing translation fault
> in iommu nested stage mode (take SMMU as an example):
> 
> guest os            4.handle translation fault 5.send CMD_RESUME to vSMMU
> 
> 
> qemu                3.inject fault into guest os 6.deliver response to
> host os
> (vfio/vsmmu)
> 
> 
> host os              2.notify the qemu 7.send CMD_RESUME to SMMU
> (vfio/smmu)
> 
> 
> SMMU              1.address translation fault              8.retry or
> terminate
> 
> The order is 1--->8.
> 
> Currently, qemu may pause vCPU at any step. It is possible to
> pause vCPU at step 1-5, that is, in a DMA. This may lead to
> migration fail and other problems related to device hardware
> and driver implementation. For example, the device status
> cannot be changed from RUNNING && SAVING to SAVING,
> because the device DMA is not over.
> 
> As far as i can see, vCPU should not be paused during a device
> IO process, such as DMA. However, currently live migration
> does not pay attention to the state of vfio device when pausing
> the vCPU. And if the vCPU is not paused, the vfio device is
> always running. This looks like a *deadlock*.

Basically this requires:

1) stopping the vCPU after stopping the device (this sequence could
be enabled selectively for vSVA);

2) when stopping the device, the driver should block new requests
from the vCPU (queuing them to a pending list) and then drain all
in-flight requests, including faults;
    * blocking new requests further requires switching the cmd portal
from the fast path to the slow trap-emulation path before stopping
the device;

3) saving the pending requests in the vm image and replaying them
after the vm is resumed;
    * finally, disable blocking by switching the cmd portal back to
the fast path;
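
The three steps above can be sketched as a toy model. This is only an
illustration under stated assumptions, not our PoC code: ToyDevice and all
of its methods are hypothetical names, requests are plain strings, and
draining is assumed to succeed because the vCPU is still running while the
device is being stopped.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyDevice:
    """Hypothetical device model: not a real vfio or qemu structure."""
    trap_path: bool = False                          # False = fast path
    in_flight: List[str] = field(default_factory=list)
    pending: List[str] = field(default_factory=list)
    completed: List[str] = field(default_factory=list)

    def submit(self, req: str) -> None:
        # On the trap-emulation path new requests are only queued.
        if self.trap_path:
            self.pending.append(req)
        else:
            self.in_flight.append(req)

    def begin_stop(self) -> None:
        # Step 2: switch the cmd portal to the slow path to block new work.
        self.trap_path = True

    def drain(self) -> None:
        # Step 2: let everything already submitted run to completion.
        # Assumption: the vCPU is still running, so faults get resolved.
        while self.in_flight:
            self.completed.append(self.in_flight.pop(0))

    def save_pending(self) -> List[str]:
        # Step 3: the pending list becomes part of the saved vm image.
        return list(self.pending)

    def resume(self, saved: List[str]) -> None:
        # Step 3: replay the saved requests, then go back to the fast path.
        self.in_flight.extend(saved)
        self.trap_path = False


src = ToyDevice()
src.submit("r1")
src.begin_stop()
src.submit("r2")          # arrives while the device is being stopped
src.drain()
saved = src.save_pending()

dst = ToyDevice()
dst.resume(saved)
```

Here "r1" completes on the source before the device state is saved, while
"r2" travels in the vm image and is replayed on the destination.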

> 
> Do you have any ideas to solve this problem?
> Looking forward to your replay.
> 

We have verified that the above flow works in our internal PoC.

Thanks
Kevin


* Re: [question] VFIO Device Migration: The vCPU may be paused during vfio device DMA in iommu nested stage mode && vSVA
  2021-09-24  6:47 ` Tian, Kevin
@ 2021-09-24  9:29   ` Kirti Wankhede
  2021-09-26  2:48     ` Tian, Kevin
  2021-09-27 12:30   ` Kunkun Jiang
  1 sibling, 1 reply; 6+ messages in thread
From: Kirti Wankhede @ 2021-09-24  9:29 UTC (permalink / raw)
  To: Tian, Kevin, Kunkun Jiang, Tarun Gupta, Alex Williamson,
	Eric Auger, Shameer Kolothum, open list:All patches CC here
  Cc: wanghaibin.wang, Zenghui Yu, Keqian Zhu, liulongfang,
	tangnianyao, Liu, Yi L, Zhao, Yan Y



On 9/24/2021 12:17 PM, Tian, Kevin wrote:
>> From: Kunkun Jiang <jiangkunkun@huawei.com>
>> Sent: Friday, September 24, 2021 2:19 PM
>>
>> Hi all,
>>
>> I encountered a problem in vfio device migration test. The
>> vCPU may be paused during vfio-pci DMA in iommu nested
>> stage mode && vSVA. This may lead to migration fail and
>> other problems related to device hardware and driver
>> implementation.
>>
>> It may be a bit early to discuss this issue, after all, the iommu
>> nested stage mode and vSVA are not yet mature. But judging
>> from the current implementation, we will definitely encounter
>> this problem in the future.
> 
> Yes, this is a known limitation to support migration with vSVA.
> 
>>
>> This is the current process of vSVA processing translation fault
>> in iommu nested stage mode (take SMMU as an example):
>>
>> guest os            4.handle translation fault 5.send CMD_RESUME to vSMMU
>>
>>
>> qemu                3.inject fault into guest os 6.deliver response to
>> host os
>> (vfio/vsmmu)
>>
>>
>> host os              2.notify the qemu 7.send CMD_RESUME to SMMU
>> (vfio/smmu)
>>
>>
>> SMMU              1.address translation fault              8.retry or
>> terminate
>>
>> The order is 1--->8.
>>
>> Currently, qemu may pause vCPU at any step. It is possible to
>> pause vCPU at step 1-5, that is, in a DMA. This may lead to
>> migration fail and other problems related to device hardware
>> and driver implementation. For example, the device status
>> cannot be changed from RUNNING && SAVING to SAVING,
>> because the device DMA is not over.
>>
>> As far as i can see, vCPU should not be paused during a device
>> IO process, such as DMA. However, currently live migration
>> does not pay attention to the state of vfio device when pausing
>> the vCPU. And if the vCPU is not paused, the vfio device is
>> always running. This looks like a *deadlock*.
> 
> Basically this requires:
> 
> 1) stopping vCPU after stopping device (could selectively enable
> this sequence for vSVA);
> 

I don't think this change is required. When the vCPUs are halted, the
vCPU states are already saved, so step 4 or 5 is taken care of by that.
Then, when the device transitions to the SAVING state, save the qemu and
host os state in the migration stream, i.e. the state at steps 2 and 3,
and depending on that, decide while resuming whether step 6 or 7 should run.

Thanks,
Kirti

> 2) when stopping device, the driver should block new requests
> from vCPU (queued to a pending list) and then drain all in-fly
> requests including faults;
>      * to block this further requires switching from fast-path to
> slow trap-emulation path for the cmd portal before stopping
> the device;
> 
> 3) save the pending requests in the vm image and replay them
> after the vm is resumed;
>      * finally disable blocking by switching back to the fast-path for
> the cmd portal;
> 
>>
>> Do you have any ideas to solve this problem?
>> Looking forward to your replay.
>>
> 
> We verified above flow can work in our internal POC.
> 
> Thanks
> Kevin
> 



* RE: [question] VFIO Device Migration: The vCPU may be paused during vfio device DMA in iommu nested stage mode && vSVA
  2021-09-24  9:29   ` Kirti Wankhede
@ 2021-09-26  2:48     ` Tian, Kevin
  0 siblings, 0 replies; 6+ messages in thread
From: Tian, Kevin @ 2021-09-26  2:48 UTC (permalink / raw)
  To: Kirti Wankhede, Kunkun Jiang, Tarun Gupta, Alex Williamson,
	Eric Auger, Shameer Kolothum, open list:All patches CC here
  Cc: Liu, Yi L, Zhao, Yan Y, tangnianyao, Zenghui Yu, wanghaibin.wang,
	liulongfang, Keqian Zhu

> From: Kirti Wankhede <kwankhede@nvidia.com>
> Sent: Friday, September 24, 2021 5:29 PM
> 
> On 9/24/2021 12:17 PM, Tian, Kevin wrote:
> >> From: Kunkun Jiang <jiangkunkun@huawei.com>
> >> Sent: Friday, September 24, 2021 2:19 PM
> >>
> >> Hi all,
> >>
> >> I encountered a problem in vfio device migration test. The
> >> vCPU may be paused during vfio-pci DMA in iommu nested
> >> stage mode && vSVA. This may lead to migration fail and
> >> other problems related to device hardware and driver
> >> implementation.
> >>
> >> It may be a bit early to discuss this issue, after all, the iommu
> >> nested stage mode and vSVA are not yet mature. But judging
> >> from the current implementation, we will definitely encounter
> >> this problem in the future.
> >
> > Yes, this is a known limitation to support migration with vSVA.
> >
> >>
> >> This is the current process of vSVA processing translation fault
> >> in iommu nested stage mode (take SMMU as an example):
> >>
> >> guest os            4.handle translation fault 5.send CMD_RESUME to vSMMU
> >>
> >>
> >> qemu                3.inject fault into guest os 6.deliver response to
> >> host os
> >> (vfio/vsmmu)
> >>
> >>
> >> host os              2.notify the qemu 7.send CMD_RESUME to SMMU
> >> (vfio/smmu)
> >>
> >>
> >> SMMU              1.address translation fault              8.retry or
> >> terminate
> >>
> >> The order is 1--->8.
> >>
> >> Currently, qemu may pause vCPU at any step. It is possible to
> >> pause vCPU at step 1-5, that is, in a DMA. This may lead to
> >> migration fail and other problems related to device hardware
> >> and driver implementation. For example, the device status
> >> cannot be changed from RUNNING && SAVING to SAVING,
> >> because the device DMA is not over.
> >>
> >> As far as i can see, vCPU should not be paused during a device
> >> IO process, such as DMA. However, currently live migration
> >> does not pay attention to the state of vfio device when pausing
> >> the vCPU. And if the vCPU is not paused, the vfio device is
> >> always running. This looks like a *deadlock*.
> >
> > Basically this requires:
> >
> > 1) stopping vCPU after stopping device (could selectively enable
> > this sequence for vSVA);
> >
> 
> I don't think this is change is required. When vCPUs are at halt vCPU
> states are already saved, step 4 or 5 will be taken care by that. Then
> when device is transitioned in SAVING state, save qemu and host os state
> in the migration stream, i.e. state at step 2 and 3, depending on that
> take action while resuming, about step 6 or 7 to run.
> 

This is not like normal pending CPU interrupts, which can be saved and
migrated.

Here, to save the device state you need to drain in-flight requests. But
in-flight requests may already have hit I/O page faults and be waiting for
fault completion from the CPU. If you pause the vCPU in the middle, the
fault is never fixed, so the in-flight requests cannot be drained (unless
the device supports preemption per fault, which imho is not the case for
most SVA-capable devices). Then you'll either fail the migration or migrate
broken device state.

vCPUs have to continue running until device draining has completed.
This requirement could be indicated via the migration region.
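
A minimal sketch of why the ordering matters. It is a toy model only: the
drain() helper and the request tuples are hypothetical, and it assumes the
device cannot preempt a request that is waiting on a fault.

```python
def drain(in_flight, vcpu_running):
    """Split in-flight requests into (drained, stuck).

    Each request is a (name, waiting_on_fault) pair. Assumption: a faulted
    request completes only if the guest can still service the fault, i.e.
    the device has no per-fault preemption.
    """
    drained, stuck = [], []
    for name, waiting_on_fault in in_flight:
        if waiting_on_fault and not vcpu_running:
            stuck.append(name)
        else:
            drained.append(name)
    return drained, stuck


requests = [("dma0", False), ("dma1", True)]

# vCPU paused first: the faulted request can never be drained.
vcpu_first = drain(requests, vcpu_running=False)

# Device stopped first while the vCPU keeps running: everything drains.
device_first = drain(requests, vcpu_running=True)
```

With the vCPU paused first, "dma1" stays stuck, so the device state can
neither be saved cleanly nor be trusted if it is saved anyway.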

Thanks
Kevin


* Re: [question] VFIO Device Migration: The vCPU may be paused during vfio device DMA in iommu nested stage mode && vSVA
  2021-09-24  6:47 ` Tian, Kevin
  2021-09-24  9:29   ` Kirti Wankhede
@ 2021-09-27 12:30   ` Kunkun Jiang
  2021-09-27 13:05     ` Tian, Kevin
  1 sibling, 1 reply; 6+ messages in thread
From: Kunkun Jiang @ 2021-09-27 12:30 UTC (permalink / raw)
  To: Tian, Kevin, Tarun Gupta, Alex Williamson, Kirti Wankhede,
	Eric Auger, Shameer Kolothum, open list:All patches CC here
  Cc: Liu, Yi L, Zhao, Yan Y, tangnianyao, Zenghui Yu, wanghaibin.wang,
	liulongfang, Keqian Zhu

Hi Kevin:

On 2021/9/24 14:47, Tian, Kevin wrote:
>> From: Kunkun Jiang <jiangkunkun@huawei.com>
>> Sent: Friday, September 24, 2021 2:19 PM
>>
>> Hi all,
>>
>> I encountered a problem in vfio device migration test. The
>> vCPU may be paused during vfio-pci DMA in iommu nested
>> stage mode && vSVA. This may lead to migration fail and
>> other problems related to device hardware and driver
>> implementation.
>>
>> It may be a bit early to discuss this issue, after all, the iommu
>> nested stage mode and vSVA are not yet mature. But judging
>> from the current implementation, we will definitely encounter
>> this problem in the future.
> Yes, this is a known limitation to support migration with vSVA.
>
>> This is the current process of vSVA processing translation fault
>> in iommu nested stage mode (take SMMU as an example):
>>
>> guest os            4.handle translation fault 5.send CMD_RESUME to vSMMU
>>
>>
>> qemu                3.inject fault into guest os 6.deliver response to
>> host os
>> (vfio/vsmmu)
>>
>>
>> host os              2.notify the qemu 7.send CMD_RESUME to SMMU
>> (vfio/smmu)
>>
>>
>> SMMU              1.address translation fault              8.retry or
>> terminate
>>
>> The order is 1--->8.
>>
>> Currently, qemu may pause vCPU at any step. It is possible to
>> pause vCPU at step 1-5, that is, in a DMA. This may lead to
>> migration fail and other problems related to device hardware
>> and driver implementation. For example, the device status
>> cannot be changed from RUNNING && SAVING to SAVING,
>> because the device DMA is not over.
>>
>> As far as i can see, vCPU should not be paused during a device
>> IO process, such as DMA. However, currently live migration
>> does not pay attention to the state of vfio device when pausing
>> the vCPU. And if the vCPU is not paused, the vfio device is
>> always running. This looks like a *deadlock*.
> Basically this requires:
>
> 1) stopping vCPU after stopping device (could selectively enable
> this sequence for vSVA);
How can we tell whether vSVA is enabled?
In fact, as long as the device is in iommu nested stage mode, this
problem exists, with or without vSVA. Without vSVA,
a fault can also be generated by modifying the guest device driver.
>
> 2) when stopping device, the driver should block new requests
> from vCPU (queued to a pending list) and then drain all in-fly
> requests including faults;
>      * to block this further requires switching from fast-path to
> slow trap-emulation path for the cmd portal before stopping
> the device;
>
> 3) save the pending requests in the vm image and replay them
> after the vm is resumed;
>      * finally disable blocking by switching back to the fast-path for
> the cmd portal;
Have any related patches been sent out and discussed? I might have
overlooked them.

We may be able to discuss and finalize a specification for this
problem.

Thanks,
Kunkun Jiang
>> Do you have any ideas to solve this problem?
>> Looking forward to your replay.
>>
> We verified above flow can work in our internal POC.
>
> Thanks
> Kevin





* RE: [question] VFIO Device Migration: The vCPU may be paused during vfio device DMA in iommu nested stage mode && vSVA
  2021-09-27 12:30   ` Kunkun Jiang
@ 2021-09-27 13:05     ` Tian, Kevin
  0 siblings, 0 replies; 6+ messages in thread
From: Tian, Kevin @ 2021-09-27 13:05 UTC (permalink / raw)
  To: Kunkun Jiang, Tarun Gupta, Alex Williamson, Kirti Wankhede,
	Eric Auger, Shameer Kolothum, open list:All patches CC here
  Cc: Liu, Yi L, Zhao, Yan Y, tangnianyao, Zenghui Yu, wanghaibin.wang,
	liulongfang, Keqian Zhu

> From: Kunkun Jiang <jiangkunkun@huawei.com>
> Sent: Monday, September 27, 2021 8:30 PM
> 
> Hi Kevin:
> 
> On 2021/9/24 14:47, Tian, Kevin wrote:
> >> From: Kunkun Jiang <jiangkunkun@huawei.com>
> >> Sent: Friday, September 24, 2021 2:19 PM
> >>
> >> Hi all,
> >>
> >> I encountered a problem in vfio device migration test. The
> >> vCPU may be paused during vfio-pci DMA in iommu nested
> >> stage mode && vSVA. This may lead to migration fail and
> >> other problems related to device hardware and driver
> >> implementation.
> >>
> >> It may be a bit early to discuss this issue, after all, the iommu
> >> nested stage mode and vSVA are not yet mature. But judging
> >> from the current implementation, we will definitely encounter
> >> this problem in the future.
> > Yes, this is a known limitation to support migration with vSVA.
> >
> >> This is the current process of vSVA processing translation fault
> >> in iommu nested stage mode (take SMMU as an example):
> >>
> >> guest os            4.handle translation fault 5.send CMD_RESUME to vSMMU
> >>
> >>
> >> qemu                3.inject fault into guest os 6.deliver response to
> >> host os
> >> (vfio/vsmmu)
> >>
> >>
> >> host os              2.notify the qemu 7.send CMD_RESUME to SMMU
> >> (vfio/smmu)
> >>
> >>
> >> SMMU              1.address translation fault              8.retry or
> >> terminate
> >>
> >> The order is 1--->8.
> >>
> >> Currently, qemu may pause vCPU at any step. It is possible to
> >> pause vCPU at step 1-5, that is, in a DMA. This may lead to
> >> migration fail and other problems related to device hardware
> >> and driver implementation. For example, the device status
> >> cannot be changed from RUNNING && SAVING to SAVING,
> >> because the device DMA is not over.
> >>
> >> As far as i can see, vCPU should not be paused during a device
> >> IO process, such as DMA. However, currently live migration
> >> does not pay attention to the state of vfio device when pausing
> >> the vCPU. And if the vCPU is not paused, the vfio device is
> >> always running. This looks like a *deadlock*.
> > Basically this requires:
> >
> > 1) stopping vCPU after stopping device (could selectively enable
> > this sequence for vSVA);
> How to tell if vSVA is open?
> In fact, as long as it is in iommu nested stage mode, there will
> be such a problem, whether it is vSVA or no-vSVA. In no-vSVA mode,
> a fault can also be generated by modifying the guest device driver.

You don't need to tell whether vSVA is enabled. The kernel driver knows
whether nesting is enabled and whether faults are handled by
userspace. With this knowledge, the driver can indicate this
ordering restriction generically to Qemu via the migration region info.
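
As a rough sketch of what such an indication could look like: the flag
name and both helpers below are purely hypothetical. Nothing like this
exists in the vfio uAPI or in Qemu today; it only illustrates the driver
deriving the restriction from what it already knows and Qemu picking the
stop order from it.

```python
# Hypothetical flag: NOT part of the vfio migration region definition.
HYPOTHETICAL_STOP_DEVICE_BEFORE_VCPU = 1 << 0


def driver_migration_flags(nested_enabled: bool,
                           faults_handled_by_userspace: bool) -> int:
    """Kernel-driver side: derive the ordering restriction from state the
    driver already has, without asking whether vSVA is in use."""
    flags = 0
    if nested_enabled and faults_handled_by_userspace:
        flags |= HYPOTHETICAL_STOP_DEVICE_BEFORE_VCPU
    return flags


def stop_order(flags: int) -> list:
    """Qemu side: choose the stop sequence from the reported flags."""
    if flags & HYPOTHETICAL_STOP_DEVICE_BEFORE_VCPU:
        return ["device", "vcpu"]
    return ["vcpu", "device"]   # today's default ordering


nested = stop_order(driver_migration_flags(True, True))
plain = stop_order(driver_migration_flags(False, False))
```

The point of routing this through the driver is that Qemu never has to
detect vSVA itself; it only follows the order the driver reports.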

> >
> > 2) when stopping device, the driver should block new requests
> > from vCPU (queued to a pending list) and then drain all in-fly
> > requests including faults;
> >      * to block this further requires switching from fast-path to
> > slow trap-emulation path for the cmd portal before stopping
> > the device;
> >
> > 3) save the pending requests in the vm image and replay them
> > after the vm is resumed;
> >      * finally disable blocking by switching back to the fast-path for
> > the cmd portal;
> Is there any related patch sent out and discussed? I might have
> overlooked that.

Not yet. As you said, even vSVA support is not in place yet. It's too
early to send patches for other features on top of that.

Thanks,
Kevin


Thread overview: 6+ messages
2021-09-24  6:18 [question] VFIO Device Migration: The vCPU may be paused during vfio device DMA in iommu nested stage mode && vSVA Kunkun Jiang
2021-09-24  6:47 ` Tian, Kevin
2021-09-24  9:29   ` Kirti Wankhede
2021-09-26  2:48     ` Tian, Kevin
2021-09-27 12:30   ` Kunkun Jiang
2021-09-27 13:05     ` Tian, Kevin
