From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: From: Max Gurtovoy Subject: Live Migration of Virtio Virtual Function Date: Thu, 12 Aug 2021 12:08:41 +0000 Message-ID: MIME-Version: 1.0 Content-Language: en-US Content-Type: multipart/alternative; boundary="_000_DM4PR12MB504062CEF6BEDABE6F76DEC2DEF99DM4PR12MB5040namp_" To: "virtio-comment@lists.oasis-open.org" , Jason Wang , "Michael S. Tsirkin" , "cohuck@redhat.com" Cc: Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID:

--_000_DM4PR12MB504062CEF6BEDABE6F76DEC2DEF99DM4PR12MB5040namp_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Hi all,

Live migration is one of the most important features of virtualization,
and virtio devices are often found in virtual environments.

The migration process is managed by migration SW that is running on the
hypervisor, and the VM is not aware of the process at all.

Unlike the vDPA case, a real PCI Virtual Function state resides in the HW.

In our vision, in order to fulfil the live migration requirements for
virtual functions, each physical function device must implement migration
operations. Using these operations, it will be able to master the
migration process for the virtual function devices. Each capable physical
function device has supervisor permissions to change the virtual function
operational states, save/restore its internal state, and start/stop dirty
page tracking.

An example of this approach can be seen in the way NVIDIA performs live
migration of a ConnectX NIC function:

https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci

NVIDIA's SNAP technology enables hardware-accelerated software-defined
PCIe devices. virtio-blk/virtio-net/virtio-fs SNAP is used for storage
and networking solutions.
The host OS/hypervisor uses its standard drivers, which are implemented
according to the well-known VIRTIO specifications.

In order to implement live migration for these virtual function devices,
which use standard drivers as mentioned, the specification should define
how HW vendors should build their devices and how SW developers should
adjust the drivers. This will enable a specification-compliant,
vendor-agnostic solution.

This is exactly how we built the migration driver for ConnectX (internal
HW design doc), and I guess that this is the way other vendors work.

For that, I would like to know if the approach of "PF that controls the
VF live migration process" is acceptable to the VIRTIO technical group?

Cheers,

-Max.

--_000_DM4PR12MB504062CEF6BEDABE6F76DEC2DEF99DM4PR12MB5040namp_
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
--_000_DM4PR12MB504062CEF6BEDABE6F76DEC2DEF99DM4PR12MB5040namp_--

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function References: From: Jason Wang Message-ID: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> Date: Tue, 17 Aug 2021 16:51:39 +0800 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Transfer-Encoding: 8bit Content-Language: en-US To: Max Gurtovoy , "virtio-comment@lists.oasis-open.org" , "Michael S. Tsirkin" , "cohuck@redhat.com" Cc: Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID:

On 2021/8/12 8:08 PM, Max Gurtovoy wrote:
>
> Hi all,
>
> Live migration is one of the most important features of virtualization
> and virtio devices are often found in virtual environments.
>
> The migration process is managed by a migration SW that is running on
> the hypervisor and the VM is not aware of the process at all.
>
> Unlike the vDPA case, a real PCI Virtual Function state resides in the HW.
>

vDPA doesn't prevent you from having HW states. Actually, from the view
of the VMM (QEMU), it doesn't care whether or not a state is stored in
software or hardware. A well-designed VMM should be able to hide the
virtio device implementation from the migration layer; that is how QEMU
is written: it doesn't care whether it's a software virtio/vDPA device
or not.

> In our vision, in order to fulfil the live migration requirements for
> virtual functions, each physical function device must implement
> migration operations. Using these operations, it will be able to
> master the migration process for the virtual function devices. Each
> capable physical function device has supervisor permissions to
> change the virtual function operational states, save/restore its
> internal state and start/stop dirty page tracking.
> For "supervisor permissions", is this from the software point of view?
> Maybe it's better to give an example for this.

> An example of this approach can be seen in the way NVIDIA performs
> live migration of a ConnectX NIC function:
>
> https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci
>
> NVIDIA's SNAP technology enables hardware-accelerated software-defined
> PCIe devices. virtio-blk/virtio-net/virtio-fs SNAP is used for storage
> and networking solutions. The host OS/hypervisor uses its standard
> drivers that are implemented according to the well-known VIRTIO
> specifications.
>
> In order to implement live migration for these virtual function
> devices, which use standard drivers as mentioned, the specification
> should define how HW vendors should build their devices and how SW
> developers should adjust the drivers.
>
> This will enable a specification-compliant, vendor-agnostic solution.
>
> This is exactly how we built the migration driver for ConnectX
> (internal HW design doc) and I guess that this is the way other
> vendors work.
>
> For that, I would like to know if the approach of "PF that controls
> the VF live migration process" is acceptable to the VIRTIO technical
> group ?
>

I'm not sure but I think it's better to start from the general facility
for all transports, then develop features for a specific transport.

Thanks

> Cheers,
>
> -Max.
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Sender: List-Post: List-Help: List-Unsubscribe: List-Subscribe: Received: from lists.oasis-open.org (oasis-open.org [10.110.1.242]) by lists.oasis-open.org (Postfix) with ESMTP id 364619863B9 for ; Tue, 17 Aug 2021 09:11:28 +0000 (UTC) References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> From: Max Gurtovoy Message-ID: Date: Tue, 17 Aug 2021 12:11:07 +0300 MIME-Version: 1.0 In-Reply-To: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: quoted-printable To: Jason Wang , "virtio-comment@lists.oasis-open.org" , "Michael S. Tsirkin" , "cohuck@redhat.com" Cc: Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID:

On 8/17/2021 11:51 AM, Jason Wang wrote:
>
> On 2021/8/12 8:08 PM, Max Gurtovoy wrote:
>>
>> [...]
>>
>> Unlike the vDPA case, a real PCI Virtual Function state resides in
>> the HW.
>>
> vDPA doesn't prevent you from having HW states. Actually, from the view
> of the VMM (QEMU), it doesn't care whether or not a state is stored in
> software or hardware. A well-designed VMM should be able to hide the
> virtio device implementation from the migration layer; that is how QEMU
> is written: it doesn't care whether it's a software
> virtio/vDPA device or not.
>
>> In our vision, in order to fulfil the live migration requirements for
>> virtual functions, each physical function device must implement
>> migration operations. Using these operations, it will be able to
>> master the migration process for the virtual function devices. Each
>> capable physical function device has supervisor permissions to
>> change the virtual function operational states, save/restore its
>> internal state and start/stop dirty page tracking.
>>
> For "supervisor permissions", is this from the software point of view?
> Maybe it's better to give an example for this.

A permission for a PF device to quiesce and freeze a VF device, for example.

>
>> An example of this approach can be seen in the way NVIDIA performs
>> live migration of a ConnectX NIC function:
>>
>> https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci
>>
>> [...]
>>
>> For that, I would like to know if the approach of "PF that controls
>> the VF live migration process" is acceptable to the VIRTIO technical
>> group ?
>>
>
> I'm not sure but I think it's better to start from the general
> facility for all transports, then develop features for a specific
> transport.

Can a general facility for all transports be a generic admin queue?

>
> Thanks
>
>> Cheers,
>>
>> -Max.
>>

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: MIME-Version: 1.0 References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> In-Reply-To: From: Jason Wang Date: Tue, 17 Aug 2021 17:44:17 +0800 Message-ID: Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable To: Max Gurtovoy Cc: "virtio-comment@lists.oasis-open.org" , "Michael S.
Tsirkin" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID:

On Tue, Aug 17, 2021 at 5:11 PM Max Gurtovoy wrote:
>
> On 8/17/2021 11:51 AM, Jason Wang wrote:
> >
> > On 2021/8/12 8:08 PM, Max Gurtovoy wrote:
> >> [...]
> >> Each capable physical function device has supervisor permissions to
> >> change the virtual function operational states, save/restore its
> >> internal state and start/stop dirty page tracking.
> >>
> > For "supervisor permissions", is this from the software point of view?
> > Maybe it's better to give an example for this.
>
> A permission for a PF device to quiesce and freeze a VF device, for example.

Note that for safety, the VMM (e.g. QEMU) is usually running without any privileges.
> >
> >> [...]
> >>
> >> For that, I would like to know if the approach of "PF that controls
> >> the VF live migration process" is acceptable to the VIRTIO technical
> >> group ?
> >>
> >
> > I'm not sure but I think it's better to start from the general
> > facility for all transports, then develop features for a specific
> > transport.
>
> Can a general facility for all transports be a generic admin queue?

It could be a virtqueue or a transport-specific method (PCIe capability).

E.g. we can define what needs to be migrated for virtio-blk first
(the device state). Then we can define the interface to get and set
those states via an admin virtqueue. Such decoupling may ease the future
development of the transport-specific migration interface.

Thanks

>
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Sender: List-Post: List-Help: List-Unsubscribe: List-Subscribe: Received: from lists.oasis-open.org (oasis-open.org [10.110.1.242]) by lists.oasis-open.org (Postfix) with ESMTP id B3E12986488 for ; Wed, 18 Aug 2021 09:16:09 +0000 (UTC) References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> From: Max Gurtovoy Message-ID: Date: Wed, 18 Aug 2021 12:15:58 +0300 MIME-Version: 1.0 In-Reply-To: Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: quoted-printable To: Jason Wang Cc: "virtio-comment@lists.oasis-open.org" , "Michael S. Tsirkin" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID:

On 8/17/2021 12:44 PM, Jason Wang wrote:
> On Tue, Aug 17, 2021 at 5:11 PM Max Gurtovoy wrote:
>>
>> On 8/17/2021 11:51 AM, Jason Wang wrote:
>>> On 2021/8/12 8:08 PM, Max Gurtovoy wrote:
>>>> Hi all,
>>>>
>>>> Live migration is one of the most important features of
>>>> virtualization and virtio devices are often found in virtual
>>>> environments.
>>>>
>>>> The migration process is managed by a migration SW that is running on
>>>> the hypervisor and the VM is not aware of the process at all.
>>>>
>>>> Unlike the vDPA case, a real PCI Virtual Function state resides in
>>>> the HW.
>>>>
>>> vDPA doesn't prevent you from having HW states. Actually, from the view
>>> of the VMM (QEMU), it doesn't care whether or not a state is stored in
>>> software or hardware. A well-designed VMM should be able to hide
>>> the virtio device implementation from the migration layer; that is how
>>> QEMU is written: it doesn't care whether it's a software
>>> virtio/vDPA device or not.
>>>
>>>> In our vision, in order to fulfil the live migration requirements for
>>>> virtual functions, each physical function device must implement
>>>> migration operations. Using these operations, it will be able to
>>>> master the migration process for the virtual function devices. Each
>>>> capable physical function device has supervisor permissions to
>>>> change the virtual function operational states, save/restore its
>>>> internal state and start/stop dirty page tracking.
>>>>
>>> For "supervisor permissions", is this from the software point of view?
>>> Maybe it's better to give an example for this.
>> A permission for a PF device to quiesce and freeze a VF device, for example.
> Note that for safety, the VMM (e.g. QEMU) is usually running without any privileges.

You're mixing layers here.

QEMU is not involved here. It's only sending IOCTLs to the migration driver.
The migration driver will control the migration process of the VF using
the PF communication channel.

>
>>>
>>>> An example of this approach can be seen in the way NVIDIA performs
>>>> live migration of a ConnectX NIC function:
>>>>
>>>> https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci
>>>>
>>>> NVIDIA's SNAP technology enables hardware-accelerated software-defined
>>>> PCIe devices.
>>>> virtio-blk/virtio-net/virtio-fs SNAP is used for storage
>>>> and networking solutions. The host OS/hypervisor uses its standard
>>>> drivers that are implemented according to the well-known VIRTIO
>>>> specifications.
>>>>
>>>> [...]
>>>>
>>>> For that, I would like to know if the approach of "PF that controls
>>>> the VF live migration process" is acceptable to the VIRTIO technical
>>>> group ?
>>>>
>>> I'm not sure but I think it's better to start from the general
>>> facility for all transports, then develop features for a specific
>>> transport.
>> Can a general facility for all transports be a generic admin queue?
> It could be a virtqueue or a transport-specific method (PCIe capability).

No. You said a general facility for all transports.

Transport-specific is not general.

>
> E.g. we can define what needs to be migrated for virtio-blk first
> (the device state). Then we can define the interface to get and set
> those states via an admin virtqueue. Such decoupling may ease the future
> development of the transport-specific migration interface.

I asked a simple question here.

Let's stick to this. I'm not referring to internal state definitions.

Can you please not change the subject of my initial intent in the email?

Thanks.

>
> Thanks
>
>>
>>> Thanks
>>>
>>>> Cheers,
>>>>
>>>> -Max.
>>>>
>> This publicly archived list offers a means to provide input to the
>> OASIS Virtual I/O Device (VIRTIO) TC.
>>
>> In order to verify user consent to the Feedback License terms and
>> to minimize spam in the list archive, subscription is required
>> before posting.
>>
>> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
>> List help: virtio-comment-help@lists.oasis-open.org
>> List archive: https://lists.oasis-open.org/archives/virtio-comment/
>>

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: MIME-Version: 1.0 References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> In-Reply-To: From: Jason Wang Date: Wed, 18 Aug 2021 18:46:07 +0800 Message-ID: Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable To: Max Gurtovoy Cc: "virtio-comment@lists.oasis-open.org" , "Michael S.
Tsirkin" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID:

On Wed, Aug 18, 2021 at 5:16 PM Max Gurtovoy wrote:
>
> On 8/17/2021 12:44 PM, Jason Wang wrote:
> > On Tue, Aug 17, 2021 at 5:11 PM Max Gurtovoy wrote:
> >>
> >> On 8/17/2021 11:51 AM, Jason Wang wrote:
> >>> On 2021/8/12 8:08 PM, Max Gurtovoy wrote:
> >>>> [...]
> >>>> Each capable physical function device has supervisor permissions to
> >>>> change the virtual function operational states, save/restore its
> >>>> internal state and start/stop dirty page tracking.
> >>>>
> >>> For "supervisor permissions", is this from the software point of view?
> >>> Maybe it's better to give an example for this.
> >> A permission for a PF device to quiesce and freeze a VF device, for example.
> > Note that for safety, the VMM (e.g. QEMU) is usually running without any privileges.
>
> You're mixing layers here.
>
> QEMU is not involved here. It's only sending IOCTLs to the migration driver.
> The migration driver will control the migration process of the VF using
> the PF communication channel.

So who will be granted the "permission" you mentioned here?

>
> >>>
> >>>> An example of this approach can be seen in the way NVIDIA performs
> >>>> live migration of a ConnectX NIC function:
> >>>>
> >>>> https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci
> >>>>
> >>>> [...]
> >>>>
> >>>> For that, I would like to know if the approach of "PF that controls
> >>>> the VF live migration process" is acceptable to the VIRTIO technical
> >>>> group ?
> >>>>
> >>> I'm not sure but I think it's better to start from the general
> >>> facility for all transports, then develop features for a specific
> >>> transport.
> >> Can a general facility for all transports be a generic admin queue?
> > It could be a virtqueue or a transport-specific method (PCIe capability).
>
> No. You said a general facility for all transports.

For "general facility", I mean chapter 2 of the spec, which is general:
"2 Basic Facilities of a Virtio Device".

>
> Transport-specific is not general.

The transport is in charge of implementing the interface for those facilities.

>
> > E.g. we can define what needs to be migrated for virtio-blk first
> > (the device state). Then we can define the interface to get and set
> > those states via an admin virtqueue. Such decoupling may ease the future
> > development of the transport-specific migration interface.
>
> I asked a simple question here.
>
> Let's stick to this.

I answered this question. The virtqueue could be one of the approaches.
And it's your responsibility to convince the community about that
approach. Having an example may help people to understand your proposal.

> I'm not referring to internal state definitions.

Without an example, how do we know if it can work well?

> Can you please not change the subject of my initial intent in the email?

Did I? Basically, I'm asking how a virtio-blk device can be migrated
with your proposal.

Thanks

>
> Thanks.
>
> >
> > Thanks
> >
> >>
> >>> Thanks
> >>>
> >>>> Cheers,
> >>>>
> >>>> -Max.
> >>>>
> >> This publicly archived list offers a means to provide input to the
> >> OASIS Virtual I/O Device (VIRTIO) TC.
> >>
> >> In order to verify user consent to the Feedback License terms and
> >> to minimize spam in the list archive, subscription is required
> >> before posting.
> >>
> >> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> >> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> >> List help: virtio-comment-help@lists.oasis-open.org
> >> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> >> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> >> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> >> Committee: https://www.oasis-open.org/committees/virtio/
> >> Join OASIS: https://www.oasis-open.org/join/
> >>
>

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> From: Max Gurtovoy Message-ID: <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> Date: Wed, 18 Aug 2021 14:45:57 +0300 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: quoted-printable Content-Language: en-US To: Jason Wang Cc: "virtio-comment@lists.oasis-open.org" , "Michael S. Tsirkin" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID:

On 8/18/2021 1:46 PM, Jason Wang wrote:
> On Wed, Aug 18, 2021 at 5:16 PM Max Gurtovoy wrote:
>>
>> On 8/17/2021 12:44 PM, Jason Wang wrote:
>>> On Tue, Aug 17, 2021 at 5:11 PM Max Gurtovoy wrote:
>>>> On 8/17/2021 11:51 AM, Jason Wang wrote:
>>>>> On 8/12/2021 8:08 PM, Max Gurtovoy wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> Live migration is one of the most important features of
>>>>>> virtualization and virtio devices are often found in virtual
>>>>>> environments.
>>>>>>
>>>>>> The migration process is managed by a migration SW that is running on
>>>>>> the hypervisor and the VM is not aware of the process at all.
>>>>>>
>>>>>> Unlike the vDPA case, a real pci Virtual Function state resides in
>>>>>> the HW.
>>>>>>
>>>>> vDPA doesn't prevent you from having HW states. Actually from the view
>>>>> of the VMM(Qemu), it doesn't care whether or not a state is stored in
>>>>> the software or hardware. A well designed VMM should be able to hide
>>>>> the virtio device implementation from the migration layer, that is how
>>>>> Qemu is wrote who doesn't care about whether or not it's a software
>>>>> virtio/vDPA device or not.
>>>>>
>>>>>
>>>>>> In our vision, in order to fulfil the Live migration requirements for
>>>>>> virtual functions, each physical function device must implement
>>>>>> migration operations. Using these operations, it will be able to
>>>>>> master the migration process for the virtual function devices. Each
>>>>>> capable physical function device has a supervisor permissions to
>>>>>> change the virtual function operational states, save/restore its
>>>>>> internal state and start/stop dirty pages tracking.
>>>>>>
>>>>> For "supervisor permissions", is this from the software point of view?
>>>>> Maybe it's better to give an example for this.
>>>> A permission to a PF device for quiesce and freeze a VF device for example.
>>> Note that for safety, VMM (e.g Qemu) is usually running without any privileges.
>> You're mixing layers here.
>>
>> QEMU is not involved here. It's only sending IOCTLs to migration driver.
>> The migration driver will control the migration process of the VF using
>> the PF communication channel.
> So who will be granted the "permission" you mentioned here?

This is just an expression.

What is not clear ?

The PF device will have an option to quiesce/freeze the VF device.

This is simple. Why are you looking for some sophisticated problems ?
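The quiesce/freeze control described above can be pictured as a tiny state machine driven by the PF. To be clear, this is only an illustrative sketch: the state names, the operation set and the ordering rules below are assumptions invented for the example, not anything defined by the VIRTIO specification or by the RFCs discussed in this thread.

```c
#include <assert.h>

/* Hypothetical VF operational states as controlled by the parent PF.
 * The thread only says the PF can quiesce/freeze a VF and save/restore
 * its internal state; the exact states are an assumption. */
enum vf_state { VF_RUNNING, VF_QUIESCED, VF_FROZEN };

enum pf_admin_op {
    PF_OP_QUIESCE,   /* stop the VF from initiating new requests */
    PF_OP_FREEZE,    /* stop all further VF state changes        */
    PF_OP_SAVE,      /* read out internal VF state (frozen only) */
    PF_OP_RESTORE,   /* write internal VF state (frozen only)    */
    PF_OP_RESUME,    /* return the VF to normal operation        */
};

/* Apply one admin op, returning 0 on success or -1 if the op is not
 * legal in the current state (e.g. saving a still-running VF). */
static int vf_apply_admin_op(enum vf_state *s, enum pf_admin_op op)
{
    switch (op) {
    case PF_OP_QUIESCE:
        if (*s != VF_RUNNING) return -1;
        *s = VF_QUIESCED;
        return 0;
    case PF_OP_FREEZE:
        if (*s != VF_QUIESCED) return -1;
        *s = VF_FROZEN;
        return 0;
    case PF_OP_SAVE:
    case PF_OP_RESTORE:
        return *s == VF_FROZEN ? 0 : -1;
    case PF_OP_RESUME:
        *s = VF_RUNNING;
        return 0;
    }
    return -1;
}
```

The point of the ordering check is the one Max makes: save/restore is only meaningful once the PF has stopped the VF from changing its own state.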
>>
>>>>>> An example of this approach can be seen in the way NVIDIA performs
>>>>>> live migration of a ConnectX NIC function:
>>>>>>
>>>>>> https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci
>>>>>>
>>>>>>
>>>>>> NVIDIAs SNAP technology enables hardware-accelerated software defined
>>>>>> PCIe devices. virtio-blk/virtio-net/virtio-fs SNAP used for storage
>>>>>> and networking solutions. The host OS/hypervisor uses its standard
>>>>>> drivers that are implemented according to a well-known VIRTIO
>>>>>> specifications.
>>>>>>
>>>>>> In order to implement Live Migration for these virtual function
>>>>>> devices, that use a standard drivers as mentioned, the specification
>>>>>> should define how HW vendor should build their devices and for SW
>>>>>> developers to adjust the drivers.
>>>>>>
>>>>>> This will enable specification compliant vendor agnostic solution.
>>>>>>
>>>>>> This is exactly how we built the migration driver for ConnectX
>>>>>> (internal HW design doc) and I guess that this is the way other
>>>>>> vendors work.
>>>>>>
>>>>>> For that, I would like to know if the approach of "PF that controls
>>>>>> the VF live migration process" is acceptable by the VIRTIO technical
>>>>>> group ?
>>>>>>
>>>>> I'm not sure but I think it's better to start from the general
>>>>> facility for all transports, then develop features for a specific
>>>>> transport.
>>>> a general facility for all transports can be a generic admin queue ?
>>> It could be a virtqueue or a transport specific method (pcie capability).
>> No. You said a general facility for all transports.
> For general facility, I mean the chapter 2 of the spec which is general
>
> "
> 2 Basic Facilities of a Virtio Device
> "
>

It will be in chapter 2. Right after "2.11 Exporting Object" I can add
"2.12 Admin Virtqueues" and this is what I did in the RFC.

>> Transport specific is not general.
> The transport is in charge of implementing the interface for those facilities.
Transport specific is not general.

>
>>> E.g we can define what needs to be migrated for the virtio-blk first
>>> (the device state). Then we can define the interface to get and set
>>> those states via admin virtqueue. Such decoupling may ease the future
>>> development of the transport specific migration interface.
>> I asked a simple question here.
>>
>> Lets stick to this.
> I answered this question.

No you didn't answer.

I asked if the approach of "PF that controls the VF live migration
process" is acceptable by the VIRTIO technical group ?

And you take the discussion to your direction instead of answering a
Yes/No question.

> The virtqueue could be one of the
> approaches. And it's your responsibility to convince the community
> about that approach. Having an example may help people to understand
> your proposal.
>
>> I'm not referring to internal state definitions.
> Without an example, how do we know if it can work well?
>
>> Can you please not change the subject of my initial intent in the email ?
> Did I? Basically, I'm asking how a virtio-blk can be migrated with
> your proposal.

The virtio-blk PF admin queue will be used to manage the virtio-blk VF
migration.

This is the whole discussion. I don't want to get into resolution.

Since you already know the answer, as I published 4 RFCs already with
all the flow.

Lets stick to my question.

> Thanks
>
>> Thanks.
>>
>>
>>> Thanks
>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> -Max.
>>>>>>
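As a rough sketch of what the admin-virtqueue facility argued about above might carry for a PF-controls-VF flow: a fixed command header identifying the target VF, followed by an opcode-specific payload. The opcode values, the field names and the 8-byte little-endian layout here are all invented for illustration; they are not taken from the VIRTIO spec or from the RFCs mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical admin-virtqueue command header sent on the PF's queue. */
struct admin_cmd_hdr {
    uint16_t opcode;     /* e.g. quiesce/freeze/save/restore/dirty-log */
    uint16_t vf_index;   /* which VF of this PF the command targets    */
    uint32_t data_len;   /* length of the opcode-specific payload      */
};

/* Serialize the header byte by byte in little-endian order, so the
 * wire layout does not depend on host endianness or struct padding.
 * Returns the number of bytes written (fixed 8-byte header). */
static size_t admin_cmd_hdr_write(uint8_t *buf, const struct admin_cmd_hdr *h)
{
    buf[0] = (uint8_t)(h->opcode & 0xff);
    buf[1] = (uint8_t)(h->opcode >> 8);
    buf[2] = (uint8_t)(h->vf_index & 0xff);
    buf[3] = (uint8_t)(h->vf_index >> 8);
    buf[4] = (uint8_t)(h->data_len & 0xff);
    buf[5] = (uint8_t)((h->data_len >> 8) & 0xff);
    buf[6] = (uint8_t)((h->data_len >> 16) & 0xff);
    buf[7] = (uint8_t)(h->data_len >> 24);
    return 8;
}
```

A transport-independent "chapter 2" facility would presumably pin down exactly such a layout, leaving each transport to define only how the queue itself is discovered.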
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> From: Jason Wang Message-ID: <5b4db3ac-0291-69ea-7e82-5b3f19049e61@redhat.com> Date: Thu, 19 Aug 2021 10:44:46 +0800 MIME-Version: 1.0 In-Reply-To: <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Transfer-Encoding: 8bit Content-Language: en-US To: Max Gurtovoy Cc: "virtio-comment@lists.oasis-open.org" , "Michael S. Tsirkin" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID:

On 8/18/2021 7:45 PM, Max Gurtovoy wrote:
>
> On 8/18/2021 1:46 PM, Jason Wang wrote:
>> On Wed, Aug 18, 2021 at 5:16 PM Max Gurtovoy
>> wrote:
>>>
>>> On 8/17/2021 12:44 PM, Jason Wang wrote:
>>>> On Tue, Aug 17, 2021 at 5:11 PM Max Gurtovoy
>>>> wrote:
>>>>> On 8/17/2021 11:51 AM, Jason Wang wrote:
>>>>>> On 8/12/2021 8:08 PM, Max Gurtovoy wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Live migration is one of the most important features of
>>>>>>> virtualization and virtio devices are often found in virtual
>>>>>>> environments.
>>>>>>> >>>>>>> The migration process is managed by a migration SW that is >>>>>>> running on >>>>>>> the hypervisor and the VM is not aware of the process at all. >>>>>>> >>>>>>> Unlike the vDPA case, a real pci Virtual Function state resides in >>>>>>> the HW. >>>>>>> >>>>>> vDPA doesn't prevent you from having HW states. Actually from the >>>>>> view >>>>>> of the VMM(Qemu), it doesn't care whether or not a state is >>>>>> stored in >>>>>> the software or hardware. A well designed VMM should be able to hide >>>>>> the virtio device implementation from the migration layer, that >>>>>> is how >>>>>> Qemu is wrote who doesn't care about whether or not it's a software >>>>>> virtio/vDPA device or not. >>>>>> >>>>>> >>>>>>> In our vision, in order to fulfil the Live migration >>>>>>> requirements for >>>>>>> virtual functions, each physical function device must implement >>>>>>> migration operations. Using these operations, it will be able to >>>>>>> master the migration process for the virtual function devices. Each >>>>>>> capable physical function device has a supervisor permissions to >>>>>>> change the virtual function operational states, save/restore its >>>>>>> internal state and start/stop dirty pages tracking. >>>>>>> >>>>>> For "supervisor permissions", is this from the software point of >>>>>> view? >>>>>> Maybe it's better to give an example for this. >>>>> A permission to a PF device for quiesce and freeze a VF device for >>>>> example. >>>> Note that for safety, VMM (e.g Qemu) is usually running without any >>>> privileges. >>> You're mixing layers here. >>> >>> QEMU is not involved here. It's only sending IOCTLs to migration >>> driver. >>> The migration driver will control the migration process of the VF using >>> the PF communication channel. >> So who will be granted the "permission" you mentioned here? > > This is just an expression. > > What is not clear ? 
Well, the "supervisor permission" usually means it must be done that
way, otherwise it may have security implications. But your answer sounds
unrelated to that, which is confusing.

>
> The PF device will have an option to quiesce/freeze the VF device.

Is such design a must? If no, why not simply introduce those functions
in the VF?

If yes, what's the reason for making virtio different (e.g VCPU live
migration is not designed like that)?

>
> This is simple. Why are you looking for some sophisticated problems ?

It's pretty natural that people may review the patch or proposal from
different angles. But it looks to me it's not something you want to see?

If you mandate people to think the same as you, that's not how the
community works. And it makes the conversation very hard.

Before we move forward, I think we should agree on some basic
code-of-conduct as what Linux had:
https://www.kernel.org/doc/html/latest/process/code-of-conduct.html.
Especially the second standard: "Being respectful of differing
viewpoints and experiences".

In the meantime, it's your duty to explain the motivation in a clear
way or explain it to the reviewers. I suggest you revisit how to submit
patches:
https://www.kernel.org/doc/html/latest/process/submitting-patches.html

>
>>>
>>>>>>> An example of this approach can be seen in the way NVIDIA performs
>>>>>>> live migration of a ConnectX NIC function:
>>>>>>>
>>>>>>> https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci
>>>>>>>
>>>>>>>
>>>>>>> NVIDIAs SNAP technology enables hardware-accelerated software defined
>>>>>>> PCIe devices. virtio-blk/virtio-net/virtio-fs SNAP used for storage
>>>>>>> and networking solutions. The host OS/hypervisor uses its standard
>>>>>>> drivers that are implemented according to a well-known VIRTIO
>>>>>>> specifications.
>>>>>>> >>>>>>> In order to implement Live Migration for these virtual function >>>>>>> devices, that use a standard drivers as mentioned, the >>>>>>> specification >>>>>>> should define how HW vendor should build their devices and for SW >>>>>>> developers to adjust the drivers. >>>>>>> >>>>>>> This will enable specification compliant vendor agnostic solution. >>>>>>> >>>>>>> This is exactly how we built the migration driver for ConnectX >>>>>>> (internal HW design doc) and I guess that this is the way other >>>>>>> vendors work. >>>>>>> >>>>>>> For that, I would like to know if the approach of “PF that controls >>>>>>> the VF live migration process” is acceptable by the VIRTIO >>>>>>> technical >>>>>>> group ? >>>>>>> >>>>>> I'm not sure but I think it's better to start from the general >>>>>> facility for all transports, then develop features for a specific >>>>>> transport. >>>>> a general facility for all transports can be a generic admin queue ? >>>> It could be a virtqueue or a transport specific method (pcie >>>> capability). >>> No. You said a general facility for all transports. >> For general facility, I mean the chapter 2 of the spec which is general >> >> " >> 2 Basic Facilities of a Virtio Device >> " >> > It will be in chapter 2. Right after "2.11 Exporting Object" I can add > "2.12 Admin Virtqueues" and this is what I did in the RFC. The point is, migration should be an independent facility and it's possible to be done in transport specific way other than the admin virtqueue. > >>> Transport specific is not general. >> The transport is in charge of implementing the interface for those >> facilities. > > Transport specific is not general. > > >> >>>> E.g we can define what needs to be migrated for the virtio-blk first >>>> (the device state). Then we can define the interface to get and set >>>> those states via admin virtqueue. Such decoupling may ease the future >>>> development of the transport specific migration interface. 
>>> I asked a simple question here.
>>>
>>> Lets stick to this.
>> I answered this question.
>
> No you didn't answer.

I answered "I'm not sure". Or are you expecting an answer like yes or
no? Of course I can't answer like that since it depends on whether your
proposal is agreed by the vast majority of the members and the other
procedures, e.g. voting, before it is merged.

You may refer to this doc for the procedure:

https://github.com/oasis-tcs/virtio-admin/blob/master/README.md

>
> I asked if the approach of "PF that controls the VF live migration
> process" is acceptable by the VIRTIO technical group ?
>
> And you take the discussion to your direction instead of answering a
> Yes/No question.

I don't get the point of this question. If the reviewer thinks a
direction may help, the reviewer has the right to do that.

And what I want to say is:

1) I'm not sure it can be acceptable (I can't speak for the whole TC)
2) but I have an idea to help people to understand the proposal (start
from an example)

>
>> The virtqueue could be one of the
>> approaches. And it's your responsibility to convince the community
>> about that approach. Having an example may help people to understand
>> your proposal.
>>
>>> I'm not referring to internal state definitions.
>> Without an example, how do we know if it can work well?
>>
>>> Can you please not change the subject of my initial intent in the
>>> email ?
>> Did I? Basically, I'm asking how a virtio-blk can be migrated with
>> your proposal.
>
> The virtio-blk PF admin queue will be used to manage the virtio-blk VF
> migration.
>
> This is the whole discussion. I don't want to get into resolution.
>
> Since you already know the answer as I published 4 RFCs already with
> all the flow.

No I don't, especially the part of device states that need to be
migrated. Even if I knew the answer, it doesn't mean other people can
easily understand that.
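The "device states that need to be migrated" point can be made concrete with a sketch. Everything below is hypothetical: the spec does not define a virtio-blk migration state, and this field set (negotiated features, device status, per-queue progress indices) is just one plausible minimal choice for a mostly stateless device.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VQS 4  /* illustration only; real devices negotiate this */

/* Hypothetical snapshot of one virtqueue's progress. */
struct vq_state {
    uint16_t avail_idx;  /* next available index the device will read */
    uint16_t used_idx;   /* next used index the device will write     */
};

/* Hypothetical virtio-blk migration state blob. In-flight requests
 * are assumed drained (or re-issued by the driver) before save. */
struct blk_mig_state {
    uint64_t features;           /* negotiated feature bits */
    uint8_t  status;             /* device status register  */
    uint16_t num_vqs;
    struct vq_state vq[MAX_VQS];
};

/* A saved state is only restorable if no requests are outstanding,
 * i.e. the device has consumed everything the driver made available. */
static int blk_state_quiescent(const struct blk_mig_state *s)
{
    for (uint16_t i = 0; i < s->num_vqs; i++)
        if (s->vq[i].avail_idx != s->vq[i].used_idx)
            return 0;
    return 1;
}
```

Whether such a blob travels over a PF admin queue or a transport-specific channel is exactly the question the thread is debating; the blob itself is channel-agnostic.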
You only added a GitHub link to your mlx5 development tree, it's really
hard to see the connections. And you don't even mention the 4 RFCs
you've posted (and a lot of comments were not addressed there).

>
> Lets stick to my question.

I don't think your expectation can be met through "Hey, I have an idea,
and you know how it works, does it make sense?". Especially consider
it's a complicated issue.

Thanks

>
>> Thanks
>>
>>> Thanks.
>>>
>>>
>>>> Thanks
>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> -Max.
>>>>>>>

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Thu, 19 Aug 2021 10:58:58 -0400 From: "Michael S. Tsirkin" Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function Message-ID: <20210819104216-mutt-send-email-mst@kernel.org> References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> <5b4db3ac-0291-69ea-7e82-5b3f19049e61@redhat.com> MIME-Version: 1.0 In-Reply-To: <5b4db3ac-0291-69ea-7e82-5b3f19049e61@redhat.com> Content-Type: text/plain; charset=us-ascii Content-Disposition: inline To: Jason Wang Cc: Max Gurtovoy , "virtio-comment@lists.oasis-open.org" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID:

On Thu, Aug 19, 2021 at 10:44:46AM +0800, Jason Wang wrote:
> >
> > The PF device will have an option to quiesce/freeze the VF device.
>
> Is such design a must? If no, why not simply introduce those functions in
> the VF?

Many IOMMUs only support protections at the function level.
Thus we need ability to have one device (e.g. a PF)
to control migration of another (e.g. a VF).
This is because allowing VF to access hypervisor memory used for
migration is not a good idea.
For IOMMUs that support subfunctions, these "devices" could be
subfunctions.

The only alternative is to keep things in device memory which
does not need an IOMMU.
I guess we'd end up with something like a VQ in device memory which might
be tricky from multiple points of view, but yes, this could be
useful and people did ask for such a capability in the past.

> If yes, what's the reason for making virtio different (e.g VCPU live
> migration is not designed like that)?

I think the main difference is we need PF's help for memory
tracking for pre-copy migration anyway. Might as well integrate
the rest of state in the same channel.

Another answer is that CPUs trivially switch between
functions by switching the active page tables.
For PCI DMA it is all much trickier sine the page tables can be separate from the device, and assumed to be mostly static. So if you want to create something like the VMCS then again you either need some help from another device or put it in device memory. -- MST From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> <5b4db3ac-0291-69ea-7e82-5b3f19049e61@redhat.com> <20210819104216-mutt-send-email-mst@kernel.org> From: Jason Wang Message-ID: Date: Fri, 20 Aug 2021 10:17:05 +0800 MIME-Version: 1.0 In-Reply-To: <20210819104216-mutt-send-email-mst@kernel.org> Content-Type: text/plain; charset="gbk"; format="flowed" Content-Transfer-Encoding: base64 Content-Language: en-US To: "Michael S. Tsirkin" Cc: Max Gurtovoy , "virtio-comment@lists.oasis-open.org" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID: CtTaIDIwMjEvOC8xOSDPws7nMTA6NTgsIE1pY2hhZWwgUy4gVHNpcmtpbiDQtLXAOgo+IE9uIFRo dSwgQXVnIDE5LCAyMDIxIGF0IDEwOjQ0OjQ2QU0gKzA4MDAsIEphc29uIFdhbmcgd3JvdGU6Cj4+ PiBUaGUgUEYgZGV2aWNlIHdpbGwgaGF2ZSBhbiBvcHRpb24gdG8gcXVpZXNjZS9mcmVlemUgdGhl IFZGIGRldmljZS4KPj4KPj4gSXMgc3VjaCBkZXNpZ24gYSBtdXN0PyBJZiBubywgd2h5IG5vdCBz aW1wbHkgaW50cm9kdWNlIHRob3NlIGZ1bmN0aW9ucyBpbgo+PiB0aGUgVkY/Cj4gTWFueSBJT01N VXMgb25seSBzdXBwb3J0IHByb3RlY3Rpb25zIGF0IHRoZSBmdW5jdGlvbiBsZXZlbC4KPiBUaHVz IHdlIG5lZWQgYWJpbGl0eSB0byBoYXZlIG9uZSBkZXZpY2UgKGUuZy4gYSBQRikKPiB0byBjb250 cm9sIG1pZ3JhdGlvbiBvZiBhbm90aGVyIChlLmcuIGEgVkYpLgoKClNvIGFzIGRpc2N1c3NlZCBw cmV2aW91c2x5LCB0aGUgb25seSBwb3NzaWJsZSAiYWR2YW50YWdlIiBpcyB0aGF0IHRoZSAKRE1B IGlzIGlzb2xhdGVkLgoKCj4gVGhpcyBpcyBiZWNhdXNlIGFsbG93aW5nIFZGIHRvIGFjY2VzcyBo eXBlcnZpc29yIG1lbW9yeSB1c2VkIGZvcgo+IG1pZ3JhdGlvbiBpcyBub3QgYSBnb29kIGlkZWEu 
Cj4gRm9yIElPTU1VcyB0aGF0IHN1cHBvcnQgc3ViZnVuY3Rpb25zLCB0aGVzZSAiZGV2aWNlcyIg Y291bGQgYmUKPiBzdWJmdW5jdGlvbnMuCj4KPiBUaGUgb25seSBhbHRlcm5hdGl2ZSBpcyB0byBr ZWVwIHRoaW5ncyBpbiBkZXZpY2UgbWVtb3J5IHdoaWNoCj4gZG9lcyBub3QgbmVlZCBhbiBJT01N VS4KPiBJIGd1ZXNzIHdlJ2QgZW5kIHVwIHdpdGggc29tZXRoaW5nIGxpa2UgYSBWUSBpbiBkZXZp Y2UgbWVtb3J5IHdoaWNoIG1pZ2h0Cj4gYmUgdHJpY2t5IGZyb20gbXVsdGlwbGUgcG9pbnRzIG9m IHZpZXcsIGJ1dCB5ZXMsIHRoaXMgY291bGQgYmUKPiB1c2VmdWwgYW5kIHBlb3BsZSBkaWQgYXNr IGZvciBzdWNoIGEgY2FwYWJpbGl0eSBpbiB0aGUgcGFzdC4KCgpJIGFzc3VtZSB0aGUgc3BlYyBh bHJlYWR5IHN1cHBvcnQgdGhpcy4gV2UgcHJvYmFibHkgbmVlZCBzb21lIApjbGFyaWZpY2F0aW9u IGF0IHRoZSB0cmFuc3BvcnQgbGF5ZXIuIEJ1dCBpdCdzIGFzIHNpbXBsZSBhcyBzZXR0aW5nIE1N SU8gCmFyZSBhcyB2aXJ0cXVldWUgYWRkcmVzcz8KCkV4Y2VwdCBmb3IgdGhlIGRpcnR5IGJpdCB0 cmFja2luZywgd2UgZG9uJ3QgaGF2ZSBidWxrIGRhdGEgdGhhdCBuZWVkcyB0byAKYmUgdHJhbnNm ZXJyZWQgZHVyaW5nIG1pZ3JhdGlvbi4gU28gYSB2aXJ0cXVldWUgaXMgbm90IG11c3QgZXZlbiBp biB0aGlzIApjYXNlLgoKCj4KPj4gSWYgeWVzLCB3aGF0J3MgdGhlIHJlYXNvbiBmb3IgbWFraW5n IHZpcnRpbyBkaWZmZXJlbnQgKGUuZyBWQ1BVIGxpdmUKPj4gbWlncmF0aW9uIGlzIG5vdCBkZXNp Z25lZCBsaWtlIHRoYXQpPwo+IEkgdGhpbmsgdGhlIG1haW4gZGlmZmVyZW5jZSBpcyB3ZSBuZWVk IFBGJ3MgaGVscCBmb3IgbWVtb3J5Cj4gdHJhY2tpbmcgZm9yIHByZS1jb3B5IG1pZ3JhdGlvbiBh bnl3YXkuCgoKU3VjaCBraW5kIG9mIG1lbW9yeSB0cmFja2luZyBpcyBub3QgYSBtdXN0LiBLVk0g dXNlcyBzb2Z0d2FyZSBhc3Npc3RlZCAKdGVjaG5vbG9naWVzICh3cml0ZSBwcm90ZWN0aW9uKSBh bmQgaXQgd29ya3MgdmVyeSB3ZWxsLiBGb3IgdmlydGlvLCAKdGVjaG5vbG9neSBsaWtlIHNoYWRv dyB2aXJ0cXVldWUgaGFzIGJlZW4gdXNlZCBieSBEUERLIGFuZCBwcm90b3R5cGVkIGJ5IApFdWdl bmlvLgoKRXZlbiBpZiB3ZSB3YW50IHRvIGdvIHdpdGggaGFyZHdhcmUgdGVjaG5vbG9neSwgd2Ug aGF2ZSBtYW55IAphbHRlcm5hdGl2ZXMgKGFzIHdlJ3ZlIGRpc2N1c3NlZCBpbiB0aGUgcGFzdCk6 CgoxKSBJT01NVSBkaXJ0eSBiaXQgKEUuZyBtb2Rlcm4gSU9NTVUgaGF2ZSBFQSBiaXQgZm9yIGxv Z2dpbmcgZXh0ZXJuYWwgCmRldmljZSB3cml0ZSkKMikgV3JpdGUgcHJvdGVjdGlvbiB2aWEgSU9N TVUgb3IgZGV2aWNlIE1NVQozKSBBZGRyZXNzIHNwYWNlIElEIGZvciBpc29sYXRpbmcgRE1BcwoK 
VXNpbmcgcGh5c2ljYWwgZnVuY3Rpb24gaXMgc3ViLW9wdGltYWwgdGhhdCBhbGwgb2YgdGhlIGFi b3ZlIHNpbmNlOgoKMSkgbGltaXRlZCB0byBhIHNwZWNpZmljIHRyYW5zcG9ydCBvciBpbXBsZW1l bnRhdGlvbiBhbmQgaXQgZG9lc24ndCB3b3JrIApmb3IgZGV2aWNlIG9yIHRyYW5zcG9ydCB3aXRo b3V0IFBGCjIpIHRoZSB2aXJ0aW8gbGV2ZWwgZnVuY3Rpb24gaXMgbm90IHNlbGYgY29udGFpbmVk LCB0aGlzIG1ha2VzIGFueSAKZmVhdHVyZSB0aGF0IHRpZXMgdG8gUEYgaW1wb3NzaWJsZSB0byBi ZSB1c2VkIGluIHRoZSBuZXN0ZWQgbGF5ZXIKMykgbW9yZSBjb21wbGljYXRlZCB0aGFuIGxldmVy YWdpbmcgdGhlIGV4aXN0aW5nIGZhY2lsaXRpZXMgcHJvdmlkZWQgYnkgCnRoZSBwbGF0Zm9ybSBv ciB0cmFuc3BvcnQKCkNvbnNpZGVyIChQKUFTSUQgd2lsbCBiZSByZWFkeSB2ZXJ5IHNvb24sIHdv cmthcm91bmQgdGhlIHBsYXRmb3JtIApsaW1pdGF0aW9uIHZpYSBQRiBpcyBub3QgYSBnb29kIGlk ZWEgZm9yIG1lLiBFc3BlY2lhbGx5IGNvbnNpZGVyIGl0J3MgCm5vdCBhIG11c3QgYW5kIHdlIGhh ZCBhbHJlYWR5IHByb3RvdHlwZSB0aGUgc29mdHdhcmUgYXNzaXN0ZWQgdGVjaG5vbG9neS4KCgo+ ICAgTWlnaHQgYXMgd2VsbCBpbnRlZ3JhdGUKPiB0aGUgcmVzdCBvZiBzdGF0ZSBpbiB0aGUgc2Ft ZSBjaGFubmVsLgoKClRoYXQncyBhbm90aGVyIHF1ZXN0aW9uLiBJIHRoaW5rIGZvciB0aGUgZnVu Y3Rpb24gdGhhdCBpcyBhIG11c3QgZm9yIApkb2luZyBsaXZlIG1pZ3JhdGlvbiwgaW50cm9kdWNp bmcgdGhlbSBpbiB0aGUgZnVuY3Rpb24gaXRzZWxmIGlzIHRoZSAKbW9zdCBuYXR1cmFsIHdheSBz aW5jZSB3ZSBkaWQgYWxsIHRoZSBvdGhlciBmYWNpbGl0aWVzIHRoZXJlLiBUaGlzIGVhc2UgCnRo ZSBmdW5jdGlvbiB0aGF0IGNhbiBiZSB1c2VkIGluIHRoZSBuZXN0ZWQgbGF5ZXIuCgpBbmQgdXNp bmcgdGhlIGNoYW5uZWwgaW4gdGhlIFBGIGlzIG5vdCBjb21pbmcgZm9yIGZyZWUuIEl0IHJlcXVp cmVzIApzeW5jaHJvbml6YXRpb24gaW4gdGhlIHNvZnR3YXJlIG9yIGV2ZW4gUU9TLgoKT3Igd2Ug Y2FuIGp1c3Qgc2VwYXJhdGUgdGhlIGRpcnR5IHBhZ2UgdHJhY2tpbmcgaW50byBQRiAoYnV0IG5l ZWQgdG8gCmRlZmluZSB0aGVtIGFzIGJhc2ljIGZhY2lsaXR5IGZvciBmdXR1cmUgZXh0ZW5zaW9u KS4KCgo+Cj4gQW5vdGhlciBhbnN3ZXIgaXMgdGhhdCBDUFVzIHRyaXZpYWxseSBzd2l0Y2ggYmV0 d2Vlbgo+IGZ1bmN0aW9ucyBieSBzd2l0Y2hpbmcgdGhlIGFjdGl2ZSBwYWdlIHRhYmxlcy4gRm9y IFBDSSBETUEKPiBpdCBpcyBhbGwgbXVjaCB0cmlja2llciBzaW5lIHRoZSBwYWdlIHRhYmxlcyBj YW4gYmUgc2VwYXJhdGUKPiBmcm9tIHRoZSBkZXZpY2UsIGFuZCBhc3N1bWVkIHRvIGJlIG1vc3Rs 
eSBzdGF0aWMuCgoKSSBkb24ndCBzZWUgbXVjaCBkaWZmZXJlbnQsIHRoZSBwYWdlIHRhYmxlIGlz IGFsc28gc2VwYXJhdGVkIGZyb20gdGhlIApDUFUuIElmIHRoZSBkZXZpY2Ugc3VwcG9ydHMgc3Rh dGUgc2F2ZSBhbmQgcmVzdG9yZSB3ZSBjYW4gc2NoZWR1bGluZyB0aGUgCm11bHRpcGxlIFZNcy9W Q1BVcyBvbiB0aGUgc2FtZSBkZXZpY2UuCgoKPiBTbyBpZiB5b3Ugd2FudCB0byBjcmVhdGUgc29t ZXRoaW5nIGxpa2UgdGhlIFZNQ1MgdGhlbgo+IGFnYWluIHlvdSBlaXRoZXIgbmVlZCBzb21lIGhl bHAgZnJvbSBhbm90aGVyIGRldmljZSBvcgo+IHB1dCBpdCBpbiBkZXZpY2UgbWVtb3J5LgoKCkZv ciBDUFUgdmlydHVhbGl6YXRpb24sIHRoZSBzdGF0ZXMgY291bGQgYmUgc2F2ZWQgYW5kIHJlc3Rv cmVkIHZpYSBNU1JzLiAKRm9yIHZpcnRpbywgYWNjZXNzaW5nIHRoZW0gdmlhIHJlZ2lzdGVycyBp cyBhbHNvIHBvc3NpYmxlIGFuZCBtdWNoIG1vcmUgCnNpbXBsZS4KClRoYW5rcwoKCj4KPgoK From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Fri, 20 Aug 2021 03:03:57 -0400 From: "Michael S. Tsirkin" Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function Message-ID: <20210820023540-mutt-send-email-mst@kernel.org> References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> <5b4db3ac-0291-69ea-7e82-5b3f19049e61@redhat.com> <20210819104216-mutt-send-email-mst@kernel.org> MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit To: Jason Wang Cc: Max Gurtovoy , "virtio-comment@lists.oasis-open.org" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID: On Fri, Aug 20, 2021 at 10:17:05AM +0800, Jason Wang wrote: > > 在 2021/8/19 下午10:58, Michael S. Tsirkin 写道: > > On Thu, Aug 19, 2021 at 10:44:46AM +0800, Jason Wang wrote: > > > > The PF device will have an option to quiesce/freeze the VF device. > > > > > > Is such design a must? If no, why not simply introduce those functions in > > > the VF? > > Many IOMMUs only support protections at the function level. > > Thus we need ability to have one device (e.g. 
a PF)
> > to control migration of another (e.g. a VF).
>
> So as discussed previously, the only possible "advantage" is that the DMA is
> isolated.
>
> > This is because allowing VF to access hypervisor memory used for
> > migration is not a good idea.
> > For IOMMUs that support subfunctions, these "devices" could be
> > subfunctions.
> >
> > The only alternative is to keep things in device memory which
> > does not need an IOMMU.
> > I guess we'd end up with something like a VQ in device memory which might
> > be tricky from multiple points of view, but yes, this could be
> > useful and people did ask for such a capability in the past.
>
> I assume the spec already supports this. We probably need some clarification
> at the transport layer. But it's as simple as setting an MMIO area as the virtqueue
> address?

Several issues
- we do not support changing VQ addresses. Devices do need to support
changing memory addresses.
- Ordering becomes tricky. E.g. when the device reads a descriptor in VQ
memory it suddenly does not flush out writes into a buffer
that is potentially in RAM. We might also need even stronger
barriers on the driver side. We used dma_wmb but now it
probably needs to be wmb.
- Reading multibyte structures from device memory is slow.
To get reasonable performance we might need to mark this device memory
WB or WC. That generally makes things even trickier.

> Except for the dirty bit tracking, we don't have bulk data that needs to be
> transferred during migration. So a virtqueue is not a must even in this case.

Main traffic is write tracking.

> > > If yes, what's the reason for making virtio different (e.g VCPU live
> > > migration is not designed like that)?
> > I think the main difference is we need PF's help for memory
> > tracking for pre-copy migration anyway.
>
> Such kind of memory tracking is not a must. KVM uses software assisted
> technologies (write protection) and it works very well.
So page-fault support is absolutely a viable option IMHO.
To work well we need VIRTIO_F_PARTIAL_ORDER - there was not
a lot of excitement but sure I will finalize and repost it.

However we need support for reporting and handling faults.
Again this is data path stuff and needs to be under
hypervisor control so I guess we get right back
to having this in the PF?

> For virtio,
> technology like shadow virtqueue has been used by DPDK and prototyped by
> Eugenio.

That's ok but I think since it affects performance at 100% of the
time when active we cannot rely on it as the only solution.

> Even if we want to go with hardware technology, we have many alternatives
> (as we've discussed in the past):
>
> 1) IOMMU dirty bit (e.g. modern IOMMUs have an EA bit for logging external device
> writes)
> 2) Write protection via IOMMU or device MMU
> 3) Address space ID for isolating DMAs

Not all systems support any of the above unfortunately.
Also some systems might have a limited # of PASIDs.
So burning up an extra PASID per VF, halving their
number, might not be great as the only option.

> Using physical function is sub-optimal compared to all of the above since:
>
> 1) it is limited to a specific transport or implementation and it doesn't work for
> a device or transport without a PF
> 2) the virtio level function is not self contained; this makes any feature
> that ties to the PF impossible to use in the nested layer
> 3) it is more complicated than leveraging the existing facilities provided by the
> platform or transport

I think I disagree with 2 and 3 above simply because controlling VFs through
a PF is how all other devices did this. About 1 - well this is
just about us being smart and writing this in a way that is
generic enough, right? E.g. include options for PASIDs too.

Note that support for cross-device addressing is useful
even outside of migration. We also have things like
priority where it is useful to adjust properties of
a VF on the fly while it is active.
Again the normal way
all devices do this is through a PF. Yes a bunch of tricks
in QEMU is possible but having a driver in the host kernel
that just handles it in a contained way is way cleaner.

> Consider (P)ASID will be ready very soon, working around the platform limitation
> via the PF is not a good idea for me. Especially considering it's not a must and we
> have already prototyped the software assisted technology.

Well PASID is just one technology.

> > Might as well integrate
> > the rest of state in the same channel.
>
> That's another question. I think for the functions that are a must for doing
> live migration, introducing them in the function itself is the most natural
> way since we did all the other facilities there. This eases using the function
> in the nested layer.
>
> And using the channel in the PF is not coming for free. It requires
> synchronization in the software or even QOS.
>
> Or we can just separate the dirty page tracking into the PF (but need to define
> it as a basic facility for future extension).

Well maybe just start focusing on write tracking, sure.
Once there's a proposal for this we can see whether
adding other state there is easier or harder.

> > Another answer is that CPUs trivially switch between
> > functions by switching the active page tables. For PCI DMA
> > it is all much trickier since the page tables can be separate
> > from the device, and assumed to be mostly static.
>
> I don't see much difference; the page table is also separated from the CPU.
> If the device supports state save and restore we can schedule multiple
> VMs/VCPUs on the same device.

It's just that performance is terrible. If you keep losing packets
migration might as well not be live.

> > So if you want to create something like the VMCS then
> > again you either need some help from another device or
> > put it in device memory.
>
> For CPU virtualization, the states could be saved and restored via MSRs. For
> virtio, accessing them via registers is also possible and much more simple.
>
> Thanks

My guess is performance is going to be bad. MSRs are part of the
same CPU that is executing the accesses....

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: MIME-Version: 1.0 References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> <5b4db3ac-0291-69ea-7e82-5b3f19049e61@redhat.com> <20210819104216-mutt-send-email-mst@kernel.org> <20210820023540-mutt-send-email-mst@kernel.org> In-Reply-To: <20210820023540-mutt-send-email-mst@kernel.org> From: Jason Wang Date: Fri, 20 Aug 2021 15:49:55 +0800 Message-ID: Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable To: "Michael S. Tsirkin" Cc: Max Gurtovoy , "virtio-comment@lists.oasis-open.org" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID:

On Fri, Aug 20, 2021 at 3:04 PM Michael S. Tsirkin wrote:
>
> On Fri, Aug 20, 2021 at 10:17:05AM +0800, Jason Wang wrote:
> >
> > On 2021/8/19 10:58 PM, Michael S. Tsirkin wrote:
> > > On Thu, Aug 19, 2021 at 10:44:46AM +0800, Jason Wang wrote:
> > > > > The PF device will have an option to quiesce/freeze the VF device.
> > > >
> > > > Is such design a must? If no, why not simply introduce those functions in
> > > > the VF?
> > > Many IOMMUs only support protections at the function level.
> > > Thus we need ability to have one device (e.g. a PF)
> > > to control migration of another (e.g. a VF).
> >
> > So as discussed previously, the only possible "advantage" is that the DMA is
> > isolated.
> >
> > > This is because allowing VF to access hypervisor memory used for
> > > migration is not a good idea.
> > > For IOMMUs that support subfunctions, these "devices" could be
> > > subfunctions.
> > >
> > > The only alternative is to keep things in device memory which
> > > does not need an IOMMU.
> > > I guess we'd end up with something like a VQ in device memory which might
> > > be tricky from multiple points of view, but yes, this could be
> > > useful and people did ask for such a capability in the past.
> >
> > I assume the spec already supports this. We probably need some clarification
> > at the transport layer. But it's as simple as setting an MMIO area as the virtqueue
> > address?
>
> Several issues
> - we do not support changing VQ addresses. Devices do need to support
> changing memory addresses.

So it looks like a transport specific requirement (PCI-E) instead of a
general issue.

> - Ordering becomes tricky.
> E.g. when the device reads a descriptor in VQ
> memory it suddenly does not flush out writes into a buffer
> that is potentially in RAM. We might also need even stronger
> barriers on the driver side. We used dma_wmb but now it
> probably needs to be wmb.
> - Reading multibyte structures from device memory is slow.
> To get reasonable performance we might need to mark this device memory
> WB or WC. That generally makes things even trickier.

I agree, but still they are all transport specific requirements. If we
do that in a PCI-E BAR, the driver must obey the ordering rule for PCI
to make it work.

> > Except for the dirty bit tracking, we don't have bulk data that needs to be
> > transferred during migration. So a virtqueue is not a must even in this case.
>
> Main traffic is write tracking.

Right.

> > > > If yes, what's the reason for making virtio different (e.g VCPU live
> > > > migration is not designed like that)?
> > > I think the main difference is we need PF's help for memory
> > > tracking for pre-copy migration anyway.
> >
> > Such kind of memory tracking is not a must. KVM uses software assisted
> > technologies (write protection) and it works very well.
>
> So page-fault support is absolutely a viable option IMHO.
> To work well we need VIRTIO_F_PARTIAL_ORDER - there was not
> a lot of excitement but sure I will finalize and repost it.

As discussed before, it looks like a performance optimization but not a must?

I guess we don't do that for KVM and it works well.

>
> However we need support for reporting and handling faults.
> Again this is data path stuff and needs to be under
> hypervisor control so I guess we get right back
> to having this in the PF?

So it depends on whether it requires a DMA. If it's just something
like a CR2 register, we don't need the PF.

> > For virtio,
> > technology like shadow virtqueue has been used by DPDK and prototyped by
> > Eugenio.
>
> That's ok but I think since it affects performance at 100% of the
> time when active we cannot rely on it as the only solution.

This part I don't understand:

- KVM write-protects the pages, so it loses performance as well.
- If we are using a virtqueue for reporting the dirty bitmap, it can easily
run out of space and we will lose performance as well
- If we are using a bitmap/bytemap, we may also lose performance
(e.g. the huge footprint) or at the PCI level

So I'm not against the idea; what I think makes more sense is not to
limit facilities like device states and dirty page tracking to the
PF.

> > Even if we want to go with hardware technology, we have many alternatives
> > (as we've discussed in the past):
> >
> > 1) IOMMU dirty bit (e.g. modern IOMMUs have an EA bit for logging external device
> > writes)
> > 2) Write protection via IOMMU or device MMU
> > 3) Address space ID for isolating DMAs
>
> Not all systems support any of the above unfortunately.

Yes. But we know the platform (AMD/Intel/ARM) will be ready soon for
them in the near future.

> Also some systems might have a limited # of PASIDs.
> So burning up an extra PASID per VF, halving their
> number, might not be great as the only option.

Yes, so I think we agree that we should not limit the spec to work on
a specific configuration (e.g. the device with a PF).

> > Using physical function is sub-optimal compared to all of the above since:
> >
> > 1) it is limited to a specific transport or implementation and it doesn't work for
> > a device or transport without a PF
> > 2) the virtio level function is not self contained; this makes any feature
> > that ties to the PF impossible to use in the nested layer
> > 3) it is more complicated than leveraging the existing facilities provided by the
> > platform or transport
>
> I think I disagree with 2 and 3 above simply because controlling VFs through
> a PF is how all other devices did this.

For management and provisioning, yes. For other features, the answer is
no. This is simply because most hardware vendors don't consider
whether or not a feature could be virtualized. That's fine for them
but not for us. E.g. if we limit feature A to the PF, it means feature A
can't be used by guests. My understanding is that we'd better not
introduce a feature that is hard to virtualize.

> About 1 - well this is
> just about us being smart and writing this in a way that is
> generic enough, right?

That's exactly my question and my point, I know it can be done in the
PF. What I'm asking is "why it must be in the PF".

And I'm trying to convince Max to introduce those features as "basic
device facilities" instead of doing that in the "admin virtqueue" or
other stuff that belongs to the PF.

> E.g. include options for PASIDs too.
>
> Note that support for cross-device addressing is useful
> even outside of migration. We also have things like
> priority where it is useful to adjust properties of
> a VF on the fly while it is active. Again the normal way
> all devices do this is through a PF. Yes a bunch of tricks
> in QEMU is possible but having a driver in the host kernel
> and just handling it in a contained way is way cleaner.
>
> > Consider (P)ASID will be ready very soon, working around the platform limitation
> > via the PF is not a good idea for me. Especially considering it's not a must and we
> > have already prototyped the software assisted technology.
>
> Well PASID is just one technology.

Yes, devices are allowed to have their own function to isolate DMA. I
mentioned PASID just because it is the most popular technology.

> > > Might as well integrate
> > > the rest of state in the same channel.
> >
> > That's another question. I think for the functions that are a must for doing
> > live migration, introducing them in the function itself is the most natural
> > way since we did all the other facilities there. This eases using the function
> > in the nested layer.
> >
> > And using the channel in the PF is not coming for free. It requires
> > synchronization in the software or even QOS.
> >
> > Or we can just separate the dirty page tracking into the PF (but need to define
> > it as a basic facility for future extension).
>
> Well maybe just start focusing on write tracking, sure.
> Once there's a proposal for this we can see whether
> adding other state there is easier or harder.

Fine with me.

> > > Another answer is that CPUs trivially switch between
> > > functions by switching the active page tables. For PCI DMA
> > > it is all much trickier since the page tables can be separate
> > > from the device, and assumed to be mostly static.
> >
> > I don't see much difference; the page table is also separated from the CPU.
> > If the device supports state save and restore we can schedule multiple
> > VMs/VCPUs on the same device.
>
> It's just that performance is terrible. If you keep losing packets
> migration might as well not be live.

I haven't measured the performance.
But I believe the shadow virtqueue
should perform better than the kernel vhost-net backends.

If it's not, we can switch to vhost-net if necessary and we know it
works well for live migration.

> > So if you want to create something like the VMCS then
> > again you either need some help from another device or
> > put it in device memory.
>
> For CPU virtualization, the states could be saved and restored via MSRs. For
> virtio, accessing them via registers is also possible and much more simple.
>
> Thanks

My guess is performance is going to be bad. MSRs are part of the
same CPU that is executing the accesses....

I'm not sure, but it's how current VMX or SVM do it.

Thanks

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Fri, 20 Aug 2021 07:06:22 -0400 From: "Michael S. Tsirkin" Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function Message-ID: <20210820065144-mutt-send-email-mst@kernel.org> References: <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> <5b4db3ac-0291-69ea-7e82-5b3f19049e61@redhat.com> <20210819104216-mutt-send-email-mst@kernel.org> <20210820023540-mutt-send-email-mst@kernel.org> MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit To: Jason Wang Cc: Max Gurtovoy , "virtio-comment@lists.oasis-open.org" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID:

On Fri, Aug 20, 2021 at 03:49:55PM +0800, Jason Wang wrote:
> On Fri, Aug 20, 2021 at 3:04 PM Michael S. Tsirkin wrote:
> >
> > On Fri, Aug 20, 2021 at 10:17:05AM +0800, Jason Wang wrote:
> > >
> > > On 2021/8/19 10:58 PM, Michael S. Tsirkin wrote:
> > > > On Thu, Aug 19, 2021 at 10:44:46AM +0800, Jason Wang wrote:
> > > > > > The PF device will have an option to quiesce/freeze the VF device.
> > > > > Is such design a must? If no, why not simply introduce those functions in
> > > > > the VF?
> > > > Many IOMMUs only support protections at the function level.
> > > > Thus we need ability to have one device (e.g. a PF)
> > > > to control migration of another (e.g. a VF).
> > >
> > > So as discussed previously, the only possible "advantage" is that the DMA is
> > > isolated.
> > >
> > > > This is because allowing VF to access hypervisor memory used for
> > > > migration is not a good idea.
> > > > For IOMMUs that support subfunctions, these "devices" could be
> > > > subfunctions.
> > > >
> > > > The only alternative is to keep things in device memory which
> > > > does not need an IOMMU.
> > > > I guess we'd end up with something like a VQ in device memory which might
> > > > be tricky from multiple points of view, but yes, this could be
> > > > useful and people did ask for such a capability in the past.
> > >
> > > I assume the spec already supports this. We probably need some clarification
> > > at the transport layer. But it's as simple as setting an MMIO area as the virtqueue
> > > address?
> >
> > Several issues
> > - we do not support changing VQ addresses. Devices do need to support
> > changing memory addresses.
>
> So it looks like a transport specific requirement (PCI-E) instead of a
> general issue.
>
> > - Ordering becomes tricky.
> > E.g. when the device reads a descriptor in VQ
> > memory it suddenly does not flush out writes into a buffer
> > that is potentially in RAM. We might also need even stronger
> > barriers on the driver side. We used dma_wmb but now it
> > probably needs to be wmb.
> > - Reading multibyte structures from device memory is slow.
> > To get reasonable performance we might need to mark this device memory
> > WB or WC. That generally makes things even trickier.
>
> I agree, but still they are all transport specific requirements. If we
> do that in a PCI-E BAR, the driver must obey the ordering rule for PCI
> to make it work.
>
> > > Except for the dirty bit tracking, we don't have bulk data that needs to be
> > > transferred during migration. So a virtqueue is not a must even in this case.
> >
> > Main traffic is write tracking.
>
> Right.
>
> > > > > If yes, what's the reason for making virtio different (e.g VCPU live
> > > > > migration is not designed like that)?
> > > > I think the main difference is we need PF's help for memory
> > > > tracking for pre-copy migration anyway.
> > >
> > > Such kind of memory tracking is not a must. KVM uses software assisted
> > > technologies (write protection) and it works very well.
> >
> > So page-fault support is absolutely a viable option IMHO.
> > To work well we need VIRTIO_F_PARTIAL_ORDER - there was not
> > a lot of excitement but sure I will finalize and repost it.
>
> As discussed before, it looks like a performance optimization but not a must?
>
> I guess we don't do that for KVM and it works well.

Depends on the type of device. For networking it's a problem because it is
driven by outside events, so it keeps going, leading to packet drops, which
is a quality of implementation issue, not an optimization.
Same thing with e.g. audio I suspect. Maybe graphics. For KVM and
e.g. storage it's more of a performance issue.

> > However we need support for reporting and handling faults.
> > Again this is data path stuff and needs to be under
> > hypervisor control so I guess we get right back
> > to having this in the PF?
>
> So it depends on whether it requires a DMA. If it's just something
> like a CR2 register, we don't need the PF.

We won't strictly need it but it is a well understood model,
working well with e.g. vfio. It makes sense to support it.

> > > For virtio,
> > > technology like shadow virtqueue has been used by DPDK and prototyped by
> > > Eugenio.
> > That's ok but I think since it affects performance at 100% of the
> > time when active we cannot rely on it as the only solution.
>
> This part I don't understand:
>
> - KVM write-protects the pages, so it loses performance as well.
> - If we are using a virtqueue for reporting the dirty bitmap, it can easily
> run out of space and we will lose performance as well
> - If we are using a bitmap/bytemap, we may also lose performance
> (e.g. the huge footprint) or at the PCI level
>
> So I'm not against the idea; what I think makes more sense is not to
> limit facilities like device states and dirty page tracking to the
> PF.

It could be a cross-device facility that can support a PF but
also other forms of communication, yes.

> > Even if we want to go with hardware technology, we have many alternatives
> > (as we've discussed in the past):
> >
> > 1) IOMMU dirty bit (e.g. modern IOMMUs have an EA bit for logging external device
> > writes)
> > 2) Write protection via IOMMU or device MMU
> > 3) Address space ID for isolating DMAs
>
> Not all systems support any of the above unfortunately.

Yes. But we know the platform (AMD/Intel/ARM) will be ready soon for
them in the near future.

know and future in the same sentence make an oxymoron ;)

> Also some systems might have a limited # of PASIDs.
> So burning up an extra PASID per VF, halving their
> number, might not be great as the only option.

Yes, so I think we agree that we should not limit the spec to work on
a specific configuration (e.g. the device with a PF).

That makes sense to me.
> > > Using physical function is sub-optimal compared to all of the above since:
> > >
> > > 1) it is limited to a specific transport or implementation and it doesn't work for
> > > a device or transport without a PF
> > > 2) the virtio level function is not self contained; this makes any feature
> > > that ties to the PF impossible to use in the nested layer
> > > 3) it is more complicated than leveraging the existing facilities provided by the
> > > platform or transport
> >
> > I think I disagree with 2 and 3 above simply because controlling VFs through
> > a PF is how all other devices did this.
>
> For management and provisioning, yes. For other features, the answer is
> no. This is simply because most hardware vendors don't consider
> whether or not a feature could be virtualized. That's fine for them
> but not for us. E.g. if we limit feature A to the PF, it means feature A
> can't be used by guests. My understanding is that we'd better not
> introduce a feature that is hard to virtualize.

I'm not sure what you mean when you say management but I guess
at least stuff that ip link does normally:

                 [ vf NUM [ mac LLADDR ]
                          [ VFVLAN-LIST ]
                          [ rate TXRATE ]
                          [ max_tx_rate TXRATE ]
                          [ min_tx_rate TXRATE ]
                          [ spoofchk { on | off } ]
                          [ query_rss { on | off } ]
                          [ state { auto | enable | disable } ]
                          [ trust { on | off } ]
                          [ node_guid eui64 ]
                          [ port_guid eui64 ] ]

is fair game ...

> > About 1 - well this is
> > just about us being smart and writing this in a way that is
> > generic enough, right?
>
> That's exactly my question and my point, I know it can be done in the
> PF. What I'm asking is "why it must be in the PF".
>
> And I'm trying to convince Max to introduce those features as "basic
> device facilities" instead of doing that in the "admin virtqueue" or
> other stuff that belongs to the PF.

Let's say it's not in a PF, I think it needs some way to be separate
so we don't need lots of logic in the hypervisor to handle that.
So from that POV admin queue is ok.
In fact, from my POV the admin queue suffers from not focusing on
cross-device communication enough, not from doing that too much.

> > E.g. include options for PASIDs too.
> >
> > Note that support for cross-device addressing is useful
> > even outside of migration. We also have things like
> > priority where it is useful to adjust properties of
> > a VF on the fly while it is active. Again the normal way
> > all devices do this is through a PF. Yes a bunch of tricks
> > in QEMU is possible but having a driver in the host kernel
> > and just handling it in a contained way is way cleaner.
> >
> > > Consider (P)ASID will be ready very soon, working around the platform limitation
> > > via the PF is not a good idea for me. Especially considering it's not a must and we
> > > have already prototyped the software assisted technology.
> >
> > Well PASID is just one technology.
>
> Yes, devices are allowed to have their own function to isolate DMA. I
> mentioned PASID just because it is the most popular technology.
>
> > > > Might as well integrate
> > > > the rest of state in the same channel.
> > >
> > > That's another question. I think for the functions that are a must for doing
> > > live migration, introducing them in the function itself is the most natural
> > > way since we did all the other facilities there. This eases using the function
> > > in the nested layer.
> > >
> > > And using the channel in the PF is not coming for free. It requires
> > > synchronization in the software or even QOS.
> > >
> > > Or we can just separate the dirty page tracking into the PF (but need to define
> > > it as a basic facility for future extension).
> >
> > Well maybe just start focusing on write tracking, sure.
> > Once there's a proposal for this we can see whether
> > adding other state there is easier or harder.
>
> Fine with me.
> > > > Another answer is that CPUs trivially switch between
> > > > functions by switching the active page tables. For PCI DMA
> > > > it is all much trickier since the page tables can be separate
> > > > from the device, and assumed to be mostly static.
> > >
> > > I don't see much difference; the page table is also separated from the CPU.
> > > If the device supports state save and restore we can schedule multiple
> > > VMs/VCPUs on the same device.
> >
> > It's just that performance is terrible. If you keep losing packets
> > migration might as well not be live.
>
> I haven't measured the performance. But I believe the shadow virtqueue
> should perform better than the kernel vhost-net backends.
>
> If it's not, we can switch to vhost-net if necessary and we know it
> works well for live migration.

Well but not as fast as hardware offloads with faults would be, which
can potentially go full speed as long as you are lucky and do not hit
too many faults.

> > > > So if you want to create something like the VMCS then
> > > > again you either need some help from another device or
> > > > put it in device memory.
> > >
> > > For CPU virtualization, the states could be saved and restored via MSRs. For
> > > virtio, accessing them via registers is also possible and much more simple.
> > >
> > > Thanks
> >
> > My guess is performance is going to be bad. MSRs are part of the
> > same CPU that is executing the accesses....
>
> I'm not sure, but it's how current VMX or SVM do it.
>
> Thanks

Yes but again, moving state of the CPU around is faster than pulling it
across the PCI-E bus.
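The write tracking that both sides converge on above is essentially a fetch-and-clear dirty map over guest pages: the device (or the PF driver on its behalf) marks every page touched by a DMA write, and the hypervisor drains the map once per pre-copy round. A minimal sketch in Python, with all names invented here for illustration (nothing below comes from the virtio spec or any posted patches):

```python
PAGE_SHIFT = 12  # assume 4 KiB guest pages


class DirtyTracker:
    """Per-VF dirty-page tracking, conceptually owned by the PF driver.

    A real device would keep a bitmap in device or host memory; a set of
    page frame numbers keeps the sketch short.
    """

    def __init__(self):
        self._dirty = set()

    def record_dma_write(self, gpa, length):
        # Mark every guest page touched by a device DMA write of
        # `length` bytes starting at guest physical address `gpa`.
        first = gpa >> PAGE_SHIFT
        last = (gpa + length - 1) >> PAGE_SHIFT
        for pfn in range(first, last + 1):
            self._dirty.add(pfn)

    def drain(self):
        # Fetch-and-clear, as one pre-copy iteration would do: the
        # hypervisor re-sends exactly the pages dirtied since last round.
        pages, self._dirty = self._dirty, set()
        return sorted(pages)
```

The fetch-and-clear in `drain` is what makes the pre-copy loop converge: each round only re-transfers pages written since the previous round, until the remainder is small enough to stop the VF and copy the rest.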
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: MIME-Version: 1.0 References: <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> <5b4db3ac-0291-69ea-7e82-5b3f19049e61@redhat.com> <20210819104216-mutt-send-email-mst@kernel.org> <20210820023540-mutt-send-email-mst@kernel.org> <20210820065144-mutt-send-email-mst@kernel.org> In-Reply-To: <20210820065144-mutt-send-email-mst@kernel.org> From: Jason Wang Date: Mon, 23 Aug 2021 11:20:53 +0800 Message-ID: Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable To: "Michael S. Tsirkin" Cc: Max Gurtovoy , "virtio-comment@lists.oasis-open.org" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID:

On Fri, Aug 20, 2021 at 7:06 PM Michael S. Tsirkin wrote:
>
> On Fri, Aug 20, 2021 at 03:49:55PM +0800, Jason Wang wrote:
> > On Fri, Aug 20, 2021 at 3:04 PM Michael S. Tsirkin wrote:
> > >
> > > On Fri, Aug 20, 2021 at 10:17:05AM +0800, Jason Wang wrote:
> > > >
> > > > On 2021/8/19 10:58 PM, Michael S. Tsirkin wrote:
> > > > > On Thu, Aug 19, 2021 at 10:44:46AM +0800, Jason Wang wrote:
> > > > > > > The PF device will have an option to quiesce/freeze the VF device.
> > > > > >
> > > > > > Is such design a must? If no, why not simply introduce those functions in
> > > > > > the VF?
> > > > > Many IOMMUs only support protections at the function level.
> > > > > Thus we need ability to have one device (e.g. a PF)
> > > > > to control migration of another (e.g. a VF).
> > > >
> > > > So as discussed previously, the only possible "advantage" is that the DMA is
> > > > isolated.
> > > >
> > > > > This is because allowing VF to access hypervisor memory used for
> > > > > migration is not a good idea.
> > > > > For IOMMUs that support subfunctions, these "devices" could be
> > > > > subfunctions.
> > > > >
> > > > > The only alternative is to keep things in device memory which
> > > > > does not need an IOMMU.
> > > > > I guess we'd end up with something like a VQ in device memory which might
> > > > > be tricky from multiple points of view, but yes, this could be
> > > > > useful and people did ask for such a capability in the past.
> > > >
> > > > I assume the spec already supports this. We probably need some clarification
> > > > at the transport layer. But it's as simple as setting an MMIO area as the virtqueue
> > > > address?
> > >
> > > Several issues
> > > - we do not support changing VQ addresses. Devices do need to support
> > > changing memory addresses.
> >
> > So it looks like a transport specific requirement (PCI-E) instead of a
> > general issue.
> >
> > > - Ordering becomes tricky.
> > > E.g. when the device reads a descriptor in VQ
> > > memory it suddenly does not flush out writes into a buffer
> > > that is potentially in RAM. We might also need even stronger
> > > barriers on the driver side. We used dma_wmb but now it
> > > probably needs to be wmb.
> > > - Reading multibyte structures from device memory is slow.
> > > To get reasonable performance we might need to mark this device memory
> > > WB or WC. That generally makes things even trickier.
> >
> > I agree, but still they are all transport specific requirements. If we
> > do that in a PCI-E BAR, the driver must obey the ordering rule for PCI
> > to make it work.
> >
> > > > Except for the dirty bit tracking, we don't have bulk data that needs to be
> > > > transferred during migration. So a virtqueue is not a must even in this case.
> > >
> > > Main traffic is write tracking.
> >
> > Right.
> >
> > > > > > If yes, what's the reason for making virtio different (e.g VCPU live
> > > > > > migration is not designed like that)?
> > > > > I think the main difference is we need PF's help for memory > > > > > tracking for pre-copy migration anyway. > > > > > > > > Such kind of memory tracking is not a must. KVM uses software assisted > > > > technologies (write protection) and it works very well. > > > > > > So page-fault support is absolutely a viable option IMHO. > > > To work well we need VIRTIO_F_PARTIAL_ORDER - there was not > > > a lot of excitement but sure I will finalize and repost it. > > > > As discussed before, it looks like a performance optimization but not a must? > > > > I guess we don't do that for KVM and it works well. > > Depends on type of device. For networking it's a problem because it is > driven by outside events so it keeps going leading to packet drops which > is a quality of implementation issue, not an optimization. So it looks to me it's a factor of how device page faults perform. E.g we may suffer from packet drops during live migration when KVM is logging dirty pages as well. > Same thing with e.g. audio I suspect. Maybe graphics. I wonder even with this, it may not work for those real time tasks. > For KVM and > e.g. storage it's more of a performance issue. > > > > > > > > > > > However we need support for reporting and handling faults. > > > Again this is data path stuff and needs to be under > > > hypervisor control so I guess we get right back > > > to having this in the PF? > > > > So it depends on whether it requires a DMA. If it's just something > > like a CR2 register, we don't need PF. > > We won't strictly need it but it is a well understood model, > working well with e.g. vfio. It makes sense to support it. > > > > > > > > > > > > > > > > > > > > For virtio, > > > > technology like shadow virtqueue has been used by DPDK and prototyped by > > > > Eugenio. > > > > > > That's ok but I think since it affects performance at 100% of the > > > time when active we can not rely on this as the only solution.
> > > > This part I don't understand: > > > > - KVM write-protects the pages, so it loses performance as well. > > - If we are using a virtqueue for reporting the dirty bitmap, it can easily > > run out of space and we will lose the performance as well > > - If we are using a bitmap/bytemap, we may also lose the performance > > (e.g the huge footprint) or at PCI level > > > > So I'm not against the idea, what I think makes more sense is not > > limit the facilities like device states, dirty page tracking to the > > PF. > > It could be a cross-device facility that can support PF but > also other forms of communication, yes. That's my understanding as well. > > > > > > > > > > > > Even if we want to go with hardware technology, we have many alternatives > > > > (as we've discussed in the past): > > > > > > > > 1) IOMMU dirty bit (E.g modern IOMMU have EA bit for logging external device > > > > write) > > > > 2) Write protection via IOMMU or device MMU > > > > 3) Address space ID for isolating DMAs > > > > > > Not all systems support any of the above unfortunately. > > > > > > > Yes. But we know the platform (AMD/Intel/ARM) will be ready soon for > > them in the near future. > > know and future in the same sentence make an oxymoron ;) > > > > Also some systems might have a limited # of PASIDs. > > > So burning up an extra PASID per VF halving their > > > number might not be great as the only option. > > > > Yes, so I think we agree that we should not limit the spec to work on > > a specific configuration (e.g the device with PF). > > That makes sense to me.
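The bitmap-footprint point above is easy to quantify: whichever facility reports the dirty pages (virtqueue, PF channel, device memory), the bookkeeping itself is one bit per guest page. A minimal sketch, with page size and guest size as assumptions, none of it from the spec:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12          /* 4 KiB pages (assumption) */
#define GUEST_PAGES (1u << 20) /* enough bits for a 4 GiB guest */

static uint8_t dirty_bitmap[GUEST_PAGES / 8]; /* 128 KiB footprint */

/* Bytes of bitmap needed to track one dirty bit per guest page. */
static uint64_t dirty_bitmap_bytes(uint64_t guest_bytes, uint64_t page_size)
{
    return (guest_bytes / page_size + 7) / 8;
}

/* Record one device/CPU write at guest-physical address gpa. */
static void mark_dirty(uint64_t gpa)
{
    uint64_t pfn = gpa >> PAGE_SHIFT;
    dirty_bitmap[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
}

/* Harvest-and-clear, as a pre-copy iteration would do per page. */
static int test_and_clear_dirty(uint64_t gpa)
{
    uint64_t pfn = gpa >> PAGE_SHIFT;
    int set = (dirty_bitmap[pfn / 8] >> (pfn % 8)) & 1;
    dirty_bitmap[pfn / 8] &= (uint8_t)~(1u << (pfn % 8));
    return set;
}
```

For a 16 GiB guest with 4 KiB pages this is 512 KiB of bitmap; the "huge footprint" concern is less about these bytes and more about where they live and how often the hypervisor has to sync them across PCI.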
> > > > > > > > > > > > > > > Using physical function is sub-optimal that all of the above since: > > > > > > > > 1) limited to a specific transport or implementation and it doesn't work for > > > > device or transport without PF > > > > 2) the virtio level function is not self contained, this makes any feature > > > > that ties to PF impossible to be used in the nested layer > > > > 3) more complicated than leveraging the existing facilities provided by the > > > > platform or transport > > > > > > I think I disagree with 2 and 3 above simply because controlling VFs through > > > a PF is how all other devices did this. > > > > For management and provision yes. For other features, the answer is > > not. This is simply because most hardware vendors don't consider > > whether or not a feature could be virtualized. That's fine for them > > but not us. E.g if we limit the feature A to PF. It means feature A > > can't be used by guests. My understanding is that we'd better not > > introduce a feature that is hard to be virtualized. > > I'm not sure what do you mean when you say management but I guess > at least stuff that ip link does normally: > > > [ vf NUM [ mac LLADDR ] > [ VFVLAN-LIST ] > [ rate TXRATE ] > [ max_tx_rate TXRATE ] > [ min_tx_rate TXRATE ] > [ spoofchk { on | off } ] > [ query_rss { on | off } ] > [ state { auto | enable | disable } ] > [ trust { on | off } ] > [ node_guid eui64 ] > [ port_guid eui64 ] ] > > > is fair game ... That's the example of the management tasks: 1) is not expected or exposed for the guest 2) requires capabilities (CAP_NET_ADMIN) for security 3) won't be used by Qemu But live migration seems different 1) it can be exposed for the guest for nested live migration 2) doesn't require capabilities, no security concern 3) will be used by Qemu > > > > About 1 - well this is > > > just about us being smart and writing this in a way that is > > > generic enough, right?
> > > > That's exactly my question and my point, I know it can be done in the > > PF. What I'm asking is "why it must be in the PF". > > > > And I'm trying to convince Max to introduce those features as "basic > > device facilities" instead of doing that in the "admin virtqueue" or > > other stuff that belongs to PF. > > Let's say it's not in a PF, I think it needs some way to be separate so > we don't need lots of logic in the hypervisor to handle that. We don't need a lot I think: 1) stop/freeze the device 2) device state set and get > So from that POV admin queue is ok. In fact > from my POV admin queue is suffering in that it does not focus on cross > device communication enough, not that it's doing that too much. Ok. > > > > E.g. include options for PASIDs too. > > > > > > Note that support for cross-device addressing is useful > > > even outside of migration. We also have things like > > > priority where it is useful to adjust properties of > > > a VF on the fly while it is active. Again the normal way > > > all devices do this is through a PF. Yes a bunch of tricks > > > in QEMU is possible but having a driver in host kernel > > > and just handle it in a contained way is way cleaner. > > > > > > > > > > Consider (P)ASID will be ready very soon, workaround the platform limitation > > > via PF is not a good idea for me. Especially consider it's not a must and we > > > had already prototyped the software assisted technology. > > > > > > Well PASID is just one technology. > > > > Yes, devices are allowed to have their own function to isolate DMA. I > > mentioned PASID just because it is the most popular technology. > > > > > > > > > > > > > > > > > Might as well integrate > > > > > the rest of state in the same channel. > > > > > > > > > > > > That's another question.
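The two facilities listed above (1. stop/freeze the device, 2. device state set and get) could be modeled as commands regardless of whether they travel over a PF admin virtqueue or some per-function channel. The following is a purely hypothetical encoding for illustration; none of these opcodes, field names, or layouts exist in the VIRTIO spec or in Max's RFCs as written:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical migration opcodes -- invented for this sketch. */
enum mig_opcode {
    MIG_OP_FREEZE    = 1,  /* quiesce the function, stop DMA */
    MIG_OP_UNFREEZE  = 2,
    MIG_OP_STATE_GET = 3,  /* save internal device state to a buffer */
    MIG_OP_STATE_SET = 4,  /* restore state on the destination */
};

/* Hypothetical command layout; in the PF-controlled model target_vf
 * names the function being migrated, in a "basic device facility"
 * model the target would be implicit. */
struct mig_cmd {
    uint16_t opcode;     /* enum mig_opcode */
    uint16_t target_vf;  /* function the command applies to */
    uint32_t length;     /* bytes in the state buffer, if any */
    uint64_t state_addr; /* DMA address of the state buffer */
};

static struct mig_cmd mig_cmd_build(uint16_t op, uint16_t vf,
                                    uint64_t addr, uint32_t len)
{
    struct mig_cmd c;
    memset(&c, 0, sizeof(c));
    c.opcode = op;
    c.target_vf = vf;
    c.state_addr = addr;
    c.length = len;
    return c;
}
```

The point of contention in the thread is exactly the `target_vf` field: whether commands like these are issued by the PF on behalf of a VF, or exposed by the function itself so they remain usable in nested setups.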
I think for the function that is a must for doing > > > > live migration, introducing them in the function itself is the most natural > > > > way since we did all the other facilities there. This eases the function that > > > > can be used in the nested layer. > > > > > > > > And using the channel in the PF is not coming for free. It requires > > > > synchronization in the software or even QOS. > > > > > > > > Or we can just separate the dirty page tracking into PF (but need to define > > > > them as basic facility for future extension). > > > > > > Well maybe just start focusing on write tracking, sure. > > > Once there's a proposal for this we can see whether > > > adding other state there is easier or harder. > > > > Fine with me. > > > > > > > > > > > > > > > > > > > Another answer is that CPUs trivially switch between > > > > > functions by switching the active page tables. For PCI DMA > > > > > it is all much trickier since the page tables can be separate > > > > > from the device, and assumed to be mostly static. > > > > > > > > > > > > I don't see much different, the page table is also separated from the CPU. > > > > If the device supports state save and restore we can schedule multiple > > > > VMs/VCPUs on the same device. > > > > > > It's just that performance is terrible. If you keep losing packets > > > migration might as well not be live. > > > > I didn't measure the performance. But I believe the shadow virtqueue > > should perform better than kernel vhost-net backends. > > > > If it's not, we can switch to vhost-net if necessary and we know it > > works well for the live migration. > > Well but not as fast as hardware offloads with faults would be, > which can potentially go full speed as long as you are lucky > and do not hit too many faults. Yes, but for live migration, I agree that we need better performance, but if we go full speed, that may break the convergence. Anyhow, we can see how well shadow virtqueue performs.
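For readers unfamiliar with the shadow virtqueue mentioned above (the software-assisted approach prototyped in DPDK and by Eugenio): the hypervisor interposes its own queue between the guest's virtqueue and the device, so it sees every buffer before the device does and can log the device-writable ones as dirty. A minimal sketch of just that logging half; descriptor layout and names are invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified descriptor: guest-physical buffer + length + direction. */
struct desc {
    uint64_t gpa;
    uint32_t len;
    int device_writable;
};

#define LOG_CAP 64
static uint64_t dirty_log[LOG_CAP];
static unsigned dirty_cnt;

/* The hypervisor relays a guest descriptor toward the device via the
 * shadow queue; any device-writable buffer is recorded as dirty, since
 * the device may write it before the next synchronization point. */
static void shadow_relay(const struct desc *d)
{
    if (d->device_writable && dirty_cnt < LOG_CAP)
        dirty_log[dirty_cnt++] = d->gpa;
    /* ...a real implementation would now post a translated copy of *d
     * on the shadow queue the device actually consumes... */
}
```

This is why it "affects performance 100% of the time when active": every descriptor takes the extra hypervisor hop, but no IOMMU or PF support is required.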
> > > > > > > > > > > > > So if you want to create something like the VMCS then > > > > > again you either need some help from another device or > > > > > put it in device memory. > > > > > > > > > > > > For CPU virtualization, the states could be saved and restored via MSRs. For > > > > virtio, accessing them via registers is also possible and much more simple. > > > > > > > > Thanks > > > > > > My guess is performance is going to be bad. MSRs are part of the > > > same CPU that is executing the accesses.... > > > > I'm not sure but it's how current VMX or SVM did. > > > > Thanks > > Yes but again, moving state of the CPU around is faster than > pulling it across the PCI-E bus. Right. Thanks > > > > > > > > > > > > > > > > > > > > > > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Mon, 23 Aug 2021 13:08:47 +0100 From: "Dr. David Alan Gilbert" Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function Message-ID: References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> <5b4db3ac-0291-69ea-7e82-5b3f19049e61@redhat.com> <20210819104216-mutt-send-email-mst@kernel.org> MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit To: Jason Wang Cc: "Michael S. Tsirkin" , Max Gurtovoy , "virtio-comment@lists.oasis-open.org" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID: * Jason Wang (jasowang@redhat.com) wrote: > > On 2021/8/19 10:58 PM, Michael S. Tsirkin wrote: > > On Thu, Aug 19, 2021 at 10:44:46AM +0800, Jason Wang wrote: > > > > The PF device will have an option to quiesce/freeze the VF device. > > > > > > Is such design a must? If no, why not simply introduce those functions in > > > the VF? > > Many IOMMUs only support protections at the function level.
> > Thus we need ability to have one device (e.g. a PF) > > to control migration of another (e.g. a VF). > > > So as discussed previously, the only possible "advantage" is that the DMA is > isolated. > > > > This is because allowing VF to access hypervisor memory used for > > migration is not a good idea. > > For IOMMUs that support subfunctions, these "devices" could be > > subfunctions. > > > > The only alternative is to keep things in device memory which > > does not need an IOMMU. > > I guess we'd end up with something like a VQ in device memory which might > > be tricky from multiple points of view, but yes, this could be > > useful and people did ask for such a capability in the past. > > > I assume the spec already support this. We probably need some clarification > at the transport layer. But it's as simple as setting MMIO are as virtqueue > address? > > Except for the dirty bit tracking, we don't have bulk data that needs to be > transferred during migration. So a virtqueue is not must even in this case. > > > > > > > If yes, what's the reason for making virtio different (e.g VCPU live > > > migration is not designed like that)? > > I think the main difference is we need PF's help for memory > > tracking for pre-copy migration anyway. > > > Such kind of memory tracking is not a must. KVM uses software assisted > technologies (write protection) and it works very well. For virtio, > technology like shadow virtqueue has been used by DPDK and prototyped by > Eugenio. > > Even if we want to go with hardware technology, we have many alternatives > (as we've discussed in the past): > > 1) IOMMU dirty bit (E.g modern IOMMU have EA bit for logging external device > write) > 2) Write protection via IOMMU or device MMU > 3) Address space ID for isolating DMAs What's the state of those - last time I chatted to anyone about IOMMUs doing protection, things were at the 'in the future' stage. 
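To make option 1 above concrete (harvesting the IOMMU's per-PTE dirty bit, e.g. VT-d's second-level A/D tracking): the hypervisor periodically scans the I/O page tables, collects pages the device wrote, and clears the bit for the next pass. A toy sketch over a flat PTE array; the bit position is illustrative and not any real IOMMU's layout:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative dirty-bit position; real IOMMUs define their own. */
#define PTE_DIRTY (1ull << 9)

/* Scan n leaf PTEs, harvest the page frame numbers the device dirtied,
 * and clear the bits so the next scan sees only new writes. */
static unsigned harvest_dirty(uint64_t *ptes, unsigned n,
                              uint64_t *out_pfns)
{
    unsigned found = 0;
    for (unsigned i = 0; i < n; i++) {
        if (ptes[i] & PTE_DIRTY) {
            out_pfns[found++] = i;
            ptes[i] &= ~PTE_DIRTY;
            /* A real implementation must flush the IOTLB after
             * clearing, or in-flight translations keep stale bits. */
        }
    }
    return found;
}
```

The appeal, as argued in the thread, is that this lives entirely in the platform: no PF channel and no device cooperation is needed, but it does require IOMMU hardware that actually sets the bit.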
Dave > Using physical function is sub-optimal that all of the above since: > > 1) limited to a specific transport or implementation and it doesn't work for > device or transport without PF > 2) the virtio level function is not self contained, this makes any feature > that ties to PF impossible to be used in the nested layer > 3) more complicated than leveraging the existing facilities provided by the > platform or transport > > Consider (P)ASID will be ready very soon, workaround the platform limitation > via PF is not a good idea for me. Especially consider it's not a must and we > had already prototype the software assisted technology. > > > > Might as well integrate > > the rest of state in the same channel. > > > That's another question. I think for the function that is a must for doing > live migration, introducing them in the function itself is the most natural > way since we did all the other facilities there. This ease the function that > can be used in the nested layer. > > And using the channel in the PF is not coming for free. It requires > synchronization in the software or even QOS. > > Or we can just separate the dirty page tracking into PF (but need to define > them as basic facility for future extension). > > > > > > Another answer is that CPUs trivially switch between > > functions by switching the active page tables. For PCI DMA > > it is all much trickier sine the page tables can be separate > > from the device, and assumed to be mostly static. > > > I don't see much different, the page table is also separated from the CPU. > If the device supports state save and restore we can scheduling the multiple > VMs/VCPUs on the same device. > > > > So if you want to create something like the VMCS then > > again you either need some help from another device or > > put it in device memory. > > > For CPU virtualization, the states could be saved and restored via MSRs. For > virtio, accessing them via registers is also possible and much more simple. 
> > Thanks > > > > > > > > > This publicly archived list offers a means to provide input to the > OASIS Virtual I/O Device (VIRTIO) TC. > > In order to verify user consent to the Feedback License terms and > to minimize spam in the list archive, subscription is required > before posting. > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org > List help: virtio-comment-help@lists.oasis-open.org > List archive: https://lists.oasis-open.org/archives/virtio-comment/ > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists > Committee: https://www.oasis-open.org/committees/virtio/ > Join OASIS: https://www.oasis-open.org/join/ > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: MIME-Version: 1.0 References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> <5b4db3ac-0291-69ea-7e82-5b3f19049e61@redhat.com> <20210819104216-mutt-send-email-mst@kernel.org> In-Reply-To: From: Jason Wang Date: Tue, 24 Aug 2021 11:00:54 +0800 Message-ID: Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable To: "Dr. David Alan Gilbert" Cc: "Michael S. Tsirkin" , Max Gurtovoy , "virtio-comment@lists.oasis-open.org" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID: On Mon, Aug 23, 2021 at 8:08 PM Dr. David Alan Gilbert wrote: > > * Jason Wang (jasowang@redhat.com) wrote: > > > > On 2021/8/19 10:58 PM, Michael S.
Tsirkin wrote: > > > On Thu, Aug 19, 2021 at 10:44:46AM +0800, Jason Wang wrote: > > > > > The PF device will have an option to quiesce/freeze the VF device. > > > > > > > > Is such design a must? If no, why not simply introduce those functions in > > > > the VF? > > > Many IOMMUs only support protections at the function level. > > > Thus we need ability to have one device (e.g. a PF) > > > to control migration of another (e.g. a VF). > > > > > > So as discussed previously, the only possible "advantage" is that the DMA is > > isolated. > > > > > > > This is because allowing VF to access hypervisor memory used for > > > migration is not a good idea. > > > For IOMMUs that support subfunctions, these "devices" could be > > > subfunctions. > > > > > > The only alternative is to keep things in device memory which > > > does not need an IOMMU. > > > I guess we'd end up with something like a VQ in device memory which might > > > be tricky from multiple points of view, but yes, this could be > > > useful and people did ask for such a capability in the past. > > > > I assume the spec already supports this. We probably need some clarification > > at the transport layer. But it's as simple as setting an MMIO area as the virtqueue > > address? > > > > Except for the dirty bit tracking, we don't have bulk data that needs to be > > transferred during migration. So a virtqueue is not a must even in this case. > > > > > > > > > > > If yes, what's the reason for making virtio different (e.g VCPU live > > > > migration is not designed like that)? > > > I think the main difference is we need PF's help for memory > > > tracking for pre-copy migration anyway. > > > > Such kind of memory tracking is not a must. KVM uses software assisted > > technologies (write protection) and it works very well. For virtio, > > technology like shadow virtqueue has been used by DPDK and prototyped by > > Eugenio.
> > > > Even if we want to go with hardware technology, we have many alternatives > > (as we've discussed in the past): > > > > 1) IOMMU dirty bit (E.g modern IOMMU have EA bit for logging external device > > write) > > 2) Write protection via IOMMU or device MMU > > 3) Address space ID for isolating DMAs > > What's the state of those - last time I chatted to anyone about IOMMUs > doing protection, things were at the 'in the future' stage. For the IOMMU dirty bit, I haven't checked the hardware, but the VT-d spec has claimed support for several years. For write protection via IOMMU, I think PRI or ATS has been supported by some devices, especially the PRI allows a vendor specific way for reporting page faults. For device MMU, it has been supported by some vendors. For ASID, PASID requires cpu and platform vendor support, but AFAIK, it should be ready very soon, it could be the end of this year but I'm not sure. Thanks > > Dave > > > Using physical function is sub-optimal that all of the above since: > > > > 1) limited to a specific transport or implementation and it doesn't work for > > device or transport without PF > > 2) the virtio level function is not self contained, this makes any feature > > that ties to PF impossible to be used in the nested layer > > 3) more complicated than leveraging the existing facilities provided by the > > platform or transport > > > > Consider (P)ASID will be ready very soon, workaround the platform limitation > > via PF is not a good idea for me. Especially consider it's not a must and we > > had already prototyped the software assisted technology. > > > > > > > Might as well integrate > > > the rest of state in the same channel. > > > > > > That's another question. I think for the function that is a must for doing > > live migration, introducing them in the function itself is the most natural > > way since we did all the other facilities there. This eases the function that > > can be used in the nested layer.
> > > > And using the channel in the PF is not coming for free. It requires > > synchronization in the software or even QOS. > > > > Or we can just separate the dirty page tracking into PF (but need to define > > them as basic facility for future extension). > > > > > > > > > > Another answer is that CPUs trivially switch between > > > functions by switching the active page tables. For PCI DMA > > > it is all much trickier since the page tables can be separate > > > from the device, and assumed to be mostly static. > > > > > > I don't see much different, the page table is also separated from the CPU. > > If the device supports state save and restore we can schedule multiple > > VMs/VCPUs on the same device. > > > > > > > So if you want to create something like the VMCS then > > > again you either need some help from another device or > > > put it in device memory. > > > > > > For CPU virtualization, the states could be saved and restored via MSRs. For > > virtio, accessing them via registers is also possible and much more simple. > > > > Thanks > > > > > > > > > > > > > > > > This publicly archived list offers a means to provide input to the > > OASIS Virtual I/O Device (VIRTIO) TC. > > > > In order to verify user consent to the Feedback License terms and > > to minimize spam in the list archive, subscription is required > > before posting. > > > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org > > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org > > List help: virtio-comment-help@lists.oasis-open.org > > List archive: https://lists.oasis-open.org/archives/virtio-comment/ > > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf > > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists > > Committee: https://www.oasis-open.org/committees/virtio/ > > Join OASIS: https://www.oasis-open.org/join/ > > > -- > Dr.
David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Thu, 19 Aug 2021 12:12:02 +0100 From: "Dr. David Alan Gilbert" Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function Message-ID: References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> MIME-Version: 1.0 In-Reply-To: <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit To: Max Gurtovoy Cc: Jason Wang , "virtio-comment@lists.oasis-open.org" , "Michael S. Tsirkin" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID: * Max Gurtovoy (mgurtovoy@nvidia.com) wrote: > > On 8/18/2021 1:46 PM, Jason Wang wrote: > > On Wed, Aug 18, 2021 at 5:16 PM Max Gurtovoy wrote: > > > > > > On 8/17/2021 12:44 PM, Jason Wang wrote: > > > > On Tue, Aug 17, 2021 at 5:11 PM Max Gurtovoy wrote: > > > > > On 8/17/2021 11:51 AM, Jason Wang wrote: > > > > > > 在 2021/8/12 下午8:08, Max Gurtovoy 写道: > > > > > > > Hi all, > > > > > > > > > > > > > > Live migration is one of the most important features of > > > > > > > virtualization and virtio devices are oftenly found in virtual > > > > > > > environments. > > > > > > > > > > > > > > The migration process is managed by a migration SW that is running on > > > > > > > the hypervisor and the VM is not aware of the process at all. > > > > > > > > > > > > > > Unlike the vDPA case, a real pci Virtual Function state resides in > > > > > > > the HW. > > > > > > > > > > > > > vDPA doesn't prevent you from having HW states. Actually from the view > > > > > > of the VMM(Qemu), it doesn't care whether or not a state is stored in > > > > > > the software or hardware. 
A well designed VMM should be able to hide > > > > > > the virtio device implementation from the migration layer, that is how > > > > > > Qemu is wrote who doesn't care about whether or not it's a software > > > > > > virtio/vDPA device or not. > > > > > > > > > > > > > > > > > > > In our vision, in order to fulfil the Live migration requirements for > > > > > > > virtual functions, each physical function device must implement > > > > > > > migration operations. Using these operations, it will be able to > > > > > > > master the migration process for the virtual function devices. Each > > > > > > > capable physical function device has a supervisor permissions to > > > > > > > change the virtual function operational states, save/restore its > > > > > > > internal state and start/stop dirty pages tracking. > > > > > > > > > > > > > For "supervisor permissions", is this from the software point of view? > > > > > > Maybe it's better to give an example for this. > > > > > A permission to a PF device for quiesce and freeze a VF device for example. > > > > Note that for safety, VMM (e.g Qemu) is usually running without any privileges. > > > You're mixing layers here. > > > > > > QEMU is not involved here. It's only sending IOCTLs to migration driver. > > > The migration driver will control the migration process of the VF using > > > the PF communication channel. > > So who will be granted the "permission" you mentioned here? > > This is just an expression. > > What is not clear ? > > The PF device will have an option to quiesce/freeze the VF device. > > This is simple. Why are you looking for some sophisticated problems ? I'm trying to follow along here and have not completely; but I think the issue is a security separation one. The VMM (e.g. 
qemu) that has been given access to one of the VF's is isolated and shouldn't be able to go poking at other devices; so it can't go poking at the PF (it probably doesn't even have the PF device node accessible) - so then the question is who has access to the migration driver and how do you make sure it can only deal with VF's that it's supposed to be able to migrate. Dave > > > > > > > > > > An example of this approach can be seen in the way NVIDIA performs > > > > > > > live migration of a ConnectX NIC function: > > > > > > > > > > > > > > https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci > > > > > > > > > > > > > > > > > > > > > NVIDIAs SNAP technology enables hardware-accelerated software defined > > > > > > > PCIe devices. virtio-blk/virtio-net/virtio-fs SNAP used for storage > > > > > > > and networking solutions. The host OS/hypervisor uses its standard > > > > > > > drivers that are implemented according to a well-known VIRTIO > > > > > > > specifications. > > > > > > > > > > > > > > In order to implement Live Migration for these virtual function > > > > > > > devices, that use a standard drivers as mentioned, the specification > > > > > > > should define how HW vendor should build their devices and for SW > > > > > > > developers to adjust the drivers. > > > > > > > > > > > > > > This will enable specification compliant vendor agnostic solution. > > > > > > > > > > > > > > This is exactly how we built the migration driver for ConnectX > > > > > > > (internal HW design doc) and I guess that this is the way other > > > > > > > vendors work. > > > > > > > > > > > > > > For that, I would like to know if the approach of “PF that controls > > > > > > > the VF live migration process” is acceptable by the VIRTIO technical > > > > > > > group ? > > > > > > > > > > > > > I'm not sure but I think it's better to start from the general > > > > > > facility for all transports, then develop features for a specific > > > > > > transport. 
> > > > > a general facility for all transports can be a generic admin queue ? > > > > It could be a virtqueue or a transport specific method (pcie capability). > > > No. You said a general facility for all transports. > > For general facility, I mean the chapter 2 of the spec which is general > > > > " > > 2 Basic Facilities of a Virtio Device > > " > > > It will be in chapter 2. Right after "2.11 Exporting Object" I can add "2.12 > Admin Virtqueues" and this is what I did in the RFC. > > > > Transport specific is not general. > > The transport is in charge of implementing the interface for those facilities. > > Transport specific is not general. > > > > > > > > E.g we can define what needs to be migrated for the virtio-blk first > > > > (the device state). Then we can define the interface to get and set > > > > those states via admin virtqueue. Such decoupling may ease the future > > > > development of the transport specific migration interface. > > > I asked a simple question here. > > > > > > Lets stick to this. > > I answered this question. > > No you didn't answer. > > I asked  if the approach of “PF that controls the VF live migration process” > is acceptable by the VIRTIO technical group ? > > And you take the discussion to your direction instead of answering a Yes/No > question. > > > The virtqueue could be one of the > > approaches. And it's your responsibility to convince the community > > about that approach. Having an example may help people to understand > > your proposal. > > > > > I'm not referring to internal state definitions. > > Without an example, how do we know if it can work well? > > > > > Can you please not change the subject of my initial intent in the email ? > > Did I? Basically, I'm asking how a virtio-blk can be migrated with > > your proposal. > > The virtio-blk PF admin queue will be used to manage the virtio-blk VF > migration. > > This is the whole discussion. I don't want to get into resolution. 
> > Since you already know the answer as I published 4 RFCs already with all the > flow. > > Let's stick to my question. > > > Thanks > > > > > Thanks. > > > > > > > > > > Thanks > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > > > -Max. > > > > > > > > > > > > This publicly archived list offers a means to provide input to the > > > > > OASIS Virtual I/O Device (VIRTIO) TC. > > > > > > > > > > In order to verify user consent to the Feedback License terms and > > > > > to minimize spam in the list archive, subscription is required > > > > > before posting. > > > > > > > > > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org > > > > > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org > > > > > List help: virtio-comment-help@lists.oasis-open.org > > > > > List archive: https://lists.oasis-open.org/archives/virtio-comment/ > > > > > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf > > > > > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists > > > > > Committee: https://www.oasis-open.org/committees/virtio/ > > > > > Join OASIS: https://www.oasis-open.org/join/ > > > > > > > This publicly archived list offers a means to provide input to the > OASIS Virtual I/O Device (VIRTIO) TC. > > In order to verify user consent to the Feedback License terms and > to minimize spam in the list archive, subscription is required > before posting.
> > Subscribe: virtio-comment-subscribe@lists.oasis-open.org > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org > List help: virtio-comment-help@lists.oasis-open.org > List archive: https://lists.oasis-open.org/archives/virtio-comment/ > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists > Committee: https://www.oasis-open.org/committees/virtio/ > Join OASIS: https://www.oasis-open.org/join/ > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> From: Max Gurtovoy Message-ID: <755ff192-33ac-9f6a-a7ad-b44b14afd5d2@nvidia.com> Date: Thu, 19 Aug 2021 17:16:51 +0300 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: quoted-printable Content-Language: en-US To: "Dr. David Alan Gilbert" Cc: Jason Wang , "virtio-comment@lists.oasis-open.org" , "Michael S. Tsirkin" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID: On 8/19/2021 2:12 PM, Dr. 
David Alan Gilbert wrote: > * Max Gurtovoy (mgurtovoy@nvidia.com) wrote: >> On 8/18/2021 1:46 PM, Jason Wang wrote: >>> On Wed, Aug 18, 2021 at 5:16 PM Max Gurtovoy wrote: >>>> On 8/17/2021 12:44 PM, Jason Wang wrote: >>>>> On Tue, Aug 17, 2021 at 5:11 PM Max Gurtovoy wrote: >>>>>> On 8/17/2021 11:51 AM, Jason Wang wrote: >>>>>>> 在 2021/8/12 下午8:08, Max Gurtovoy 写道: >>>>>>>> Hi all, >>>>>>>> >>>>>>>> Live migration is one of the most important features of >>>>>>>> virtualization and virtio devices are oftenly found in virtual >>>>>>>> environments. >>>>>>>> >>>>>>>> The migration process is managed by a migration SW that is running on >>>>>>>> the hypervisor and the VM is not aware of the process at all. >>>>>>>> >>>>>>>> Unlike the vDPA case, a real pci Virtual Function state resides in >>>>>>>> the HW. >>>>>>>> >>>>>>> vDPA doesn't prevent you from having HW states. Actually from the view >>>>>>> of the VMM(Qemu), it doesn't care whether or not a state is stored in >>>>>>> the software or hardware. A well designed VMM should be able to hide >>>>>>> the virtio device implementation from the migration layer, that is how >>>>>>> Qemu is wrote who doesn't care about whether or not it's a software >>>>>>> virtio/vDPA device or not. >>>>>>> >>>>>>> >>>>>>>> In our vision, in order to fulfil the Live migration requirements for >>>>>>>> virtual functions, each physical function device must implement >>>>>>>> migration operations. Using these operations, it will be able to >>>>>>>> master the migration process for the virtual function devices. Each >>>>>>>> capable physical function device has a supervisor permissions to >>>>>>>> change the virtual function operational states, save/restore its >>>>>>>> internal state and start/stop dirty pages tracking. >>>>>>>> >>>>>>> For "supervisor permissions", is this from the software point of view? >>>>>>> Maybe it's better to give an example for this.
>>>>>> A permission to a PF device for quiesce and freeze a VF device for example. >>>>> Note that for safety, VMM (e.g Qemu) is usually running without any privileges. >>>> You're mixing layers here. >>>> >>>> QEMU is not involved here. It's only sending IOCTLs to migration driver. >>>> The migration driver will control the migration process of the VF using >>>> the PF communication channel. >>> So who will be granted the "permission" you mentioned here? >> This is just an expression. >> >> What is not clear ? >> >> The PF device will have an option to quiesce/freeze the VF device. >> >> This is simple. Why are you looking for some sophisticated problems ? > I'm trying to follow along here and have not completely; but I think the issue is a > security separation one. > The VMM (e.g. qemu) that has been given access to one of the VF's is > isolated and shouldn't be able to go poking at other devices; so it > can't go poking at the PF (it probably doesn't even have the PF device > node accessible) - so then the question is who has access to the > migration driver and how do you make sure it can only deal with VF's > that it's supposed to be able to migrate. The QEMU/userspace doesn't know or care about the PF connection and internal virtio_vfio_pci driver implementation. You shouldn't change 1 line of code in the VM driver nor in QEMU. QEMU does not have access to the PF. Only the kernel driver that has access to the VF will have access to the PF communication channel. There is no permission problem here. The kernel driver of the VF will do this internally, and make sure that the commands it build will only impact the VF originating them. We already do this in mlx5 NIC migration. The kernel is secured and QEMU interface is the VF.
> Dave > >>>>>>>> An example of this approach can be seen in the way NVIDIA performs >>>>>>>> live migration of a ConnectX NIC function: >>>>>>>> >>>>>>>> https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci >>>>>>>> >>>>>>>> >>>>>>>> NVIDIAs SNAP technology enables hardware-accelerated software defined >>>>>>>> PCIe devices. virtio-blk/virtio-net/virtio-fs SNAP used for storage >>>>>>>> and networking solutions. The host OS/hypervisor uses its standard >>>>>>>> drivers that are implemented according to a well-known VIRTIO >>>>>>>> specifications. >>>>>>>> >>>>>>>> In order to implement Live Migration for these virtual function >>>>>>>> devices, that use a standard drivers as mentioned, the specification >>>>>>>> should define how HW vendor should build their devices and for SW >>>>>>>> developers to adjust the drivers. >>>>>>>> >>>>>>>> This will enable specification compliant vendor agnostic solution. >>>>>>>> >>>>>>>> This is exactly how we built the migration driver for ConnectX >>>>>>>> (internal HW design doc) and I guess that this is the way other >>>>>>>> vendors work. >>>>>>>> >>>>>>>> For that, I would like to know if the approach of “PF that controls >>>>>>>> the VF live migration process” is acceptable by the VIRTIO technical >>>>>>>> group ? >>>>>>>> >>>>>>> I'm not sure but I think it's better to start from the general >>>>>>> facility for all transports, then develop features for a specific >>>>>>> transport. >>>>>> a general facility for all transports can be a generic admin queue ? >>>>> It could be a virtqueue or a transport specific method (pcie capability). >>>> No. You said a general facility for all transports. >>> For general facility, I mean the chapter 2 of the spec which is general >>> >>> " >>> 2 Basic Facilities of a Virtio Device >>> " >>> >> It will be in chapter 2. Right after "2.11 Exporting Object" I can add "2.12 >> Admin Virtqueues" and this is what I did in the RFC.
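[Archive note: a rough illustration of the generic admin-queue idea discussed above. A PF-owned admin command would, at minimum, carry an opcode plus the identity of the VF it operates on, so one queue can manage many VFs. The field names and layout below are assumptions for illustration only, not the layout from the RFC or the VIRTIO specification.]

```python
import struct
from dataclasses import dataclass

@dataclass
class AdminCmd:
    """Hypothetical admin-virtqueue command descriptor (illustrative)."""
    opcode: int     # e.g. save state, restore state, start dirty-page tracking
    vf_id: int      # the virtual function the command targets
    data_addr: int  # physical address of an optional data buffer
    data_len: int   # length of that buffer in bytes

    def pack(self) -> bytes:
        # Little-endian packing, matching virtio's use of little-endian fields.
        return struct.pack("<HHQI", self.opcode, self.vf_id,
                           self.data_addr, self.data_len)

# A PF driver would place a descriptor like this on its admin queue:
cmd = AdminCmd(opcode=1, vf_id=5, data_addr=0x1000, data_len=4096)
wire = cmd.pack()
```

The key design point is that the target VF id is an explicit field of the command, so the facility stays transport-independent: any transport that can carry these 16 bytes to the device can implement it.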
>> >>>> Transport specific is not general. >>> The transport is in charge of implementing the interface for those facilities. >> Transport specific is not general. >> >> >>>>> E.g we can define what needs to be migrated for the virtio-blk first >>>>> (the device state). Then we can define the interface to get and set >>>>> those states via admin virtqueue. Such decoupling may ease the future >>>>> development of the transport specific migration interface. >>>> I asked a simple question here. >>>> >>>> Lets stick to this. >>> I answered this question. >> No you didn't answer. >> >> I asked if the approach of “PF that controls the VF live migration process” >> is acceptable by the VIRTIO technical group ? >> >> And you take the discussion to your direction instead of answering a Yes/No >> question. >> >>> The virtqueue could be one of the >>> approaches. And it's your responsibility to convince the community >>> about that approach. Having an example may help people to understand >>> your proposal. >>> >>>> I'm not referring to internal state definitions. >>> Without an example, how do we know if it can work well? >>> >>>> Can you please not change the subject of my initial intent in the email ? >>> Did I? Basically, I'm asking how a virtio-blk can be migrated with >>> your proposal. >> The virtio-blk PF admin queue will be used to manage the virtio-blk VF >> migration. >> >> This is the whole discussion. I don't want to get into resolution. >> >> Since you already know the answer as I published 4 RFCs already with all the >> flow. >> >> Lets stick to my question. >> >>> Thanks >>> >>>> Thanks. >>>> >>>> >>>>> Thanks >>>>> >>>>>>> Thanks >>>>>>> >>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> -Max. >>>>>>>>

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Thu, 19 Aug 2021 15:24:02 +0100 From: "Dr. David Alan Gilbert" Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function Message-ID: References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> <755ff192-33ac-9f6a-a7ad-b44b14afd5d2@nvidia.com> MIME-Version: 1.0 In-Reply-To: <755ff192-33ac-9f6a-a7ad-b44b14afd5d2@nvidia.com> Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit To: Max Gurtovoy Cc: Jason Wang , "virtio-comment@lists.oasis-open.org" , "Michael S. Tsirkin" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID: * Max Gurtovoy (mgurtovoy@nvidia.com) wrote: > > On 8/19/2021 2:12 PM, Dr.
David Alan Gilbert wrote: > > * Max Gurtovoy (mgurtovoy@nvidia.com) wrote: > > > On 8/18/2021 1:46 PM, Jason Wang wrote: > > > > On Wed, Aug 18, 2021 at 5:16 PM Max Gurtovoy wrote: > > > > > On 8/17/2021 12:44 PM, Jason Wang wrote: > > > > > > On Tue, Aug 17, 2021 at 5:11 PM Max Gurtovoy wrote: > > > > > > > On 8/17/2021 11:51 AM, Jason Wang wrote: > > > > > > > > 在 2021/8/12 下午8:08, Max Gurtovoy 写道: > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > Live migration is one of the most important features of > > > > > > > > > virtualization and virtio devices are oftenly found in virtual > > > > > > > > > environments. > > > > > > > > > > > > > > > > > > The migration process is managed by a migration SW that is running on > > > > > > > > > the hypervisor and the VM is not aware of the process at all. > > > > > > > > > > > > > > > > > > Unlike the vDPA case, a real pci Virtual Function state resides in > > > > > > > > > the HW. > > > > > > > > > > > > > > > > > vDPA doesn't prevent you from having HW states. Actually from the view > > > > > > > > of the VMM(Qemu), it doesn't care whether or not a state is stored in > > > > > > > > the software or hardware. A well designed VMM should be able to hide > > > > > > > > the virtio device implementation from the migration layer, that is how > > > > > > > > Qemu is wrote who doesn't care about whether or not it's a software > > > > > > > > virtio/vDPA device or not. > > > > > > > > > > > > > > > > > > > > > > > > > In our vision, in order to fulfil the Live migration requirements for > > > > > > > > > virtual functions, each physical function device must implement > > > > > > > > > migration operations. Using these operations, it will be able to > > > > > > > > > master the migration process for the virtual function devices. 
Each > > > > > > > > > capable physical function device has a supervisor permissions to > > > > > > > > > change the virtual function operational states, save/restore its > > > > > > > > > internal state and start/stop dirty pages tracking. > > > > > > > > > > > > > > > > > For "supervisor permissions", is this from the software point of view? > > > > > > > > Maybe it's better to give an example for this. > > > > > > > A permission to a PF device for quiesce and freeze a VF device for example. > > > > > > Note that for safety, VMM (e.g Qemu) is usually running without any privileges. > > > > > You're mixing layers here. > > > > > > > > > > QEMU is not involved here. It's only sending IOCTLs to migration driver. > > > > > The migration driver will control the migration process of the VF using > > > > > the PF communication channel. > > > > So who will be granted the "permission" you mentioned here? > > > This is just an expression. > > > > > > What is not clear ? > > > > > > The PF device will have an option to quiesce/freeze the VF device. > > > > > > This is simple. Why are you looking for some sophisticated problems ? > > I'm trying to follow along here and have not completely; but I think the issue is a > > security separation one. > > The VMM (e.g. qemu) that has been given access to one of the VF's is > > isolated and shouldn't be able to go poking at other devices; so it > > can't go poking at the PF (it probably doesn't even have the PF device > > node accessible) - so then the question is who has access to the > > migration driver and how do you make sure it can only deal with VF's > > that it's supposed to be able to migrate. > > The QEMU/userspace doesn't know or care about the PF connection and internal > virtio_vfio_pci driver implementation. OK > You shouldn't change 1 line of code in the VM driver nor in QEMU. Hmm OK. > QEMU does not have access to the PF. 
Only the kernel driver that has access > to the VF will have access to the PF communication channel.  There is no > permission problem here. > > The kernel driver of the VF will do this internally, and make sure that the > commands it build will only impact the VF originating them. > Now that confuses me; isn't the kernel driver that has access to the VF running inside the guest? If it's inside the guest we can't trust it to do anything about stopping impact to other devices. Dave > We already do this in mlx5 NIC migration. The kernel is secured and QEMU > interface is the VF. > > > Dave > > > > > > > > > > > An example of this approach can be seen in the way NVIDIA performs > > > > > > > > > live migration of a ConnectX NIC function: > > > > > > > > > > > > > > > > > > https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci > > > > > > > > > > > > > > > > > > > > > > > > > > > NVIDIAs SNAP technology enables hardware-accelerated software defined > > > > > > > > > PCIe devices. virtio-blk/virtio-net/virtio-fs SNAP used for storage > > > > > > > > > and networking solutions. The host OS/hypervisor uses its standard > > > > > > > > > drivers that are implemented according to a well-known VIRTIO > > > > > > > > > specifications. > > > > > > > > > > > > > > > > > > In order to implement Live Migration for these virtual function > > > > > > > > > devices, that use a standard drivers as mentioned, the specification > > > > > > > > > should define how HW vendor should build their devices and for SW > > > > > > > > > developers to adjust the drivers. > > > > > > > > > > > > > > > > > > This will enable specification compliant vendor agnostic solution. > > > > > > > > > > > > > > > > > > This is exactly how we built the migration driver for ConnectX > > > > > > > > > (internal HW design doc) and I guess that this is the way other > > > > > > > > > vendors work. 
> > > > > > > > > > > > > > > > > > For that, I would like to know if the approach of “PF that controls > > > > > > > > > the VF live migration process” is acceptable by the VIRTIO technical > > > > > > > > > group ? > > > > > > > > > > > > > > > > > I'm not sure but I think it's better to start from the general > > > > > > > > facility for all transports, then develop features for a specific > > > > > > > > transport. > > > > > > > a general facility for all transports can be a generic admin queue ? > > > > > > It could be a virtqueue or a transport specific method (pcie capability). > > > > > No. You said a general facility for all transports. > > > > For general facility, I mean the chapter 2 of the spec which is general > > > > > > > > " > > > > 2 Basic Facilities of a Virtio Device > > > > " > > > > > > > It will be in chapter 2. Right after "2.11 Exporting Object" I can add "2.12 > > > Admin Virtqueues" and this is what I did in the RFC. > > > > > > > > Transport specific is not general. > > > > The transport is in charge of implementing the interface for those facilities. > > > Transport specific is not general. > > > > > > > > > > > > E.g we can define what needs to be migrated for the virtio-blk first > > > > > > (the device state). Then we can define the interface to get and set > > > > > > those states via admin virtqueue. Such decoupling may ease the future > > > > > > development of the transport specific migration interface. > > > > > I asked a simple question here. > > > > > > > > > > Lets stick to this. > > > > I answered this question. > > > No you didn't answer. > > > > > > I asked  if the approach of “PF that controls the VF live migration process” > > > is acceptable by the VIRTIO technical group ? > > > > > > And you take the discussion to your direction instead of answering a Yes/No > > > question. > > > > > > > The virtqueue could be one of the > > > > approaches. 
And it's your responsibility to convince the community > > > > about that approach. Having an example may help people to understand > > > > your proposal. > > > > > > > > > I'm not referring to internal state definitions. > > > > Without an example, how do we know if it can work well? > > > > > > > > > Can you please not change the subject of my initial intent in the email ? > > > > Did I? Basically, I'm asking how a virtio-blk can be migrated with > > > > your proposal. > > > The virtio-blk PF admin queue will be used to manage the virtio-blk VF > > > migration. > > > > > > This is the whole discussion. I don't want to get into resolution. > > > > > > Since you already know the answer as I published 4 RFCs already with all the > > > flow. > > > > > > Lets stick to my question. > > > > > > > Thanks > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > > > > > > > -Max. > > > > > > > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> <755ff192-33ac-9f6a-a7ad-b44b14afd5d2@nvidia.com> From: Max Gurtovoy Message-ID: <39536c3c-e455-5602-9391-0b21add7e22f@nvidia.com> Date: Thu, 19 Aug 2021 18:20:45 +0300 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: quoted-printable Content-Language: en-US To: "Dr. David Alan Gilbert" Cc: Jason Wang , "virtio-comment@lists.oasis-open.org" , "Michael S. Tsirkin" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID: On 8/19/2021 5:24 PM, Dr. David Alan Gilbert wrote: > * Max Gurtovoy (mgurtovoy@nvidia.com) wrote: >> On 8/19/2021 2:12 PM, Dr.
David Alan Gilbert wrote: >>> * Max Gurtovoy (mgurtovoy@nvidia.com) wrote: >>>> On 8/18/2021 1:46 PM, Jason Wang wrote: >>>>> On Wed, Aug 18, 2021 at 5:16 PM Max Gurtovoy wrote: >>>>>> On 8/17/2021 12:44 PM, Jason Wang wrote: >>>>>>> On Tue, Aug 17, 2021 at 5:11 PM Max Gurtovoy wrote: >>>>>>>> On 8/17/2021 11:51 AM, Jason Wang wrote: >>>>>>>>> 在 2021/8/12 下午8:08, Max Gurtovoy 写道: >>>>>>>>>> Hi all, >>>>>>>>>> >>>>>>>>>> Live migration is one of the most important features of >>>>>>>>>> virtualization and virtio devices are oftenly found in virtual >>>>>>>>>> environments. >>>>>>>>>> >>>>>>>>>> The migration process is managed by a migration SW that is running on >>>>>>>>>> the hypervisor and the VM is not aware of the process at all. >>>>>>>>>> >>>>>>>>>> Unlike the vDPA case, a real pci Virtual Function state resides in >>>>>>>>>> the HW. >>>>>>>>>> >>>>>>>>> vDPA doesn't prevent you from having HW states. Actually from the view >>>>>>>>> of the VMM(Qemu), it doesn't care whether or not a state is stored in >>>>>>>>> the software or hardware. A well designed VMM should be able to hide >>>>>>>>> the virtio device implementation from the migration layer, that is how >>>>>>>>> Qemu is wrote who doesn't care about whether or not it's a software >>>>>>>>> virtio/vDPA device or not. >>>>>>>>> >>>>>>>>> >>>>>>>>>> In our vision, in order to fulfil the Live migration requirements for >>>>>>>>>> virtual functions, each physical function device must implement >>>>>>>>>> migration operations. Using these operations, it will be able to >>>>>>>>>> master the migration process for the virtual function devices. Each >>>>>>>>>> capable physical function device has a supervisor permissions to >>>>>>>>>> change the virtual function operational states, save/restore its >>>>>>>>>> internal state and start/stop dirty pages tracking.
>>>>>>>>>> >>>>>>>>> For "supervisor permissions", is this from the software point of view? >>>>>>>>> Maybe it's better to give an example for this. >>>>>>>> A permission to a PF device for quiesce and freeze a VF device for example. >>>>>>> Note that for safety, VMM (e.g Qemu) is usually running without any privileges. >>>>>> You're mixing layers here. >>>>>> >>>>>> QEMU is not involved here. It's only sending IOCTLs to migration driver. >>>>>> The migration driver will control the migration process of the VF using >>>>>> the PF communication channel. >>>>> So who will be granted the "permission" you mentioned here? >>>> This is just an expression. >>>> >>>> What is not clear ? >>>> >>>> The PF device will have an option to quiesce/freeze the VF device. >>>> >>>> This is simple. Why are you looking for some sophisticated problems ? >>> I'm trying to follow along here and have not completely; but I think the issue is a >>> security separation one. >>> The VMM (e.g. qemu) that has been given access to one of the VF's is >>> isolated and shouldn't be able to go poking at other devices; so it >>> can't go poking at the PF (it probably doesn't even have the PF device >>> node accessible) - so then the question is who has access to the >>> migration driver and how do you make sure it can only deal with VF's >>> that it's supposed to be able to migrate. >> The QEMU/userspace doesn't know or care about the PF connection and internal >> virtio_vfio_pci driver implementation. > OK > >> You shouldn't change 1 line of code in the VM driver nor in QEMU. > Hmm OK. > >> QEMU does not have access to the PF. Only the kernel driver that has access >> to the VF will have access to the PF communication channel. There is no >> permission problem here. >> >> The kernel driver of the VF will do this internally, and make sure that the >> commands it build will only impact the VF originating them.
>> > Now that confuses me; isn't the kernel driver that has access to the VF > running inside the guest? If it's inside the guest we can't trust it to > do anything about stopping impact to other devices. No. The driver is in the hypervisor (virtio_vfio_pci). This is the migration driver, right ? The guest is running as usual. It isn't aware of the migration at all. This is the point I try to make here. I don't (and I can't) change even 1 line of code in the guest. e.g: QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor (bound to VF5) --> send admin command on PF adminq to start tracking dirty pages for VF5 --> PF device will do it QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor (bound to VF5) --> send admin command on PF adminq to quiesce VF5 --> PF device will do it You can take a look at how we implement mlx5_vfio_pci in the link I provided. > > Dave > > >> We already do this in mlx5 NIC migration. The kernel is secured and QEMU >> interface is the VF. >> >>> Dave >>> >>>>>>>>>> An example of this approach can be seen in the way NVIDIA performs >>>>>>>>>> live migration of a ConnectX NIC function: >>>>>>>>>> >>>>>>>>>> https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> NVIDIAs SNAP technology enables hardware-accelerated software defined >>>>>>>>>> PCIe devices. virtio-blk/virtio-net/virtio-fs SNAP used for storage >>>>>>>>>> and networking solutions. The host OS/hypervisor uses its standard >>>>>>>>>> drivers that are implemented according to a well-known VIRTIO >>>>>>>>>> specifications. >>>>>>>>>> >>>>>>>>>> In order to implement Live Migration for these virtual function >>>>>>>>>> devices, that use a standard drivers as mentioned, the specification >>>>>>>>>> should define how HW vendor should build their devices and for SW >>>>>>>>>> developers to adjust the drivers.
>>>>>>>>>> >>>>>>>>>> This will enable specification compliant vendor agnostic solution. >>>>>>>>>> >>>>>>>>>> This is exactly how we built the migration driver for ConnectX >>>>>>>>>> (internal HW design doc) and I guess that this is the way other >>>>>>>>>> vendors work. >>>>>>>>>> >>>>>>>>>> For that, I would like to know if the approach of “PF that controls >>>>>>>>>> the VF live migration process” is acceptable by the VIRTIO technical >>>>>>>>>> group ? >>>>>>>>>> >>>>>>>>> I'm not sure but I think it's better to start from the general >>>>>>>>> facility for all transports, then develop features for a specific >>>>>>>>> transport. >>>>>>>> a general facility for all transports can be a generic admin queue ? >>>>>>> It could be a virtqueue or a transport specific method (pcie capability). >>>>>> No. You said a general facility for all transports. >>>>> For general facility, I mean the chapter 2 of the spec which is general >>>>> >>>>> " >>>>> 2 Basic Facilities of a Virtio Device >>>>> " >>>>> >>>> It will be in chapter 2. Right after "2.11 Exporting Object" I can add "2.12 >>>> Admin Virtqueues" and this is what I did in the RFC. >>>> >>>>>> Transport specific is not general. >>>>> The transport is in charge of implementing the interface for those facilities. >>>> Transport specific is not general. >>>> >>>> >>>>>>> E.g we can define what needs to be migrated for the virtio-blk first >>>>>>> (the device state). Then we can define the interface to get and set >>>>>>> those states via admin virtqueue. Such decoupling may ease the future >>>>>>> development of the transport specific migration interface. >>>>>> I asked a simple question here. >>>>>> >>>>>> Lets stick to this. >>>>> I answered this question. >>>> No you didn't answer. >>>> >>>> I asked if the approach of “PF that controls the VF live migration process” >>>> is acceptable by the VIRTIO technical group ?
>>>> >>>> And you take the discussion to your direction instead of answering a Yes/No >>>> question. >>>> >>>>> The virtqueue could be one of the >>>>> approaches. And it's your responsibility to convince the community >>>>> about that approach. Having an example may help people to understand >>>>> your proposal. >>>>> >>>>>> I'm not referring to internal state definitions. >>>>> Without an example, how do we know if it can work well? >>>>> >>>>>> Can you please not change the subject of my initial intent in the email ? >>>>> Did I? Basically, I'm asking how a virtio-blk can be migrated with >>>>> your proposal. >>>> The virtio-blk PF admin queue will be used to manage the virtio-blk VF >>>> migration. >>>> >>>> This is the whole discussion. I don't want to get into resolution. >>>> >>>> Since you already know the answer as I published 4 RFCs already with all the >>>> flow. >>>> >>>> Lets stick to my question. >>>> >>>>> Thanks >>>>> >>>>>> Thanks. >>>>>> >>>>>> >>>>>>> Thanks >>>>>>> >>>>>>>>> Thanks >>>>>>>>> >>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> >>>>>>>>>> -Max. >>>>>>>>>> >>>>>>>> This publicly archived list offers a means to provide input to the >>>>>>>> OASIS Virtual I/O Device (VIRTIO) TC. >>>>>>>> >>>>>>>> In order to verify user consent to the Feedback License terms and >>>>>>>> to minimize spam in the list archive, subscription is required >>>>>>>> before posting.
>>>>>>>> >>>>>>>> Subscribe: virtio-comment-subscribe@lists.oasis-open.org >>>>>>>> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org >>>>>>>> List help: virtio-comment-help@lists.oasis-open.org >>>>>>>> List archive: https://lists.oasis-open.org/archives/virtio-comment/ >>>>>>>> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf >>>>>>>> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists >>>>>>>> Committee: https://www.oasis-open.org/committees/virtio/ >>>>>>>> Join OASIS: https://www.oasis-open.org/join/ >>>>>>>> From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> <755ff192-33ac-9f6a-a7ad-b44b14afd5d2@nvidia.com> <39536c3c-e455-5602-9391-0b21add7e22f@nvidia.com> From: Jason Wang Message-ID: <0d06c26e-f1e7-3cac-a017-059e8985bb44@redhat.com> Date: Fri, 20 Aug 2021 10:24:40 +0800 MIME-Version: 1.0 In-Reply-To: <39536c3c-e455-5602-9391-0b21add7e22f@nvidia.com> Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Transfer-Encoding: 8bit Content-Language: en-US To: Max Gurtovoy , "Dr.
David Alan Gilbert" Cc: "virtio-comment@lists.oasis-open.org" , "Michael S. Tsirkin" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID: On 2021/8/19 11:20 PM, Max Gurtovoy wrote: > > On 8/19/2021 5:24 PM, Dr. David Alan Gilbert wrote: >> * Max Gurtovoy (mgurtovoy@nvidia.com) wrote: >>> On 8/19/2021 2:12 PM, Dr. David Alan Gilbert wrote: >>>> * Max Gurtovoy (mgurtovoy@nvidia.com) wrote: >>>>> On 8/18/2021 1:46 PM, Jason Wang wrote: >>>>>> On Wed, Aug 18, 2021 at 5:16 PM Max Gurtovoy >>>>>> wrote: >>>>>>> On 8/17/2021 12:44 PM, Jason Wang wrote: >>>>>>>> On Tue, Aug 17, 2021 at 5:11 PM Max Gurtovoy >>>>>>>> wrote: >>>>>>>>> On 8/17/2021 11:51 AM, Jason Wang wrote: >>>>>>>>>> On 2021/8/12 8:08 PM, Max Gurtovoy wrote: >>>>>>>>>>> Hi all, >>>>>>>>>>> >>>>>>>>>>> Live migration is one of the most important features of >>>>>>>>>>> virtualization and virtio devices are oftenly found in virtual >>>>>>>>>>> environments. >>>>>>>>>>> >>>>>>>>>>> The migration process is managed by a migration SW that is >>>>>>>>>>> running on >>>>>>>>>>> the hypervisor and the VM is not aware of the process at all. >>>>>>>>>>> >>>>>>>>>>> Unlike the vDPA case, a real pci Virtual Function state >>>>>>>>>>> resides in >>>>>>>>>>> the HW. >>>>>>>>>>> >>>>>>>>>> vDPA doesn't prevent you from having HW states. Actually from >>>>>>>>>> the view >>>>>>>>>> of the VMM(Qemu), it doesn't care whether or not a state is >>>>>>>>>> stored in >>>>>>>>>> the software or hardware. A well designed VMM should be able >>>>>>>>>> to hide >>>>>>>>>> the virtio device implementation from the migration layer, >>>>>>>>>> that is how >>>>>>>>>> Qemu is wrote who doesn't care about whether or not it's a >>>>>>>>>> software >>>>>>>>>> virtio/vDPA device or not.
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> In our vision, in order to fulfil the Live migration >>>>>>>>>>> requirements for >>>>>>>>>>> virtual functions, each physical function device must implement >>>>>>>>>>> migration operations. Using these operations, it will be >>>>>>>>>>> able to >>>>>>>>>>> master the migration process for the virtual function >>>>>>>>>>> devices. Each >>>>>>>>>>> capable physical function device has a supervisor >>>>>>>>>>> permissions to >>>>>>>>>>> change the virtual function operational states, save/restore >>>>>>>>>>> its >>>>>>>>>>> internal state and start/stop dirty pages tracking. >>>>>>>>>>> >>>>>>>>>> For "supervisor permissions", is this from the software point >>>>>>>>>> of view? >>>>>>>>>> Maybe it's better to give an example for this. >>>>>>>>> A permission to a PF device for quiesce and freeze a VF device >>>>>>>>> for example. >>>>>>>> Note that for safety, VMM (e.g Qemu) is usually running without >>>>>>>> any privileges. >>>>>>> You're mixing layers here. >>>>>>> >>>>>>> QEMU is not involved here. It's only sending IOCTLs to migration >>>>>>> driver. >>>>>>> The migration driver will control the migration process of the >>>>>>> VF using >>>>>>> the PF communication channel. >>>>>> So who will be granted the "permission" you mentioned here? >>>>> This is just an expression. >>>>> >>>>> What is not clear ? >>>>> >>>>> The PF device will have an option to quiesce/freeze the VF device. >>>>> >>>>> This is simple. Why are you looking for some sophisticated problems ? >>>> I'm trying to follow along here and have not completely; but I >>>> think the issue is a >>>> security separation one. >>>> The VMM (e.g. 
qemu) that has been given access to one of the VF's is >>>> isolated and shouldn't be able to go poking at other devices; so it >>>> can't go poking at the PF (it probably doesn't even have the PF device >>>> node accessible) - so then the question is who has access to the >>>> migration driver and how do you make sure it can only deal with VF's >>>> that it's supposed to be able to migrate. >>> The QEMU/userspace doesn't know or care about the PF connection and >>> internal >>> virtio_vfio_pci driver implementation. >> OK >> >>> You shouldn't change 1 line of code in the VM driver nor in QEMU. >> Hmm OK. >> >>> QEMU does not have access to the PF. Only the kernel driver that has >>> access >>> to the VF will have access to the PF communication channel. There is no >>> permission problem here. >>> >>> The kernel driver of the VF will do this internally, and make sure >>> that the >>> commands it build will only impact the VF originating them. >>> >> Now that confuses me; isn't the kernel driver that has access to the VF >> running inside the guest?  If it's inside the guest we can't trust it to >> do anything about stopping impact to other devices. > > No. The driver is in the hypervisor (virtio_vfio_pci). This is the > migration driver, right ? Well, talking things like virtio_vfio_pci that is not mentioned before and not justified on the list may easily confuse people. As pointed out in another thread, it has too many disadvantages over the existing virtio-pci vdpa driver. And it just duplicates a partial function of what virtio-pci vdpa driver can do. I don't think we will go that way. Thanks > > The guest is running as usual. It doesn't aware on the migration at all. > > This is the point I try to make here. I don't (and I can't) change > even 1 line of code in the guest. 
> > e.g: > > QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor > (bounded to VF5) --> send admin command on PF adminq to start tracking > dirty pages for VF5 --> PF device will do it > > QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor > (bounded to VF5) --> send admin command on PF adminq to quiesce VF5 > --> PF device will do it > > You can take a look how we implement mlx5_vfio_pci in the link I > provided. > >> >> Dave >> >> >>> We already do this in mlx5 NIC migration. The kernel is secured and >>> QEMU >>> interface is the VF. >>> >>>> Dave >>>> >>>>>>>>>>> An example of this approach can be seen in the way NVIDIA >>>>>>>>>>> performs >>>>>>>>>>> live migration of a ConnectX NIC function: >>>>>>>>>>> >>>>>>>>>>> https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> NVIDIAs SNAP technology enables hardware-accelerated >>>>>>>>>>> software defined >>>>>>>>>>> PCIe devices. virtio-blk/virtio-net/virtio-fs SNAP used for >>>>>>>>>>> storage >>>>>>>>>>> and networking solutions. The host OS/hypervisor uses its >>>>>>>>>>> standard >>>>>>>>>>> drivers that are implemented according to a well-known VIRTIO >>>>>>>>>>> specifications. >>>>>>>>>>> >>>>>>>>>>> In order to implement Live Migration for these virtual function >>>>>>>>>>> devices, that use a standard drivers as mentioned, the >>>>>>>>>>> specification >>>>>>>>>>> should define how HW vendor should build their devices and >>>>>>>>>>> for SW >>>>>>>>>>> developers to adjust the drivers. >>>>>>>>>>> >>>>>>>>>>> This will enable specification compliant vendor agnostic >>>>>>>>>>> solution. >>>>>>>>>>> >>>>>>>>>>> This is exactly how we built the migration driver for ConnectX >>>>>>>>>>> (internal HW design doc) and I guess that this is the way other >>>>>>>>>>> vendors work. 
>>>>>>>>>>> >>>>>>>>>>> For that, I would like to know if the approach of “PF that >>>>>>>>>>> controls >>>>>>>>>>> the VF live migration process” is acceptable by the VIRTIO >>>>>>>>>>> technical >>>>>>>>>>> group ? >>>>>>>>>>> >>>>>>>>>> I'm not sure but I think it's better to start from the general >>>>>>>>>> facility for all transports, then develop features for a >>>>>>>>>> specific >>>>>>>>>> transport. >>>>>>>>> a general facility for all transports can be a generic admin >>>>>>>>> queue ? >>>>>>>> It could be a virtqueue or a transport specific method (pcie >>>>>>>> capability). >>>>>>> No. You said a general facility for all transports. >>>>>> For general facility, I mean the chapter 2 of the spec which is >>>>>> general >>>>>> >>>>>> " >>>>>> 2 Basic Facilities of a Virtio Device >>>>>> " >>>>>> >>>>> It will be in chapter 2. Right after "2.11 Exporting Object" I can >>>>> add "2.12 >>>>> Admin Virtqueues" and this is what I did in the RFC. >>>>> >>>>>>> Transport specific is not general. >>>>>> The transport is in charge of implementing the interface for >>>>>> those facilities. >>>>> Transport specific is not general. >>>>> >>>>> >>>>>>>> E.g we can define what needs to be migrated for the virtio-blk >>>>>>>> first >>>>>>>> (the device state). Then we can define the interface to get and >>>>>>>> set >>>>>>>> those states via admin virtqueue. Such decoupling may ease the >>>>>>>> future >>>>>>>> development of the transport specific migration interface. >>>>>>> I asked a simple question here. >>>>>>> >>>>>>> Lets stick to this. >>>>>> I answered this question. >>>>> No you didn't answer. >>>>> >>>>> I asked  if the approach of “PF that controls the VF live >>>>> migration process” >>>>> is acceptable by the VIRTIO technical group ? >>>>> >>>>> And you take the discussion to your direction instead of answering >>>>> a Yes/No >>>>> question. >>>>> >>>>>>      The virtqueue could be one of the >>>>>> approaches. 
And it's your responsibility to convince the community >>>>>> about that approach. Having an example may help people to understand >>>>>> your proposal. >>>>>> >>>>>>> I'm not referring to internal state definitions. >>>>>> Without an example, how do we know if it can work well? >>>>>> >>>>>>> Can you please not change the subject of my initial intent in >>>>>>> the email ? >>>>>> Did I? Basically, I'm asking how a virtio-blk can be migrated with >>>>>> your proposal. >>>>> The virtio-blk PF admin queue will be used to manage the >>>>> virtio-blk VF >>>>> migration. >>>>> >>>>> This is the whole discussion. I don't want to get into resolution. >>>>> >>>>> Since you already know the answer as I published 4 RFCs already >>>>> with all the >>>>> flow. >>>>> >>>>> Lets stick to my question. >>>>> >>>>>> Thanks >>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> >>>>>>>> Thanks >>>>>>>> >>>>>>>>>> Thanks >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> >>>>>>>>>>> -Max. >>>>>>>>>>> From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Sender: List-Post: List-Help: List-Unsubscribe: List-Subscribe: Received: from lists.oasis-open.org (oasis-open.org [10.110.1.242]) by lists.oasis-open.org (Postfix) with ESMTP id 6899E986497 for ; Fri, 20 Aug 2021 10:26:40 +0000 (UTC) References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> <755ff192-33ac-9f6a-a7ad-b44b14afd5d2@nvidia.com> <39536c3c-e455-5602-9391-0b21add7e22f@nvidia.com> <0d06c26e-f1e7-3cac-a017-059e8985bb44@redhat.com> From: Max Gurtovoy Message-ID: <74151019-6f78-2bff-5b0a-b5a4da814787@nvidia.com> Date: Fri, 20 Aug 2021 13:26:28 +0300 MIME-Version: 1.0 In-Reply-To: <0d06c26e-f1e7-3cac-a017-059e8985bb44@redhat.com> Subject: Re: [virtio-comment] Live Migration of Virtio
Virtual Function Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: quoted-printable To: Jason Wang , "Dr. David Alan Gilbert" Cc: "virtio-comment@lists.oasis-open.org" , "Michael S. Tsirkin" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID: On 8/20/2021 5:24 AM, Jason Wang wrote: > > On 2021/8/19 11:20 PM, Max Gurtovoy wrote: >> >> On 8/19/2021 5:24 PM, Dr. David Alan Gilbert wrote: >>> * Max Gurtovoy (mgurtovoy@nvidia.com) wrote: >>>> On 8/19/2021 2:12 PM, Dr. David Alan Gilbert wrote: >>>>> * Max Gurtovoy (mgurtovoy@nvidia.com) wrote: >>>>>> On 8/18/2021 1:46 PM, Jason Wang wrote: >>>>>>> On Wed, Aug 18, 2021 at 5:16 PM Max Gurtovoy >>>>>>> wrote: >>>>>>>> On 8/17/2021 12:44 PM, Jason Wang wrote: >>>>>>>>> On Tue, Aug 17, 2021 at 5:11 PM Max Gurtovoy >>>>>>>>> wrote: >>>>>>>>>> On 8/17/2021 11:51 AM, Jason Wang wrote: >>>>>>>>>>> On 2021/8/12 8:08 PM, Max Gurtovoy wrote: >>>>>>>>>>>> Hi all, >>>>>>>>>>>> >>>>>>>>>>>> Live migration is one of the most important features of >>>>>>>>>>>> virtualization and virtio devices are oftenly found in virtual >>>>>>>>>>>> environments. >>>>>>>>>>>> >>>>>>>>>>>> The migration process is managed by a migration SW that is >>>>>>>>>>>> running on >>>>>>>>>>>> the hypervisor and the VM is not aware of the process at all. >>>>>>>>>>>> >>>>>>>>>>>> Unlike the vDPA case, a real pci Virtual Function state >>>>>>>>>>>> resides in >>>>>>>>>>>> the HW. >>>>>>>>>>>> >>>>>>>>>>> vDPA doesn't prevent you from having HW states. Actually >>>>>>>>>>> from the view >>>>>>>>>>> of the VMM(Qemu), it doesn't care whether or not a state is >>>>>>>>>>> stored in >>>>>>>>>>> the software or hardware.
A well designed VMM should be able
>>>>>>>>>>> to hide
>>>>>>>>>>> the virtio device implementation from the migration layer;
>>>>>>>>>>> that is how
>>>>>>>>>>> QEMU is written: it doesn't care whether it's a
>>>>>>>>>>> software
>>>>>>>>>>> virtio/vDPA device or not.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> In our vision, in order to fulfil the Live migration
>>>>>>>>>>>> requirements for
>>>>>>>>>>>> virtual functions, each physical function device must
>>>>>>>>>>>> implement
>>>>>>>>>>>> migration operations. Using these operations, it will be
>>>>>>>>>>>> able to
>>>>>>>>>>>> master the migration process for the virtual function
>>>>>>>>>>>> devices. Each
>>>>>>>>>>>> capable physical function device has supervisor
>>>>>>>>>>>> permissions to
>>>>>>>>>>>> change the virtual function operational states,
>>>>>>>>>>>> save/restore its
>>>>>>>>>>>> internal state and start/stop dirty page tracking.
>>>>>>>>>>>>
>>>>>>>>>>> For "supervisor permissions", is this from the software
>>>>>>>>>>> point of view?
>>>>>>>>>>> Maybe it's better to give an example for this.
>>>>>>>>>> A permission for a PF device to quiesce and freeze a VF
>>>>>>>>>> device, for example.
>>>>>>>>> Note that for safety, the VMM (e.g. QEMU) is usually running
>>>>>>>>> without any privileges.
>>>>>>>> You're mixing layers here.
>>>>>>>>
>>>>>>>> QEMU is not involved here. It's only sending IOCTLs to the
>>>>>>>> migration driver.
>>>>>>>> The migration driver will control the migration process of the
>>>>>>>> VF using
>>>>>>>> the PF communication channel.
>>>>>>> So who will be granted the "permission" you mentioned here?
>>>>>> This is just an expression.
>>>>>>
>>>>>> What is not clear?
>>>>>>
>>>>>> The PF device will have an option to quiesce/freeze the VF device.
>>>>>>
>>>>>> This is simple. Why are you looking for some sophisticated
>>>>>> problems?
>>>>> I'm trying to follow along here and have not completely; but I
>>>>> think the issue is a
>>>>> security separation one.
>>>>> The VMM (e.g. qemu) that has been given access to one of the VFs is
>>>>> isolated and shouldn't be able to go poking at other devices; so it
>>>>> can't go poking at the PF (it probably doesn't even have the PF
>>>>> device
>>>>> node accessible) - so then the question is who has access to the
>>>>> migration driver and how do you make sure it can only deal with VFs
>>>>> that it's supposed to be able to migrate.
>>>> The QEMU/userspace doesn't know or care about the PF connection and
>>>> internal
>>>> virtio_vfio_pci driver implementation.
>>> OK
>>>
>>>> You shouldn't change one line of code in the VM driver nor in QEMU.
>>> Hmm OK.
>>>
>>>> QEMU does not have access to the PF. Only the kernel driver that
>>>> has access
>>>> to the VF will have access to the PF communication channel. There
>>>> is no
>>>> permission problem here.
>>>>
>>>> The kernel driver of the VF will do this internally, and make sure
>>>> that the
>>>> commands it builds will only impact the VF originating them.
>>>>
>>> Now that confuses me; isn't the kernel driver that has access to the VF
>>> running inside the guest?  If it's inside the guest we can't trust
>>> it to
>>> do anything about stopping impact on other devices.
>>
>> No. The driver is in the hypervisor (virtio_vfio_pci). This is the
>> migration driver, right?
>
>
> Well, talking about things like virtio_vfio_pci that were not mentioned
> before
> and not justified on the list may easily confuse people. As pointed
> out in another thread, it has too many disadvantages over the existing
> virtio-pci vDPA driver. And it just duplicates a partial function of
> what the virtio-pci vDPA driver can do. I don't think we will go that way.
This was just an example for David to help with understanding the
solution, since he thought that the guest drivers somehow should be changed.

David, I'm sorry if I confused you.

Again, Jason, you are trying to propose your vDPA solution, which is not
what we're trying to achieve in this work. Think of a world without vDPA.
Also, I don't understand how vDPA is related to virtio specification
decisions. Make vDPA part of virtio and then we can open a discussion.

I'm interested in virtio migration of HW devices.

The proposal in this thread actually got support from Michael, AFAIU,
and others were happy with it as well. All except you.

We do it in mlx5 and we didn't see any issues with that design.

I don't think you can say that we "go that way".

You're trying to build a complementary solution for creating scalable
functions and for some reason trying to sabotage NVIDIA's efforts to add
new, important functionality to virtio.

This also sabotages the evolution of virtio as a standard.

You're trying to enforce some unfinished idea that should work on some
future specific HW platform instead of helping define a good spec for
virtio.

And all this is for having users choose the vDPA framework instead of
using plain virtio.

We believe in our solution and we have a working prototype. We'll
continue with our discussion to convince the community of it.

Thanks.

>
> Thanks
>
>
>>
>> The guest is running as usual. It isn't aware of the migration at all.
>>
>> This is the point I try to make here. I don't (and I can't) change
>> even 1 line of code in the guest.
>>
>> e.g.:
>>
>> QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor
>> (bound to VF5) --> send admin command on PF adminq to start
>> tracking dirty pages for VF5 --> PF device will do it
>>
>> QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor
>> (bound to VF5) --> send admin command on PF adminq to quiesce VF5
>> --> PF device will do it
>>
>> You can take a look at how we implement mlx5_vfio_pci in the link I
>> provided.
>>
>>>
>>> Dave
>>>
>>>
>>>> We already do this in mlx5 NIC migration. The kernel is secured and
>>>> the QEMU
>>>> interface is the VF.
>>>>
>>>>> Dave
>>>>>
>>>>>>>>>>>> An example of this approach can be seen in the way NVIDIA
>>>>>>>>>>>> performs
>>>>>>>>>>>> live migration of a ConnectX NIC function:
>>>>>>>>>>>>
>>>>>>>>>>>> https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> NVIDIA's SNAP technology enables hardware-accelerated
>>>>>>>>>>>> software-defined
>>>>>>>>>>>> PCIe devices. virtio-blk/virtio-net/virtio-fs SNAP is used for
>>>>>>>>>>>> storage
>>>>>>>>>>>> and networking solutions. The host OS/hypervisor uses its
>>>>>>>>>>>> standard
>>>>>>>>>>>> drivers that are implemented according to the well-known VIRTIO
>>>>>>>>>>>> specifications.
>>>>>>>>>>>>
>>>>>>>>>>>> In order to implement Live Migration for these virtual
>>>>>>>>>>>> function
>>>>>>>>>>>> devices, which use standard drivers as mentioned, the
>>>>>>>>>>>> specification
>>>>>>>>>>>> should define how HW vendors should build their devices and
>>>>>>>>>>>> how SW
>>>>>>>>>>>> developers should adjust the drivers.
>>>>>>>>>>>>
>>>>>>>>>>>> This will enable a specification-compliant, vendor-agnostic
>>>>>>>>>>>> solution.
>>>>>>>>>>>>
>>>>>>>>>>>> This is exactly how we built the migration driver for ConnectX
>>>>>>>>>>>> (internal HW design doc) and I guess that this is the way
>>>>>>>>>>>> other
>>>>>>>>>>>> vendors work.
>>>>>>>>>>>>
>>>>>>>>>>>> For that, I would like to know if the approach of "PF that
>>>>>>>>>>>> controls
>>>>>>>>>>>> the VF live migration process" is acceptable to the VIRTIO
>>>>>>>>>>>> technical
>>>>>>>>>>>> group?
>>>>>>>>>>>>
>>>>>>>>>>> I'm not sure, but I think it's better to start from the general
>>>>>>>>>>> facility for all transports, then develop features for a
>>>>>>>>>>> specific
>>>>>>>>>>> transport.
>>>>>>>>>> a general facility for all transports can be a generic admin
>>>>>>>>>> queue?
>>>>>>>>> It could be a virtqueue or a transport specific method (PCIe
>>>>>>>>> capability).
>>>>>>>> No. You said a general facility for all transports.
>>>>>>> By general facility, I mean chapter 2 of the spec, which is
>>>>>>> general:
>>>>>>>
>>>>>>> "
>>>>>>> 2 Basic Facilities of a Virtio Device
>>>>>>> "
>>>>>>>
>>>>>> It will be in chapter 2. Right after "2.11 Exporting Object" I
>>>>>> can add "2.12
>>>>>> Admin Virtqueues", and this is what I did in the RFC.
>>>>>>
>>>>>>>> Transport specific is not general.
>>>>>>> The transport is in charge of implementing the interface for
>>>>>>> those facilities.
>>>>>> Transport specific is not general.
>>>>>>
>>>>>>
>>>>>>>>> E.g. we can define what needs to be migrated for virtio-blk
>>>>>>>>> first
>>>>>>>>> (the device state). Then we can define the interface to get
>>>>>>>>> and set
>>>>>>>>> those states via admin virtqueue. Such decoupling may ease the
>>>>>>>>> future
>>>>>>>>> development of the transport specific migration interface.
>>>>>>>> I asked a simple question here.
>>>>>>>>
>>>>>>>> Let's stick to this.
>>>>>>> I answered this question.
>>>>>> No, you didn't answer.
>>>>>>
>>>>>> I asked if the approach of "PF that controls the VF live
>>>>>> migration process"
>>>>>> is acceptable to the VIRTIO technical group?
>>>>>>
>>>>>> And you take the discussion in your direction instead of
>>>>>> answering a yes/no
>>>>>> question.
>>>>>>
>>>>>>> The virtqueue could be one of the
>>>>>>> approaches. And it's your responsibility to convince the community
>>>>>>> about that approach. Having an example may help people to
>>>>>>> understand
>>>>>>> your proposal.
>>>>>>>
>>>>>>>> I'm not referring to internal state definitions.
>>>>>>> Without an example, how do we know if it can work well?
>>>>>>>
>>>>>>>> Can you please not change the subject of my initial intent in
>>>>>>>> the email?
>>>>>>> Did I? Basically, I'm asking how a virtio-blk device can be
>>>>>>> migrated with
>>>>>>> your proposal.
>>>>>> The virtio-blk PF admin queue will be used to manage the
>>>>>> virtio-blk VF
>>>>>> migration.
>>>>>>
>>>>>> This is the whole discussion. I don't want to get into resolution.
>>>>>>
>>>>>> You already know the answer, as I published 4 RFCs with all the
>>>>>> flow.
>>>>>>
>>>>>> Let's stick to my question.
>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>
>>>>>>>>>>>> -Max.
>>>>>>>>>>>>
>

From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
MIME-Version: 1.0
References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com>
 <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com>
 <755ff192-33ac-9f6a-a7ad-b44b14afd5d2@nvidia.com>
 <39536c3c-e455-5602-9391-0b21add7e22f@nvidia.com>
 <0d06c26e-f1e7-3cac-a017-059e8985bb44@redhat.com>
 <74151019-6f78-2bff-5b0a-b5a4da814787@nvidia.com>
In-Reply-To: <74151019-6f78-2bff-5b0a-b5a4da814787@nvidia.com>
From: Jason Wang 
Date: Fri, 20 Aug 2021 19:16:21 +0800
Message-ID: 
Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
To: Max Gurtovoy 
Cc: "Dr.
David Alan Gilbert" , "virtio-comment@lists.oasis-open.org" ,
 "Michael S. Tsirkin" , "cohuck@redhat.com" , Parav Pandit ,
 Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang ,
 Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin ,
 Liran Liss , Oren Duer 
List-ID: 

On Fri, Aug 20, 2021 at 6:26 PM Max Gurtovoy wrote:
>
>
> On 8/20/2021 5:24 AM, Jason Wang wrote:
> >
> > On 8/19/2021 11:20 PM, Max Gurtovoy wrote:
> >>
> >> On 8/19/2021 5:24 PM, Dr. David Alan Gilbert wrote:
> >>> * Max Gurtovoy (mgurtovoy@nvidia.com) wrote:
> >>>> On 8/19/2021 2:12 PM, Dr. David Alan Gilbert wrote:
> >>>>> * Max Gurtovoy (mgurtovoy@nvidia.com) wrote:
> >>>>>> On 8/18/2021 1:46 PM, Jason Wang wrote:
> >>>>>>> On Wed, Aug 18, 2021 at 5:16 PM Max Gurtovoy
> >>>>>>> wrote:
> >>>>>>>> On 8/17/2021 12:44 PM, Jason Wang wrote:
> >>>>>>>>> On Tue, Aug 17, 2021 at 5:11 PM Max Gurtovoy
> >>>>>>>>> wrote:
> >>>>>>>>>> On 8/17/2021 11:51 AM, Jason Wang wrote:
> >>>>>>>>>>> On 8/12/2021 8:08 PM, Max Gurtovoy wrote:
> >>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Live migration is one of the most important features of
> >>>>>>>>>>>> virtualization, and virtio devices are often found in virtual
> >>>>>>>>>>>> environments.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The migration process is managed by migration SW that is
> >>>>>>>>>>>> running on
> >>>>>>>>>>>> the hypervisor, and the VM is not aware of the process at all.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Unlike the vDPA case, a real PCI Virtual Function's state
> >>>>>>>>>>>> resides in
> >>>>>>>>>>>> the HW.
> >>>>>>>>>>>>
> >>>>>>>>>>> vDPA doesn't prevent you from having HW states. Actually,
> >>>>>>>>>>> from the view
> >>>>>>>>>>> of the VMM (QEMU), it doesn't care whether a state is
> >>>>>>>>>>> stored in
> >>>>>>>>>>> the software or hardware.
A well designed VMM should be able
> >>>>>>>>>>> to hide
> >>>>>>>>>>> the virtio device implementation from the migration layer;
> >>>>>>>>>>> that is how
> >>>>>>>>>>> QEMU is written: it doesn't care whether it's a
> >>>>>>>>>>> software
> >>>>>>>>>>> virtio/vDPA device or not.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> In our vision, in order to fulfil the Live migration
> >>>>>>>>>>>> requirements for
> >>>>>>>>>>>> virtual functions, each physical function device must
> >>>>>>>>>>>> implement
> >>>>>>>>>>>> migration operations. Using these operations, it will be
> >>>>>>>>>>>> able to
> >>>>>>>>>>>> master the migration process for the virtual function
> >>>>>>>>>>>> devices. Each
> >>>>>>>>>>>> capable physical function device has supervisor
> >>>>>>>>>>>> permissions to
> >>>>>>>>>>>> change the virtual function operational states,
> >>>>>>>>>>>> save/restore its
> >>>>>>>>>>>> internal state and start/stop dirty page tracking.
> >>>>>>>>>>>>
> >>>>>>>>>>> For "supervisor permissions", is this from the software
> >>>>>>>>>>> point of view?
> >>>>>>>>>>> Maybe it's better to give an example for this.
> >>>>>>>>>> A permission for a PF device to quiesce and freeze a VF
> >>>>>>>>>> device, for example.
> >>>>>>>>> Note that for safety, the VMM (e.g. QEMU) is usually running
> >>>>>>>>> without any privileges.
> >>>>>>>> You're mixing layers here.
> >>>>>>>>
> >>>>>>>> QEMU is not involved here. It's only sending IOCTLs to the
> >>>>>>>> migration driver.
> >>>>>>>> The migration driver will control the migration process of the
> >>>>>>>> VF using
> >>>>>>>> the PF communication channel.
> >>>>>>> So who will be granted the "permission" you mentioned here?
> >>>>>> This is just an expression.
> >>>>>>
> >>>>>> What is not clear?
> >>>>>>
> >>>>>> The PF device will have an option to quiesce/freeze the VF device.
> >>>>>>
> >>>>>> This is simple. Why are you looking for some sophisticated
> >>>>>> problems?
> >>>>> I'm trying to follow along here and have not completely; but I
> >>>>> think the issue is a
> >>>>> security separation one.
> >>>>> The VMM (e.g. qemu) that has been given access to one of the VFs is
> >>>>> isolated and shouldn't be able to go poking at other devices; so it
> >>>>> can't go poking at the PF (it probably doesn't even have the PF
> >>>>> device
> >>>>> node accessible) - so then the question is who has access to the
> >>>>> migration driver and how do you make sure it can only deal with VFs
> >>>>> that it's supposed to be able to migrate.
> >>>> The QEMU/userspace doesn't know or care about the PF connection and
> >>>> internal
> >>>> virtio_vfio_pci driver implementation.
> >>> OK
> >>>
> >>>> You shouldn't change one line of code in the VM driver nor in QEMU.
> >>> Hmm OK.
> >>>
> >>>> QEMU does not have access to the PF. Only the kernel driver that
> >>>> has access
> >>>> to the VF will have access to the PF communication channel. There
> >>>> is no
> >>>> permission problem here.
> >>>>
> >>>> The kernel driver of the VF will do this internally, and make sure
> >>>> that the
> >>>> commands it builds will only impact the VF originating them.
> >>>>
> >>> Now that confuses me; isn't the kernel driver that has access to the VF
> >>> running inside the guest? If it's inside the guest we can't trust
> >>> it to
> >>> do anything about stopping impact on other devices.
> >>
> >> No. The driver is in the hypervisor (virtio_vfio_pci). This is the
> >> migration driver, right?
> >
> >
> > Well, talking about things like virtio_vfio_pci that were not mentioned
> > before
> > and not justified on the list may easily confuse people. As pointed
> > out in another thread, it has too many disadvantages over the existing
> > virtio-pci vDPA driver. And it just duplicates a partial function of
> > what the virtio-pci vDPA driver can do. I don't think we will go that way.
>
> This was just an example for David to help with understanding the
> solution, since he thought that the guest drivers somehow should be changed.
>
> David, I'm sorry if I confused you.
>
> Again, Jason, you are trying to propose your vDPA solution, which is not
> what we're trying to achieve in this work. Think of a world without vDPA.

Well, I'd say, let's think of vDPA as a superset of virtio, not just
the acceleration technologies.

> Also, I don't understand how vDPA is related to virtio specification
> decisions.

So how is VFIO related to virtio specific decisions? That's why I
think we should avoid talking about software architecture here. It's
the wrong community.

> Make vDPA part of virtio and then we can open a discussion.
>
> I'm interested in virtio migration of HW devices.
>
> The proposal in this thread actually got support from Michael, AFAIU,
> and others were happy with it as well. All except you.

So I think I've clarified my position several times :(

- I'm fairly ok with the proposal
- but we decouple the basic facility out of the admin virtqueue

and this seems agreed by Michael:

Let's take the dirty page tracking as an example:

1) let's first define that as one of the basic facilities
2) then we can introduce admin virtqueue or other stuff as an
interface for that facility

Does this work for you?

>
> We do it in mlx5 and we didn't see any issues with that design.
>

If we separate things as I suggested, I'm totally fine.

> I don't think you can say that we "go that way".

By "go that way" I meant the method of using vfio_virtio_pci; it has
nothing to do with the discussion of "using the PF to control the VF"
in the spec.

>
> You're trying to build a complementary solution for creating scalable
> functions and for some reason trying to sabotage NVIDIA's efforts to add
> new, important functionality to virtio.

Well, it's a completely different topic. And it doesn't conflict with
anything that is proposed here by you. I think I've stated this
several times.
I don't think we block each other; it's just some unification work if
one of the proposals is merged first. I sent them recently because
they will be used as material for my talk at the KVM Forum, which is
really soon.

>
> This also sabotages the evolution of virtio as a standard.
>
> You're trying to enforce some unfinished idea that should work on some
> future specific HW platform instead of helping define a good spec for
> virtio.

Let's open another thread for this if you wish; it is not about the
spec but about how it is implemented in Linux.

If you search the archive, something similar to "vfio_virtio_pci" was
proposed several years ago by Intel. The idea was rejected, and we
have leveraged the Linux vDPA bus for virtio-pci devices.

>
> And all this is for having users choose the vDPA framework instead of
> using plain virtio.
>
> We believe in our solution and we have a working prototype. We'll
> continue with our discussion to convince the community of it.

Again, it looks like there's a lot of misunderstanding. Let's open a
thread on the suitable list instead of talking about any specific
software solution or architecture here. This will speed things up.

Thanks

>
> Thanks.
>
> >
> > Thanks
> >
> >
> >>
> >> The guest is running as usual. It isn't aware of the migration at all.
> >>
> >> This is the point I try to make here. I don't (and I can't) change
> >> even 1 line of code in the guest.
> >>
> >> e.g.:
> >>
> >> QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor
> >> (bound to VF5) --> send admin command on PF adminq to start
> >> tracking dirty pages for VF5 --> PF device will do it
> >>
> >> QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor
> >> (bound to VF5) --> send admin command on PF adminq to quiesce VF5
> >> --> PF device will do it
> >>
> >> You can take a look at how we implement mlx5_vfio_pci in the link I
> >> provided.
> >>
> >>>
> >>> Dave
> >>>
> >>>
> >>>> We already do this in mlx5 NIC migration. The kernel is secured and
> >>>> the QEMU
> >>>> interface is the VF.
> >>>>
> >>>>> Dave
> >>>>>
> >>>>>>>>>>>> An example of this approach can be seen in the way NVIDIA
> >>>>>>>>>>>> performs
> >>>>>>>>>>>> live migration of a ConnectX NIC function:
> >>>>>>>>>>>>
> >>>>>>>>>>>> https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> NVIDIA's SNAP technology enables hardware-accelerated
> >>>>>>>>>>>> software-defined
> >>>>>>>>>>>> PCIe devices. virtio-blk/virtio-net/virtio-fs SNAP is used for
> >>>>>>>>>>>> storage
> >>>>>>>>>>>> and networking solutions. The host OS/hypervisor uses its
> >>>>>>>>>>>> standard
> >>>>>>>>>>>> drivers that are implemented according to the well-known VIRTIO
> >>>>>>>>>>>> specifications.
> >>>>>>>>>>>>
> >>>>>>>>>>>> In order to implement Live Migration for these virtual
> >>>>>>>>>>>> function
> >>>>>>>>>>>> devices, which use standard drivers as mentioned, the
> >>>>>>>>>>>> specification
> >>>>>>>>>>>> should define how HW vendors should build their devices and
> >>>>>>>>>>>> how SW
> >>>>>>>>>>>> developers should adjust the drivers.
> >>>>>>>>>>>>
> >>>>>>>>>>>> This will enable a specification-compliant, vendor-agnostic
> >>>>>>>>>>>> solution.
> >>>>>>>>>>>>
> >>>>>>>>>>>> This is exactly how we built the migration driver for ConnectX
> >>>>>>>>>>>> (internal HW design doc) and I guess that this is the way
> >>>>>>>>>>>> other
> >>>>>>>>>>>> vendors work.
> >>>>>>>>>>>>
> >>>>>>>>>>>> For that, I would like to know if the approach of "PF that
> >>>>>>>>>>>> controls
> >>>>>>>>>>>> the VF live migration process" is acceptable to the VIRTIO
> >>>>>>>>>>>> technical
> >>>>>>>>>>>> group?
> >>>>>>>>>>>>
> >>>>>>>>>>> I'm not sure, but I think it's better to start from the general
> >>>>>>>>>>> facility for all transports, then develop features for a
> >>>>>>>>>>> specific
> >>>>>>>>>>> transport.
> >>>>>>>>>> a general facility for all transports can be a generic admin
> >>>>>>>>>> queue?
> >>>>>>>>> It could be a virtqueue or a transport specific method (PCIe
> >>>>>>>>> capability).
> >>>>>>>> No. You said a general facility for all transports.
> >>>>>>> By general facility, I mean chapter 2 of the spec, which is
> >>>>>>> general:
> >>>>>>>
> >>>>>>> "
> >>>>>>> 2 Basic Facilities of a Virtio Device
> >>>>>>> "
> >>>>>>>
> >>>>>> It will be in chapter 2. Right after "2.11 Exporting Object" I
> >>>>>> can add "2.12
> >>>>>> Admin Virtqueues", and this is what I did in the RFC.
> >>>>>>
> >>>>>>>> Transport specific is not general.
> >>>>>>> The transport is in charge of implementing the interface for
> >>>>>>> those facilities.
> >>>>>> Transport specific is not general.
> >>>>>>
> >>>>>>
> >>>>>>>>> E.g. we can define what needs to be migrated for virtio-blk
> >>>>>>>>> first
> >>>>>>>>> (the device state). Then we can define the interface to get
> >>>>>>>>> and set
> >>>>>>>>> those states via admin virtqueue. Such decoupling may ease the
> >>>>>>>>> future
> >>>>>>>>> development of the transport specific migration interface.
> >>>>>>>> I asked a simple question here.
> >>>>>>>>
> >>>>>>>> Let's stick to this.
> >>>>>>> I answered this question.
> >>>>>> No, you didn't answer.
> >>>>>>
> >>>>>> I asked if the approach of "PF that controls the VF live
> >>>>>> migration process"
> >>>>>> is acceptable to the VIRTIO technical group?
> >>>>>>
> >>>>>> And you take the discussion in your direction instead of
> >>>>>> answering a yes/no
> >>>>>> question.
> >>>>>>
> >>>>>>> The virtqueue could be one of the
> >>>>>>> approaches. And it's your responsibility to convince the community
> >>>>>>> about that approach.
Having an example may help people to > >>>>>>> understand > >>>>>>> your proposal. > >>>>>>> > >>>>>>>> I'm not referring to internal state definitions. > >>>>>>> Without an example, how do we know if it can work well? > >>>>>>> > >>>>>>>> Can you please not change the subject of my initial intent in > >>>>>>>> the email? > >>>>>>> Did I? Basically, I'm asking how a virtio-blk can be migrated with > >>>>>>> your proposal. > >>>>>> The virtio-blk PF admin queue will be used to manage the > >>>>>> virtio-blk VF > >>>>>> migration. > >>>>>> > >>>>>> This is the whole discussion. I don't want to get into resolution. > >>>>>> > >>>>>> Since you already know the answer, as I published 4 RFCs already > >>>>>> with all the > >>>>>> flow. > >>>>>> > >>>>>> Let's stick to my question. > >>>>>> > >>>>>>> Thanks > >>>>>>> > >>>>>>>> Thanks. > >>>>>>>> > >>>>>>>> > >>>>>>>>> Thanks > >>>>>>>>> > >>>>>>>>>>> Thanks > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> Cheers, > >>>>>>>>>>>> > >>>>>>>>>>>> -Max.
> > This publicly archived list offers a means to provide input to the > OASIS Virtual I/O Device (VIRTIO) TC. > > In order to verify user consent to the Feedback License terms and > to minimize spam in the list archive, subscription is required > before posting. > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org > List help: virtio-comment-help@lists.oasis-open.org > List archive: https://lists.oasis-open.org/archives/virtio-comment/ > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists > Committee: https://www.oasis-open.org/committees/virtio/ > Join OASIS: https://www.oasis-open.org/join/ > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> <755ff192-33ac-9f6a-a7ad-b44b14afd5d2@nvidia.com> <39536c3c-e455-5602-9391-0b21add7e22f@nvidia.com> <0d06c26e-f1e7-3cac-a017-059e8985bb44@redhat.com> <74151019-6f78-2bff-5b0a-b5a4da814787@nvidia.com> From: Max Gurtovoy Message-ID: <41fbd78a-f1d8-9056-3929-1e7b6b57a49b@nvidia.com> Date: Sun, 22 Aug 2021 13:05:04 +0300 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: quoted-printable Content-Language: en-US To: Jason
Wang Cc: "Dr. David Alan Gilbert" , "virtio-comment@lists.oasis-open.org" , "Michael S. Tsirkin" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID: On 8/20/2021 2:16 PM, Jason Wang wrote: > On Fri, Aug 20, 2021 at 6:26 PM Max Gurtovoy wrote: >> >> On 8/20/2021 5:24 AM, Jason Wang wrote: >>> On 2021/8/19 at 11:20 PM, Max Gurtovoy wrote: >>>> On 8/19/2021 5:24 PM, Dr. David Alan Gilbert wrote: >>>>> * Max Gurtovoy (mgurtovoy@nvidia.com) wrote: >>>>>> On 8/19/2021 2:12 PM, Dr. David Alan Gilbert wrote: >>>>>>> * Max Gurtovoy (mgurtovoy@nvidia.com) wrote: >>>>>>>> On 8/18/2021 1:46 PM, Jason Wang wrote: >>>>>>>>> On Wed, Aug 18, 2021 at 5:16 PM Max Gurtovoy >>>>>>>>> wrote: >>>>>>>>>> On 8/17/2021 12:44 PM, Jason Wang wrote: >>>>>>>>>>> On Tue, Aug 17, 2021 at 5:11 PM Max Gurtovoy >>>>>>>>>>> wrote: >>>>>>>>>>>> On 8/17/2021 11:51 AM, Jason Wang wrote: >>>>>>>>>>>>> On 2021/8/12 at 8:08 PM, Max Gurtovoy wrote: >>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Live migration is one of the most important features of >>>>>>>>>>>>>> virtualization, and virtio devices are often found in virtual >>>>>>>>>>>>>> environments. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The migration process is managed by migration SW that is >>>>>>>>>>>>>> running on >>>>>>>>>>>>>> the hypervisor, and the VM is not aware of the process at all. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Unlike the vDPA case, a real PCI Virtual Function's state >>>>>>>>>>>>>> resides in >>>>>>>>>>>>>> the HW. >>>>>>>>>>>>>> >>>>>>>>>>>>> vDPA doesn't prevent you from having HW states. Actually, >>>>>>>>>>>>> from the view >>>>>>>>>>>>> of the VMM (QEMU), it doesn't care whether a state is >>>>>>>>>>>>> stored in >>>>>>>>>>>>> software or hardware.
A well-designed VMM should be able >>>>>>>>>>>>> to hide >>>>>>>>>>>>> the virtio device implementation from the migration layer; >>>>>>>>>>>>> that is how >>>>>>>>>>>>> QEMU is written: it doesn't care whether it's a >>>>>>>>>>>>> software >>>>>>>>>>>>> virtio/vDPA device or not. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> In our vision, in order to fulfil the Live migration >>>>>>>>>>>>>> requirements for >>>>>>>>>>>>>> virtual functions, each physical function device must >>>>>>>>>>>>>> implement >>>>>>>>>>>>>> migration operations. Using these operations, it will be >>>>>>>>>>>>>> able to >>>>>>>>>>>>>> master the migration process for the virtual function >>>>>>>>>>>>>> devices. Each >>>>>>>>>>>>>> capable physical function device has supervisor >>>>>>>>>>>>>> permissions to >>>>>>>>>>>>>> change the virtual function operational states, >>>>>>>>>>>>>> save/restore its >>>>>>>>>>>>>> internal state and start/stop dirty page tracking. >>>>>>>>>>>>>> >>>>>>>>>>>>> For "supervisor permissions", is this from the software >>>>>>>>>>>>> point of view? >>>>>>>>>>>>> Maybe it's better to give an example for this. >>>>>>>>>>>> A permission for a PF device to quiesce and freeze a VF >>>>>>>>>>>> device, for example. >>>>>>>>>>> Note that for safety, the VMM (e.g. QEMU) is usually running >>>>>>>>>>> without any privileges. >>>>>>>>>> You're mixing layers here. >>>>>>>>>> >>>>>>>>>> QEMU is not involved here. It's only sending IOCTLs to the >>>>>>>>>> migration driver. >>>>>>>>>> The migration driver will control the migration process of the >>>>>>>>>> VF using >>>>>>>>>> the PF communication channel. >>>>>>>>> So who will be granted the "permission" you mentioned here? >>>>>>>> This is just an expression. >>>>>>>> >>>>>>>> What is not clear? >>>>>>>> >>>>>>>> The PF device will have an option to quiesce/freeze the VF device. >>>>>>>> >>>>>>>> This is simple. Why are you looking for some sophisticated >>>>>>>> problems?
>>>>>>> I'm trying to follow along here and have not completely; but I >>>>>>> think the issue is a >>>>>>> security separation one. >>>>>>> The VMM (e.g. QEMU) that has been given access to one of the VFs is >>>>>>> isolated and shouldn't be able to go poking at other devices; so it >>>>>>> can't go poking at the PF (it probably doesn't even have the PF >>>>>>> device >>>>>>> node accessible) - so then the question is who has access to the >>>>>>> migration driver and how do you make sure it can only deal with VFs >>>>>>> that it's supposed to be able to migrate. >>>>>> The QEMU/userspace doesn't know or care about the PF connection and >>>>>> internal >>>>>> virtio_vfio_pci driver implementation. >>>>> OK >>>>> >>>>>> You shouldn't change 1 line of code in the VM driver nor in QEMU. >>>>> Hmm OK. >>>>> >>>>>> QEMU does not have access to the PF. Only the kernel driver that >>>>>> has access >>>>>> to the VF will have access to the PF communication channel. There >>>>>> is no >>>>>> permission problem here. >>>>>> >>>>>> The kernel driver of the VF will do this internally, and make sure >>>>>> that the >>>>>> commands it builds will only impact the VF originating them. >>>>>> >>>>> Now that confuses me; isn't the kernel driver that has access to the VF >>>>> running inside the guest? If it's inside the guest we can't trust >>>>> it to >>>>> do anything about stopping impact to other devices. >>>> No. The driver is in the hypervisor (virtio_vfio_pci). This is the >>>> migration driver, right? >>> >>> Well, talking about things like virtio_vfio_pci that were not mentioned before >>> and not justified on the list may easily confuse people. As pointed >>> out in another thread, it has too many disadvantages over the existing >>> virtio-pci vDPA driver. And it just duplicates a partial function of >>> what the virtio-pci vDPA driver can do. I don't think we will go that way.
>> This was just an example for David to help with understanding the >> solution, since he thought that the guest drivers somehow should be changed. >> >> David, I'm sorry if I confused you. >> >> Again Jason, you try to propose your vDPA solution, which is not what >> we're trying to achieve in this work. Think of a world without vDPA. > Well, I'd say, let's think of vDPA as a superset of virtio, not just the > acceleration technologies. I'm sorry, but vDPA is not relevant to this discussion. Anyhow, I don't see any problem for a vDPA driver to work on top of the design proposed here. >> Also, I don't understand how vDPA is related to virtio specification >> decisions? > So how is VFIO related to virtio specification decisions? That's why I > think we should avoid talking about software architecture here. It's > the wrong community. VFIO is not related to the virtio spec. It was an example for David. What is the problem with giving examples to help people understand the solution? Where did you see that the design is referring to VFIO? > >> Make vDPA into virtio and then we can open a discussion. >> >> I'm interested in virtio migration of HW devices. >> >> The proposal in this thread actually got support from Michael AFAIU, >> and others were happy with it too. All besides you. > So I think I've clarified myself several times :( > > - I'm fairly OK with the proposal It doesn't seem like that. > - but we decouple the basic facility out of the admin virtqueue, and > this seems agreed by Michael: > > Let's take dirty page tracking as an example: > > 1) let's first define it as one of the basic facilities > 2) then we can introduce the admin virtqueue or other mechanisms as an > interface for that facility > > Does this work for you? What I really want is to agree on the right way to manage the migration process of a virtio VF. My proposal does so by creating a communication channel in its parent PF. I think I got a confirmation here.
This communication channel is not introduced in this thread, but obviously it should be an adminq. For your future scalable functions, the Parent Device (let's call it PD) will manage the creation/migration/destruction process for its Virtual Devices (let's call them VDs) using the PD adminq. Agreed? Please don't answer that this is not a "must". This is my proposal. If you have another proposal, please propose it. > >> We do it in mlx5 and we didn't see any issues with that design. >> > If we separate things as I suggested, I'm totally fine. Separate what? Why should I create different interfaces for different management tasks? I have a virtual/scalable device that I want to refer to from the physical/parent device using some interface. This interface is the adminq. This interface will be used for dirty page tracking and operational state changes, and for get/set of internal state as well. And more (create/destroy an SF, for example). You can think of this in some other way; I'm fine with it, as long as the final conclusion is the same. > >> I don't think you can say that we "go that way". > By "go that way" I meant the method of using vfio_virtio_pci; it has > nothing to do with the discussion of "using PF to control VF" on the > spec. This was an example. Please leave it as an example for David. >> You're trying to build a complementary solution for creating scalable >> functions and for some reason trying to sabotage NVIDIA's efforts to add >> new important functionality to virtio. > Well, it's a completely different topic. And it doesn't conflict with > anything that is proposed here by you. I think I've stated this > several times. I don't think we block each other; it's just some > unification work if one of the proposals is merged first. I sent them > recently because they will be used as material for my talk at the KVM > Forum, which is really near. In theory you're right. We shouldn't block each other, and I don't block you.
But for some reason I see that you do try to block my proposal, and I don't understand why. I feel like I wasted 2 months on a discussion instead of progressing. But now I do see progress. A PF to manage VF migration is the way to go forward. And the following RFC will take this into consideration. > >> This also sabotages the evolution of virtio as a standard. >> >> You're trying to enforce some unfinished idea that should work on some >> future specific HW platform instead of helping define a good spec for >> virtio. > Let's open another thread for this if you wish; it's not related > to the spec but to how it is implemented in Linux. If you search the > archive, something similar to "vfio_virtio_pci" was proposed > several years ago by Intel. The idea was rejected, and we have > leveraged the Linux vDPA bus for virtio-pci devices. I don't know this history, and I will be happy to hear about it one day. But for our discussion in Linux, virtio_vfio_pci will happen. And it will implement the migration logic of a virtio device with PCI transport for VFs using the PF admin queue. We at NVIDIA are currently upstreaming (alongside AlexW and Cornelia) a vfio-pci separation that will enable easy creation of vfio-pci vendor/protocol drivers to do some specific tasks. New drivers such as mlx5_vfio_pci, hns_vfio_pci, virtio_vfio_pci and nvme_vfio_pci should be implemented in the near future in Linux to enable migration of these devices. This is just an example. And it's not related to the spec nor the proposal at all. > >> And all is for having users choose the vDPA framework instead of using >> plain virtio. >> >> We believe in our solution and we have a working prototype. We'll >> continue with our discussion to convince the community of it. > Again, it looks like there's a lot of misunderstanding. Let's open a > thread on the suitable list instead of talking about any specific > software solution or architecture here.
This will speed up things. I prefer to finish the specification first. The SW arch is clear for us in Linux. We did it already for mlx5 devices, and it will be the same for virtio if the spec changes are accepted. Thanks. > > Thanks > >> Thanks. >> >>> Thanks >>> >>> >>>> The guest is running as usual. It isn't aware of the migration at all. >>>> >>>> This is the point I try to make here. I don't (and I can't) change >>>> even 1 line of code in the guest. >>>> >>>> e.g.: >>>> >>>> QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor >>>> (bound to VF5) --> send admin command on PF adminq to start >>>> tracking dirty pages for VF5 --> PF device will do it >>>> >>>> QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor >>>> (bound to VF5) --> send admin command on PF adminq to quiesce VF5 >>>> --> PF device will do it >>>> >>>> You can take a look at how we implement mlx5_vfio_pci in the link I >>>> provided. >>>> >>>>> Dave >>>>> >>>>> >>>>>> We already do this in mlx5 NIC migration. The kernel is secured and >>>>>> the QEMU >>>>>> interface is the VF. >>>>>> >>>>>>> Dave >>>>>>> >>>>>>>>>>>>>> An example of this approach can be seen in the way NVIDIA >>>>>>>>>>>>>> performs >>>>>>>>>>>>>> live migration of a ConnectX NIC function: >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> NVIDIA's SNAP technology enables hardware-accelerated >>>>>>>>>>>>>> software-defined >>>>>>>>>>>>>> PCIe devices. virtio-blk/virtio-net/virtio-fs SNAP is used for >>>>>>>>>>>>>> storage >>>>>>>>>>>>>> and networking solutions. The host OS/hypervisor uses its >>>>>>>>>>>>>> standard >>>>>>>>>>>>>> drivers that are implemented according to the well-known VIRTIO >>>>>>>>>>>>>> specification.
>>>>>>>>>>>>>> >>>>>>>>>>>>>> In order to implement Live Migration for these virtual >>>>>>>>>>>>>> function >>>>>>>>>>>>>> devices, which use standard drivers as mentioned, the >>>>>>>>>>>>>> specification >>>>>>>>>>>>>> should define how HW vendors should build their devices and >>>>>>>>>>>>>> how SW >>>>>>>>>>>>>> developers should adjust the drivers. >>>>>>>>>>>>>> >>>>>>>>>>>>>> This will enable a specification-compliant, vendor-agnostic >>>>>>>>>>>>>> solution. >>>>>>>>>>>>>> >>>>>>>>>>>>>> This is exactly how we built the migration driver for ConnectX >>>>>>>>>>>>>> (internal HW design doc) and I guess that this is the way >>>>>>>>>>>>>> other >>>>>>>>>>>>>> vendors work. >>>>>>>>>>>>>> >>>>>>>>>>>>>> For that, I would like to know if the approach of "PF that >>>>>>>>>>>>>> controls >>>>>>>>>>>>>> the VF live migration process" is acceptable to the VIRTIO >>>>>>>>>>>>>> technical >>>>>>>>>>>>>> group? >>>>>>>>>>>>>> >>>>>>>>>>>>> I'm not sure, but I think it's better to start from the general >>>>>>>>>>>>> facility for all transports, then develop features for a >>>>>>>>>>>>> specific >>>>>>>>>>>>> transport. >>>>>>>>>>>> A general facility for all transports can be a generic admin >>>>>>>>>>>> queue? >>>>>>>>>>> It could be a virtqueue or a transport-specific method (PCIe >>>>>>>>>>> capability). >>>>>>>>>> No. You said a general facility for all transports. >>>>>>>>> By general facility, I mean chapter 2 of the spec, which is >>>>>>>>> general: >>>>>>>>> >>>>>>>>> " >>>>>>>>> 2 Basic Facilities of a Virtio Device >>>>>>>>> " >>>>>>>>> >>>>>>>> It will be in chapter 2. Right after "2.11 Exporting Object" I >>>>>>>> can add "2.12 >>>>>>>> Admin Virtqueues", and this is what I did in the RFC. >>>>>>>> >>>>>>>>>> Transport-specific is not general. >>>>>>>>> The transport is in charge of implementing the interface for >>>>>>>>> those facilities. >>>>>>>> Transport-specific is not general.
>>>>>>>> >>>>>>>> >>>>>>>>>>> E.g. we can define what needs to be migrated for the virtio-blk >>>>>>>>>>> first >>>>>>>>>>> (the device state). Then we can define the interface to get >>>>>>>>>>> and set >>>>>>>>>>> those states via the admin virtqueue. Such decoupling may ease the >>>>>>>>>>> future >>>>>>>>>>> development of the transport-specific migration interface. >>>>>>>>>> I asked a simple question here. >>>>>>>>>> >>>>>>>>>> Let's stick to this. >>>>>>>>> I answered this question. >>>>>>>> No, you didn't answer. >>>>>>>> >>>>>>>> I asked if the approach of "PF that controls the VF live >>>>>>>> migration process" >>>>>>>> is acceptable to the VIRTIO technical group. >>>>>>>> >>>>>>>> And you take the discussion in your direction instead of >>>>>>>> answering a yes/no >>>>>>>> question. >>>>>>>> >>>>>>>>> The virtqueue could be one of the >>>>>>>>> approaches. And it's your responsibility to convince the community >>>>>>>>> about that approach. Having an example may help people to >>>>>>>>> understand >>>>>>>>> your proposal. >>>>>>>>> >>>>>>>>>> I'm not referring to internal state definitions. >>>>>>>>> Without an example, how do we know if it can work well? >>>>>>>>> >>>>>>>>>> Can you please not change the subject of my initial intent in >>>>>>>>>> the email? >>>>>>>>> Did I? Basically, I'm asking how a virtio-blk can be migrated with >>>>>>>>> your proposal. >>>>>>>> The virtio-blk PF admin queue will be used to manage the >>>>>>>> virtio-blk VF >>>>>>>> migration. >>>>>>>> >>>>>>>> This is the whole discussion. I don't want to get into resolution. >>>>>>>> >>>>>>>> Since you already know the answer, as I published 4 RFCs already >>>>>>>> with all the >>>>>>>> flow. >>>>>>>> >>>>>>>> Let's stick to my question. >>>>>>>> >>>>>>>>> Thanks >>>>>>>>> >>>>>>>>>> Thanks. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> >>>>>>>>>>>>> Thanks >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> >>>>>>>>>>>>>> -Max.
>>>> List Guidelines: >>>> https://nam11.safelinks.protection.outlook.com/?url=3Dhttps%3A%2F%2Fww= w.oasis-open.org%2Fpolicies-guidelines%2Fmailing-lists&data=3D04%7C01%7= Cmgurtovoy%40nvidia.com%7Cf31455a4a77448afbf4208d963cbfaf7%7C43083d15727340= c1b7db39efd9ccc17a%7C0%7C0%7C637650550011078463%7CUnknown%7CTWFpbGZsb3d8eyJ= WIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&= sdata=3DEv2Z09T2ADqE9oxTw%2Bj5rlhrc939Xp4vd7D5j3Sa19M%3D&reserved=3D0 >>>> Committee: >>>> https://nam11.safelinks.protection.outlook.com/?url=3Dhttps%3A%2F%2Fww= w.oasis-open.org%2Fcommittees%2Fvirtio%2F&data=3D04%7C01%7Cmgurtovoy%40= nvidia.com%7Cf31455a4a77448afbf4208d963cbfaf7%7C43083d15727340c1b7db39efd9c= cc17a%7C0%7C0%7C637650550011078463%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAw= MDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=3Dw%2FY= 1e4h4QFkGQg8PQRbZKIC4FhjSYE9%2FU9l4mX7E%2Fq4%3D&reserved=3D0 >>>> Join OASIS: >>>> https://nam11.safelinks.protection.outlook.com/?url=3Dhttps%3A%2F%2Fww= w.oasis-open.org%2Fjoin%2F&data=3D04%7C01%7Cmgurtovoy%40nvidia.com%7Cf3= 1455a4a77448afbf4208d963cbfaf7%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7= C637650550011078463%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2l= uMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=3DdWHo4Wz8oQ9o7VZDui%2= Fsdg9iQbrFi1syTJuBDULZ%2BWs%3D&reserved=3D0 >>>> >> This publicly archived list offers a means to provide input to the >> OASIS Virtual I/O Device (VIRTIO) TC. >> >> In order to verify user consent to the Feedback License terms and >> to minimize spam in the list archive, subscription is required >> before posting. 
>> >> Subscribe: virtio-comment-subscribe@lists.oasis-open.org >> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org >> List help: virtio-comment-help@lists.oasis-open.org >> List archive: https://nam11.safelinks.protection.outlook.com/?url=3Dhttp= s%3A%2F%2Flists.oasis-open.org%2Farchives%2Fvirtio-comment%2F&data=3D04= %7C01%7Cmgurtovoy%40nvidia.com%7Cf31455a4a77448afbf4208d963cbfaf7%7C43083d1= 5727340c1b7db39efd9ccc17a%7C0%7C0%7C637650550011078463%7CUnknown%7CTWFpbGZs= b3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C10= 00&sdata=3DLMnn0oZKj%2Bf2kE9pVeC2uSfROFFSawi6MJYUHmclJb0%3D&reserve= d=3D0 >> Feedback License: https://nam11.safelinks.protection.outlook.com/?url=3D= https%3A%2F%2Fwww.oasis-open.org%2Fwho%2Fipr%2Ffeedback_license.pdf&dat= a=3D04%7C01%7Cmgurtovoy%40nvidia.com%7Cf31455a4a77448afbf4208d963cbfaf7%7C4= 3083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637650550011078463%7CUnknown%7CTW= FpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3= D%7C1000&sdata=3DZ7M42HYu%2FZ%2B8JdnMo3mp%2FV%2Bwz1VHLnjQJZqum4fwY0M%3D= &reserved=3D0 >> List Guidelines: https://nam11.safelinks.protection.outlook.com/?url=3Dh= ttps%3A%2F%2Fwww.oasis-open.org%2Fpolicies-guidelines%2Fmailing-lists&d= ata=3D04%7C01%7Cmgurtovoy%40nvidia.com%7Cf31455a4a77448afbf4208d963cbfaf7%7= C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637650550011078463%7CUnknown%7C= TWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0= %3D%7C1000&sdata=3DEv2Z09T2ADqE9oxTw%2Bj5rlhrc939Xp4vd7D5j3Sa19M%3D&= ;reserved=3D0 >> Committee: https://nam11.safelinks.protection.outlook.com/?url=3Dhttps%3= A%2F%2Fwww.oasis-open.org%2Fcommittees%2Fvirtio%2F&data=3D04%7C01%7Cmgu= rtovoy%40nvidia.com%7Cf31455a4a77448afbf4208d963cbfaf7%7C43083d15727340c1b7= db39efd9ccc17a%7C0%7C0%7C637650550011078463%7CUnknown%7CTWFpbGZsb3d8eyJWIjo= iMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdat= 
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
MIME-Version: 1.0
References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> <755ff192-33ac-9f6a-a7ad-b44b14afd5d2@nvidia.com> <39536c3c-e455-5602-9391-0b21add7e22f@nvidia.com> <0d06c26e-f1e7-3cac-a017-059e8985bb44@redhat.com> <74151019-6f78-2bff-5b0a-b5a4da814787@nvidia.com> <41fbd78a-f1d8-9056-3929-1e7b6b57a49b@nvidia.com>
In-Reply-To: <41fbd78a-f1d8-9056-3929-1e7b6b57a49b@nvidia.com>
From: Jason Wang
Date: Mon, 23 Aug 2021 11:10:09 +0800
Message-ID:
Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
To: Max Gurtovoy
Cc: "Dr. David Alan Gilbert" , "virtio-comment@lists.oasis-open.org" , "Michael S. Tsirkin" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer
List-ID:

On Sun, Aug 22, 2021 at 6:05 PM Max Gurtovoy wrote:
>
>
> On 8/20/2021 2:16 PM, Jason Wang wrote:
> > On Fri, Aug 20, 2021 at 6:26 PM Max Gurtovoy wrote:
> >>
> >> On 8/20/2021 5:24 AM, Jason Wang wrote:
> >>> On 2021/8/19 11:20 PM, Max Gurtovoy wrote:
> >>>> On 8/19/2021 5:24 PM, Dr. David Alan Gilbert wrote:
> >>>>> * Max Gurtovoy (mgurtovoy@nvidia.com) wrote:
> >>>>>> On 8/19/2021 2:12 PM, Dr.
David Alan Gilbert wrote:
> >>>>>>> * Max Gurtovoy (mgurtovoy@nvidia.com) wrote:
> >>>>>>>> On 8/18/2021 1:46 PM, Jason Wang wrote:
> >>>>>>>>> On Wed, Aug 18, 2021 at 5:16 PM Max Gurtovoy wrote:
> >>>>>>>>>> On 8/17/2021 12:44 PM, Jason Wang wrote:
> >>>>>>>>>>> On Tue, Aug 17, 2021 at 5:11 PM Max Gurtovoy wrote:
> >>>>>>>>>>>> On 8/17/2021 11:51 AM, Jason Wang wrote:
> >>>>>>>>>>>>> On 2021/8/12 8:08 PM, Max Gurtovoy wrote:
> >>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Live migration is one of the most important features of virtualization, and virtio devices are often found in virtual environments.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The migration process is managed by migration SW that is running on the hypervisor, and the VM is not aware of the process at all.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Unlike the vDPA case, a real PCI Virtual Function state resides in the HW.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> vDPA doesn't prevent you from having HW states. Actually, from the view of the VMM (Qemu), it doesn't care whether a state is stored in software or hardware. A well designed VMM should be able to hide the virtio device implementation from the migration layer; that is how Qemu is written, and it doesn't care whether or not it's a software virtio/vDPA device.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> In our vision, in order to fulfil the Live Migration requirements for virtual functions, each physical function device must implement migration operations.
> >>>>>>>>>>>>>> Using these operations, it will be able to master the migration process for the virtual function devices. Each capable physical function device has supervisor permissions to change the virtual function operational states, save/restore its internal state, and start/stop dirty page tracking.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> For "supervisor permissions", is this from the software point of view? Maybe it's better to give an example for this.
> >>>>>>>>>>>> A permission for a PF device to quiesce and freeze a VF device, for example.
> >>>>>>>>>>> Note that for safety, the VMM (e.g. Qemu) is usually running without any privileges.
> >>>>>>>>>> You're mixing layers here.
> >>>>>>>>>>
> >>>>>>>>>> QEMU is not involved here. It's only sending IOCTLs to the migration driver. The migration driver will control the migration process of the VF using the PF communication channel.
> >>>>>>>>> So who will be granted the "permission" you mentioned here?
> >>>>>>>> This is just an expression.
> >>>>>>>>
> >>>>>>>> What is not clear?
> >>>>>>>>
> >>>>>>>> The PF device will have an option to quiesce/freeze the VF device.
> >>>>>>>>
> >>>>>>>> This is simple. Why are you looking for sophisticated problems?
> >>>>>>> I'm trying to follow along here and have not completely; but I think the issue is a security separation one.
> >>>>>>> The VMM (e.g.
qemu) that has been given access to one of the VFs is isolated and shouldn't be able to go poking at other devices; so it can't go poking at the PF (it probably doesn't even have the PF device node accessible) - so then the question is who has access to the migration driver, and how do you make sure it can only deal with VFs that it's supposed to be able to migrate.
> >>>>>> The QEMU/userspace doesn't know or care about the PF connection and internal virtio_vfio_pci driver implementation.
> >>>>> OK
> >>>>>
> >>>>>> You shouldn't change 1 line of code in the VM driver nor in QEMU.
> >>>>> Hmm OK.
> >>>>>
> >>>>>> QEMU does not have access to the PF. Only the kernel driver that has access to the VF will have access to the PF communication channel. There is no permission problem here.
> >>>>>>
> >>>>>> The kernel driver of the VF will do this internally, and make sure that the commands it builds will only impact the VF originating them.
> >>>>>>
> >>>>> Now that confuses me; isn't the kernel driver that has access to the VF running inside the guest? If it's inside the guest we can't trust it to do anything about stopping impact to other devices.
> >>>> No. The driver is in the hypervisor (virtio_vfio_pci). This is the migration driver, right?
> >>>
> >>> Well, talking about things like virtio_vfio_pci that were not mentioned before and not justified on the list may easily confuse people. As pointed out in another thread, it has too many disadvantages over the existing virtio-pci vDPA driver, and it just duplicates a partial function of what the virtio-pci vDPA driver can do. I don't think we will go that way.
> >> This was just an example for David to help with understanding the solution, since he thought that the guest drivers somehow should be changed.
> >>
> >> David, I'm sorry if I confused you.
> >>
> >> Again Jason, you try to propose your vDPA solution, which is not what we're trying to achieve in this work. Think of a world without vDPA.
> > Well, I'd say, let's think of vDPA as a superset of virtio, not just the acceleration technologies.
>
> I'm sorry but vDPA is not relevant to this discussion.

Well, it's you that mentioned the software things like VFIO first.

>
> Anyhow, I don't see any problem for a vDPA driver to work on top of the design proposed here.
>
> >> Also I don't understand how vDPA is related to virtio specification decisions?
> > So how is VFIO related to virtio spec decisions? That's why I think we should avoid talking about software architecture here. It's the wrong community.
>
> VFIO is not related to the virtio spec.

Of course.

>
> It was an example for David. What is the problem with giving examples to help people understand the solution?

I don't think your example eases the understanding.

>
> Where did you see that the design is referring to VFIO?
>
> >> make vDPA into virtio and then we can open a discussion.
> >>
> >> I'm interested in virtio migration of HW devices.
> >>
> >> The proposal in this thread actually got support from Michael AFAIU, and others were happy with it too. All besides you.
> > So I think I've clarified myself several times :(
> >
> > - I'm fairly ok with the proposal
>
> It doesn't seem like that.
>
> > - but we decouple the basic facility out of the admin virtqueue, and this seems agreed by Michael:
> >
> > Let's take the dirty page tracking as an example:
> >
> > 1) let's first define that as one of the basic facilities
> > 2) then we can introduce admin virtqueue or other stuff as an interface for that facility
> >
> > Does this work for you?
>
> What I really want is to agree on the right way to manage the migration process of a virtio VF. My proposal is doing so by creating a communication channel in its parent PF.
It looks to me like you never answered the question "why must it be done by the PF".

All the functions provided by the PF so far are not expected to be used by a VMM like Qemu. Those functions usually require capabilities or privileges for the management software to use. You mentioned things like "supervisor" and "permission", but it looks to me like you are still unaware of how this connects to the security stuff.

>
> I think I got a confirmation here.
>
> This communication channel is not introduced in this thread, but obviously it should be an adminq.

Let me clarify. What I want to say is that the adminq should be one of the possible channels.

>
> For your future scalable functions, the Parent Device (let's call it PD) will manage the creation/migration/destruction process for its Virtual Devices (let's call them VDs) using the PD adminq.
>
> Agreed?

They are two different sets of functions:

- provisioning/creation/destruction: requires privilege, and we don't have any plan to expose it to the guest. It should be done via the PF or PD for security, as you mentioned above.
- migration: doesn't require privilege, and it can be exposed to the guest; it can be done in either the PF or the VF. To me using the VF is much more natural, but using the PF is also fine. An exception for migration is the dirty page tracking: without DMA isolation, we may end up with a security issue if we do that in the VF.

>
> Please don't answer that this is not a "must". This is my proposal. If you have another proposal, please propose.

Well, you are asking for comments instead of enforcing things, right? And it's as simple as:

1) introduce the admin virtqueue, and bind migration features to the admin virtqueue

or

2) introduce migration features and the admin virtqueue independently

What's the problem with doing trivial modifications like 2)? Does that conflict with your proposal?

>
> >> We do it in mlx5 and we didn't see any issues with that design.
> >>
> > If we separate things as I suggested, I'm totally fine.
>
> separate what?
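[Editor's note: the two sets of functions distinguished above, together with the operations named earlier in the thread (quiesce/freeze, save/restore internal state, start/stop dirty page tracking), could be carried as commands on the PF admin queue roughly as sketched below. This is a hypothetical illustration only: all opcode names, values, and struct fields here are invented and are not taken from the virtio specification or from the RFC under discussion.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical admin-queue opcodes for PF-controlled VF migration.
 * Names and values are illustrative, not spec-assigned. */
enum vf_admin_opcode {
    VF_ADMIN_QUIESCE         = 0x01, /* stop the VF accepting new requests */
    VF_ADMIN_FREEZE          = 0x02, /* freeze the VF internal state */
    VF_ADMIN_SAVE_STATE      = 0x03, /* read out the VF internal state */
    VF_ADMIN_RESTORE_STATE   = 0x04, /* write back the VF internal state */
    VF_ADMIN_DIRTY_TRACK_ON  = 0x05, /* start dirty-page tracking */
    VF_ADMIN_DIRTY_TRACK_OFF = 0x06, /* stop dirty-page tracking */
};

/* Hypothetical command descriptor placed on the PF admin virtqueue. */
struct vf_admin_cmd {
    uint16_t opcode;    /* one of vf_admin_opcode */
    uint16_t vf_id;     /* which VF of this PF the command targets */
    uint64_t data_addr; /* DMA address of a state buffer or bitmap, if any */
    uint32_t data_len;  /* length of that buffer */
};

/* Build a command targeting a single VF. The PF device is expected to
 * reject commands whose vf_id does not belong to one of its own VFs,
 * which is where the "supervisor permission" boundary lives. */
static struct vf_admin_cmd vf_admin_cmd_init(enum vf_admin_opcode op,
                                             uint16_t vf_id)
{
    struct vf_admin_cmd cmd;

    memset(&cmd, 0, sizeof(cmd));
    cmd.opcode = (uint16_t)op;
    cmd.vf_id = vf_id;
    return cmd;
}
```

Under this sketch, provisioning commands would simply be additional privileged opcodes on the same queue, while the migration opcodes above carry no privilege beyond ownership of the VF.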
> > Why should I create different interfaces for different management tasks?

I don't say you need to create different interfaces. It's for future extensions:

1) When VIRTIO_F_ADMIN_VQ is negotiated, the interface is the admin virtqueue
2) When other features are negotiated, the interface is something else

In order to make 2) work, we need to introduce migration and the admin virtqueue separately. Migration is not a management task; it doesn't require any privilege.

>
> I have a virtual/scalable device that I want to refer to from the physical/parent device using some interface.
>
> This interface is the adminq. This interface will be used for dirty page tracking and operational state changes, and for get/set of internal state as well. And more (create/destroy an SF, for example).
>
> You can think of this in some other way, I'm fine with it. As long as the final conclusion is the same.
>
> >> I don't think you can say that we "go that way".
> > For "go that way" I meant the method of using vfio_virtio_pci; it has nothing to do with the discussion of "using the PF to control the VF" in the spec.
>
> This was an example. Please leave it as an example for David.
>
> >> You're trying to build a complementary solution for creating scalable functions and for some reason trying to sabotage NVIDIA's efforts to add new important functionality to virtio.
> > Well, it's a completely different topic. And it doesn't conflict with anything that is proposed here by you. I think I've stated this several times. I don't think we block each other; it's just some unification work if one of the proposals is merged first. I sent them recently because they will be used as material for my talk at the KVM Forum, which is really near.
>
> In theory you're right. We shouldn't block each other, and I don't block you. But for some reason I see that you do try to block my proposal, and I don't understand why.
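[Editor's note: the decoupling described in 1) and 2) above amounts to selecting the carrier interface from negotiated feature bits, with the migration facility defined independently of any one carrier. A minimal sketch, assuming placeholder bit positions — neither the value of VIRTIO_F_ADMIN_VQ nor a separate VIRTIO_F_MIGRATION bit is spec-assigned here:]

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder feature-bit positions, for illustration only. */
#define VIRTIO_F_ADMIN_VQ  (1ULL << 40)
#define VIRTIO_F_MIGRATION (1ULL << 41)

enum migration_iface {
    MIG_IFACE_NONE,     /* migration facility not negotiated */
    MIG_IFACE_ADMIN_VQ, /* facility carried over the admin virtqueue */
    MIG_IFACE_OTHER,    /* some future transport-specific interface */
};

/* Decoupled selection: the migration facility exists on its own, and the
 * negotiated feature bits merely pick which interface carries it. */
static enum migration_iface select_migration_iface(uint64_t features)
{
    if (!(features & VIRTIO_F_MIGRATION))
        return MIG_IFACE_NONE;
    if (features & VIRTIO_F_ADMIN_VQ)
        return MIG_IFACE_ADMIN_VQ;
    return MIG_IFACE_OTHER;
}
```

The point of 2) is visible in the last branch: a device that never implements an admin virtqueue could still negotiate the migration facility over a different interface.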
I don't want to block your proposal; let's decouple the migration feature out of the admin virtqueue. Then it's fine.

The problem I see is that you tend to refuse such a trivial but beneficial change. That's what I don't understand.

>
> I feel like I wasted 2 months on a discussion instead of progressing.

Well, I'm not sure 2 months is short, but it usually takes more than a year for a huge project in Linux. Patience may help us understand each other's points better.

>
> But now I do see progress. A PF to manage VF migration is the way to go forward.
>
> And the following RFC will take this into consideration.
>
> >> This also sabotages the evolution of virtio as a standard.
> >>
> >> You're trying to enforce some unfinished idea that should work on some future specific HW platform instead of helping to define a good spec for virtio.
> > Let's open another thread for this if you wish; it's not related to the spec but to how it is implemented in Linux. If you search the archive, something similar to "vfio_virtio_pci" was proposed several years ago by Intel. The idea was rejected, and we have leveraged the Linux vDPA bus for virtio-pci devices.
>
> I don't know this history. And I will be happy to hear about it one day.
>
> But for our discussion in Linux, virtio_vfio_pci will happen. And it will implement the migration logic of a virtio device with PCI transport for VFs using the PF admin queue.
>
> We at NVIDIA are currently upstreaming (alongside AlexW and Cornelia) a vfio-pci separation that will enable easy creation of vfio-pci vendor/protocol drivers to do specific tasks.
>
> New drivers such as mlx5_vfio_pci, hns_vfio_pci, virtio_vfio_pci and nvme_vfio_pci should be implemented in the near future in Linux to enable migration of these devices.
>
> This is just an example. And it's not related to the spec nor the proposal at all.

Let's move those discussions to the right list.
I'm pretty sure there will be a long debate there. Please prepare for that.

>
> >> And all this is for having users choose the vDPA framework instead of using plain virtio.
> >>
> >> We believe in our solution and we have a working prototype. We'll continue with our discussion to convince the community of it.
> > Again, it looks like there's a lot of misunderstanding. Let's open a thread on the suitable list instead of talking about any specific software solution or architecture here. This will speed things up.
>
> I prefer to finish the specification first. The SW architecture is clear for us in Linux. We did it already for mlx5 devices and it will be the same for virtio if the spec changes are accepted.

I disagree, but let's separate the software discussion from the spec discussion here.

Thanks

>
> Thanks.
>
> > Thanks
> >
> >> Thanks.
> >>
> >>> Thanks
> >>>
> >>>> The guest is running as usual. It isn't aware of the migration at all.
> >>>>
> >>>> This is the point I try to make here. I don't (and I can't) change even 1 line of code in the guest.
> >>>>
> >>>> e.g:
> >>>>
> >>>> QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor (bound to VF5) --> send admin command on PF adminq to start tracking dirty pages for VF5 --> PF device will do it
> >>>>
> >>>> QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor (bound to VF5) --> send admin command on PF adminq to quiesce VF5 --> PF device will do it
> >>>>
> >>>> You can take a look at how we implement mlx5_vfio_pci in the link I provided.
> >>>>
> >>>>> Dave
> >>>>>
> >>>>>
> >>>>>> We already do this in mlx5 NIC migration. The kernel is secured and the QEMU
> >>>>>> interface is the VF.
> >>>>>>
> >>>>>>> Dave
> >>>>>>>
> >>>>>>>>>>>>>> An example of this approach can be seen in the way NVIDIA performs live migration of a ConnectX NIC function:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> NVIDIA's SNAP technology enables hardware-accelerated software-defined PCIe devices. virtio-blk/virtio-net/virtio-fs SNAP is used for storage and networking solutions. The host OS/hypervisor uses its standard drivers, which are implemented according to the well-known VIRTIO specification.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> In order to implement Live Migration for these virtual function devices, which use standard drivers as mentioned, the specification should define how HW vendors should build their devices and how SW developers should adjust the drivers. This will enable a specification-compliant, vendor-agnostic solution.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This is exactly how we built the migration driver for ConnectX (internal HW design doc), and I guess that this is the way other vendors work.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For that, I would like to know if the approach of "PF that controls the VF live migration process" is acceptable to the VIRTIO technical group?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm not sure, but I think it's better to start from the general facility for all transports, then develop features for a specific transport.
> >>>>>>>>>>>> a general facility for all transports can be a generic admin queue?
> >>>>>>>>>>> It could be a virtqueue or a transport-specific method (PCIe capability).
> >>>>>>>>>> No. You said a general facility for all transports.
> >>>>>>>>> For a general facility, I mean chapter 2 of the spec, which is general:
> >>>>>>>>>
> >>>>>>>>> "
> >>>>>>>>> 2 Basic Facilities of a Virtio Device
> >>>>>>>>> "
> >>>>>>>>>
> >>>>>>>> It will be in chapter 2. Right after "2.11 Exporting Object" I can add "2.12 Admin Virtqueues", and this is what I did in the RFC.
> >>>>>>>>
> >>>>>>>>>> Transport-specific is not general.
> >>>>>>>>> The transport is in charge of implementing the interface for those facilities.
> >>>>>>>> Transport-specific is not general.
> >>>>>>>>
> >>>>>>>>>>> E.g. we can define what needs to be migrated for virtio-blk first (the device state). Then we can define the interface to get and set those states via the admin virtqueue. Such decoupling may ease the future development of the transport-specific migration interface.
> >>>>>>>>>> I asked a simple question here.
> >>>>>>>>>>
> >>>>>>>>>> Let's stick to this.
> >>>>>>>>> I answered this question.
> >>>>>>>> No, you didn't answer.
> >>>>>>>>
> >>>>>>>> I asked if the approach of "PF that controls the VF live migration process" is acceptable to the VIRTIO technical group?
> >>>>>>>>
> >>>>>>>> And you take the discussion in your direction instead of answering a Yes/No question.
> >>>>>>>>
> >>>>>>>>> The virtqueue could be one of the approaches. And it's your responsibility to convince the community about that approach. Having an example may help people understand your proposal.
> >>>>>>>>>
> >>>>>>>>>> I'm not referring to internal state definitions.
> >>>>>>>>> Without an example, how do we know if it can work well?
> >>>>>>>>>
> >>>>>>>>>> Can you please not change the subject of my initial intent in the email?
> >>>>>>>>> Did I? Basically, I'm asking how a virtio-blk can be migrated with your proposal.
> >>>>>>>> The virtio-blk PF admin queue will be used to manage the virtio-blk VF migration.
> >>>>>>>>
> >>>>>>>> This is the whole discussion. I don't want to get into resolution.
> >>>>>>>>
> >>>>>>>> Since you already know the answer, as I published 4 RFCs already with all the flow.
> >>>>>>>>
> >>>>>>>> Let's stick to my question.
> >>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>>> Thanks.
> >>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>>
> >>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Max.
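[Editor's note: the "start tracking dirty pages for VF5" flow quoted earlier in the thread ends with the PF device recording which guest pages the VF has written via DMA, so the hypervisor can copy only those pages during the pre-copy phase. A toy sketch of such per-VF tracking as a one-bit-per-page bitmap; the layout, names, and page count are assumptions for illustration, not taken from the spec or any driver:]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT 12      /* 4 KiB pages */
#define TRACKED_PAGES 1024 /* toy-sized tracked region */

/* Illustrative per-VF dirty-page tracker kept by the PF device model. */
struct dirty_tracker {
    int enabled;
    uint8_t bitmap[TRACKED_PAGES / 8]; /* one bit per guest page */
};

/* VF_ADMIN_DIRTY_TRACK_ON equivalent: clear and arm the bitmap. */
static void dirty_track_start(struct dirty_tracker *t)
{
    memset(t->bitmap, 0, sizeof(t->bitmap));
    t->enabled = 1;
}

/* Called by the device for each DMA write the VF performs;
 * gpa is the guest-physical address being written. */
static void dirty_track_mark(struct dirty_tracker *t, uint64_t gpa)
{
    uint64_t pfn = gpa >> PAGE_SHIFT;

    if (t->enabled && pfn < TRACKED_PAGES)
        t->bitmap[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
}

/* Read side: is this page dirty since tracking started? */
static int dirty_track_test(const struct dirty_tracker *t, uint64_t gpa)
{
    uint64_t pfn = gpa >> PAGE_SHIFT;

    return (t->bitmap[pfn / 8] >> (pfn % 8)) & 1;
}
```

In the PF-controlled model debated above, the hypervisor would fetch this bitmap through an admin-queue command rather than touching the VF, which is also why the thread flags dirty tracking as the one migration function with DMA-isolation security implications.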
> >> > >> Subscribe: virtio-comment-subscribe@lists.oasis-open.org > >> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org > >> List help: virtio-comment-help@lists.oasis-open.org > >> List archive: https://nam11.safelinks.protection.outlook.com/?url=3Dht= tps%3A%2F%2Flists.oasis-open.org%2Farchives%2Fvirtio-comment%2F&data=3D= 04%7C01%7Cmgurtovoy%40nvidia.com%7Cf31455a4a77448afbf4208d963cbfaf7%7C43083= d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637650550011078463%7CUnknown%7CTWFpbG= Zsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C= 1000&sdata=3DLMnn0oZKj%2Bf2kE9pVeC2uSfROFFSawi6MJYUHmclJb0%3D&reser= ved=3D0 > >> Feedback License: https://nam11.safelinks.protection.outlook.com/?url= =3Dhttps%3A%2F%2Fwww.oasis-open.org%2Fwho%2Fipr%2Ffeedback_license.pdf&= data=3D04%7C01%7Cmgurtovoy%40nvidia.com%7Cf31455a4a77448afbf4208d963cbfaf7%= 7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637650550011078463%7CUnknown%7= CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn= 0%3D%7C1000&sdata=3DZ7M42HYu%2FZ%2B8JdnMo3mp%2FV%2Bwz1VHLnjQJZqum4fwY0M= %3D&reserved=3D0 > >> List Guidelines: https://nam11.safelinks.protection.outlook.com/?url= =3Dhttps%3A%2F%2Fwww.oasis-open.org%2Fpolicies-guidelines%2Fmailing-lists&a= mp;data=3D04%7C01%7Cmgurtovoy%40nvidia.com%7Cf31455a4a77448afbf4208d963cbfa= f7%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637650550011078463%7CUnknow= n%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI= 6Mn0%3D%7C1000&sdata=3DEv2Z09T2ADqE9oxTw%2Bj5rlhrc939Xp4vd7D5j3Sa19M%3D= &reserved=3D0 > >> Committee: https://nam11.safelinks.protection.outlook.com/?url=3Dhttps= %3A%2F%2Fwww.oasis-open.org%2Fcommittees%2Fvirtio%2F&data=3D04%7C01%7Cm= gurtovoy%40nvidia.com%7Cf31455a4a77448afbf4208d963cbfaf7%7C43083d15727340c1= b7db39efd9ccc17a%7C0%7C0%7C637650550011078463%7CUnknown%7CTWFpbGZsb3d8eyJWI= joiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sd= 
From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function
References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com>
 <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com>
 <755ff192-33ac-9f6a-a7ad-b44b14afd5d2@nvidia.com>
 <39536c3c-e455-5602-9391-0b21add7e22f@nvidia.com>
 <0d06c26e-f1e7-3cac-a017-059e8985bb44@redhat.com>
 <74151019-6f78-2bff-5b0a-b5a4da814787@nvidia.com>
 <41fbd78a-f1d8-9056-3929-1e7b6b57a49b@nvidia.com>
From: Max Gurtovoy
Message-ID: <0252a058-f3d2-db34-08a0-02c3cdd0e0bb@nvidia.com>
Date: Mon, 23 Aug 2021 11:55:22 +0300
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: quoted-printable
Content-Language: en-US
To: Jason Wang
Cc: "Dr. David Alan Gilbert" , "virtio-comment@lists.oasis-open.org" ,
 "Michael S. Tsirkin" , "cohuck@redhat.com" , Parav Pandit ,
 Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang ,
 Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin ,
 Liran Liss , Oren Duer
List-ID:

On 8/23/2021 6:10 AM, Jason Wang wrote:
> On Sun, Aug 22, 2021 at 6:05 PM Max Gurtovoy wrote:
>>
>> On 8/20/2021 2:16 PM, Jason Wang wrote:
>>> On Fri, Aug 20, 2021 at 6:26 PM Max Gurtovoy wrote:
>>>> On 8/20/2021 5:24 AM, Jason Wang wrote:
>>>>> On 8/19/2021 11:20 PM, Max Gurtovoy wrote:
>>>>>> On 8/19/2021 5:24 PM, Dr. David Alan Gilbert wrote:
>>>>>>> * Max Gurtovoy (mgurtovoy@nvidia.com) wrote:
>>>>>>>> On 8/19/2021 2:12 PM, Dr. David Alan Gilbert wrote:
>>>>>>>>> * Max Gurtovoy (mgurtovoy@nvidia.com) wrote:
>>>>>>>>>> On 8/18/2021 1:46 PM, Jason Wang wrote:
>>>>>>>>>>> On Wed, Aug 18, 2021 at 5:16 PM Max Gurtovoy wrote:
>>>>>>>>>>>> On 8/17/2021 12:44 PM, Jason Wang wrote:
>>>>>>>>>>>>> On Tue, Aug 17, 2021 at 5:11 PM Max Gurtovoy wrote:
>>>>>>>>>>>>>> On 8/17/2021 11:51 AM, Jason Wang wrote:
>>>>>>>>>>>>>>> On 8/12/2021 8:08 PM, Max Gurtovoy wrote:
>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Live migration is one of the most important features of
>>>>>>>>>>>>>>>> virtualization, and virtio devices are often found in
>>>>>>>>>>>>>>>> virtual environments.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The migration process is managed by migration SW that is
>>>>>>>>>>>>>>>> running on the hypervisor, and the VM is not aware of the
>>>>>>>>>>>>>>>> process at all.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Unlike the vDPA case, a real PCI Virtual Function state
>>>>>>>>>>>>>>>> resides in the HW.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> vDPA doesn't prevent you from having HW states. Actually,
>>>>>>>>>>>>>>> from the view of the VMM (Qemu), it doesn't care whether a
>>>>>>>>>>>>>>> state is stored in software or hardware. A well designed
>>>>>>>>>>>>>>> VMM should be able to hide the virtio device implementation
>>>>>>>>>>>>>>> from the migration layer; that is how Qemu is written: it
>>>>>>>>>>>>>>> doesn't care whether it's a software virtio/vDPA device or
>>>>>>>>>>>>>>> not.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In our vision, in order to fulfil the Live migration
>>>>>>>>>>>>>>>> requirements for virtual functions, each physical function
>>>>>>>>>>>>>>>> device must implement migration operations. Using these
>>>>>>>>>>>>>>>> operations, it will be able to master the migration
>>>>>>>>>>>>>>>> process for the virtual function devices. Each capable
>>>>>>>>>>>>>>>> physical function device has supervisor permissions to
>>>>>>>>>>>>>>>> change the virtual function operational states,
>>>>>>>>>>>>>>>> save/restore its internal state and start/stop dirty page
>>>>>>>>>>>>>>>> tracking.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For "supervisor permissions", is this from the software
>>>>>>>>>>>>>>> point of view? Maybe it's better to give an example for
>>>>>>>>>>>>>>> this.
>>>>>>>>>>>>>> A permission for a PF device to quiesce and freeze a VF
>>>>>>>>>>>>>> device, for example.
>>>>>>>>>>>>> Note that for safety, the VMM (e.g. Qemu) is usually running
>>>>>>>>>>>>> without any privileges.
>>>>>>>>>>>> You're mixing layers here.
>>>>>>>>>>>>
>>>>>>>>>>>> QEMU is not involved here. It's only sending IOCTLs to the
>>>>>>>>>>>> migration driver. The migration driver will control the
>>>>>>>>>>>> migration process of the VF using the PF communication
>>>>>>>>>>>> channel.
>>>>>>>>>>> So who will be granted the "permission" you mentioned here?
>>>>>>>>>> This is just an expression.
>>>>>>>>>>
>>>>>>>>>> What is not clear?
>>>>>>>>>>
>>>>>>>>>> The PF device will have an option to quiesce/freeze the VF
>>>>>>>>>> device.
>>>>>>>>>>
>>>>>>>>>> This is simple. Why are you looking for sophisticated problems?
>>>>>>>>> I'm trying to follow along here and have not completely; but I
>>>>>>>>> think the issue is a security separation one.
>>>>>>>>> The VMM (e.g. qemu) that has been given access to one of the VFs
>>>>>>>>> is isolated and shouldn't be able to go poking at other devices;
>>>>>>>>> so it can't go poking at the PF (it probably doesn't even have
>>>>>>>>> the PF device node accessible) - so then the question is who has
>>>>>>>>> access to the migration driver and how do you make sure it can
>>>>>>>>> only deal with VFs that it's supposed to be able to migrate.
>>>>>>>> The QEMU/userspace doesn't know or care about the PF connection
>>>>>>>> and the internal virtio_vfio_pci driver implementation.
>>>>>>> OK
>>>>>>>
>>>>>>>> You shouldn't change 1 line of code in the VM driver nor in QEMU.
>>>>>>> Hmm OK.
>>>>>>>
>>>>>>>> QEMU does not have access to the PF. Only the kernel driver that
>>>>>>>> has access to the VF will have access to the PF communication
>>>>>>>> channel. There is no permission problem here.
>>>>>>>>
>>>>>>>> The kernel driver of the VF will do this internally, and make sure
>>>>>>>> that the commands it builds will only impact the VF originating
>>>>>>>> them.
>>>>>>>>
>>>>>>> Now that confuses me; isn't the kernel driver that has access to
>>>>>>> the VF running inside the guest? If it's inside the guest we can't
>>>>>>> trust it to do anything about stopping impact to other devices.
>>>>>> No. The driver is in the hypervisor (virtio_vfio_pci). This is the
>>>>>> migration driver, right?
>>>>> Well, talking about things like virtio_vfio_pci that were not
>>>>> mentioned before and not justified on the list may easily confuse
>>>>> people. As pointed out in another thread, it has too many
>>>>> disadvantages over the existing virtio-pci vDPA driver. And it just
>>>>> duplicates a partial function of what the virtio-pci vDPA driver can
>>>>> do. I don't think we will go that way.
>>>> This was just an example for David to help with understanding the
>>>> solution, since he thought that the guest drivers somehow should be
>>>> changed.
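[Editor's note: the trust boundary argued above (a VMM that holds only a VF's file descriptor can never reach the PF or other VFs, because the hypervisor kernel driver builds every admin command with the VF id it was bound to) can be modeled in a few lines. All names here, AdminQueue, MigrationDriver, the opcode strings, are illustrative only; none of this is a real kernel or virtio interface.]

```python
class AdminQueue:
    """Models the PF-owned admin queue; records commands the PF executes."""
    def __init__(self):
        self.log = []

    def submit(self, opcode, vf_id):
        self.log.append((opcode, vf_id))


class MigrationDriver:
    """Models the hypervisor-side migration driver, bound to exactly one VF."""
    def __init__(self, pf_adminq, bound_vf_id):
        self._adminq = pf_adminq
        self._vf_id = bound_vf_id  # fixed at bind time, not settable later

    def ioctl(self, opcode):
        # The kernel builds every command with the bound VF id, so a VMM
        # holding only this VF's file descriptor cannot affect the PF or
        # any other VF.
        self._adminq.submit(opcode, self._vf_id)


pf = AdminQueue()
vf5 = MigrationDriver(pf, bound_vf_id=5)
vf5.ioctl("QUIESCE")
vf5.ioctl("FREEZE")
assert pf.log == [("QUIESCE", 5), ("FREEZE", 5)]
```

The point of the sketch is that the VF id never comes from userspace input, which is the security separation David asks about.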
>>>>
>>>> David, I'm sorry if I confused you.
>>>>
>>>> Again Jason, you try to propose your vDPA solution that is not what
>>>> we're trying to achieve in this work. Think of a world without vDPA.
>>> Well, I'd say, let's think of vDPA as a superset of virtio, not just
>>> the acceleration technologies.
>> I'm sorry but vDPA is not relevant to this discussion.
> Well, it's you that mentioned software things like VFIO first.
>
>> Anyhow, I don't see any problem for a vDPA driver to work on top of the
>> design proposed here.
>>
>>>> Also, I don't understand how vDPA is related to virtio specification
>>>> decisions?
>>> So how is VFIO related to virtio specific decisions? That's why I
>>> think we should avoid talking about software architecture here. It's
>>> the wrong community.
>> VFIO is not related to the virtio spec.
> Of course.
>
>> It was an example for David. What is the problem with giving examples
>> to make it easier for people to understand the solution?
> I don't think your example eases the understanding.
>
>> Where did you see that the design is referring to VFIO?
>>
>>>> Make vDPA into virtio and then we can open a discussion.
>>>>
>>>> I'm interested in virtio migration of HW devices.
>>>>
>>>> The proposal in this thread actually got support from Michael AFAIU,
>>>> and others were happy with it as well. All besides you.
>>> So I think I've clarified myself several times :(
>>>
>>> - I'm fairly ok with the proposal
>> It doesn't seem like that.
>>
>>> - but we decouple the basic facility out of the admin virtqueue, and
>>> this seems agreed by Michael:
>>>
>>> Let's take the dirty page tracking as an example:
>>>
>>> 1) let's first define that as one of the basic facilities
>>> 2) then we can introduce the admin virtqueue or other stuff as an
>>> interface for that facility
>>>
>>> Does this work for you?
>> What I really want is to agree on the right way to manage the migration
>> process of a virtio VF. My proposal does so by creating a communication
>> channel in its parent PF.
> It looks to me you never answered the question "why it must be done by
> the PF".

This is not a relevant question. In our profession you can solve a
problem in more than 1 way. We need to find the robust one.

>
> All the functions provided by the PF so far for software are not
> expected to be used by a VMM like Qemu. Those functions usually require
> capabilities or privileges for the management software to use. You
> mentioned things like "supervisor" and "permission", but it looks to me
> you are still unaware how it connects to the security stuff.

I now see that you don't understand at all what I'm proposing here.

Maybe you can go back to the questions David asked and read my answers
to get a better understanding of the solution.

>
>> I think I got a confirmation here.
>>
>> This communication channel is not introduced in this thread, but
>> obviously it should be an adminq.
> Let me clarify. What I want to say is that the admin virtqueue should
> be one of the possible channels.

If you want to fork and create more than 1 way to do things, we can
check other options.

BTW, at the 2019 conference I saw that MST talked about adding LM to the
spec and hinted that the PF should manage the VF.

Adding non-ready HW platform considerations, future technologies and
hypervisor hacks to the design of virtio LM sounds weird to me.

I still don't understand why you can't do all the things you wish to do
with simple commands sent via the admin-q, and insist on splitting
devices and splitting config spaces and a bunch of other hacks.

Don't you prefer a robust solution that works with any existing platform
today? Or do you aim for a future solution?

>> For your future scalable functions, the Parent Device (let's call it
>> PD) will manage the creation/migration/destruction process for its
>> Virtual Devices (let's call them VDs) using the PD adminq.
>>
>> Agreed?
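[Editor's note: the single PD adminq command set proposed above, covering both lifecycle and migration of VDs, could be modeled as below. The opcode names and values are invented for illustration; they appear in neither the virtio spec nor the posted RFCs.]

```python
# Hypothetical command set carried over one PD admin queue.
VD_CREATE, VD_DESTROY = 0x01, 0x02
VD_QUIESCE, VD_SAVE_STATE, VD_RESTORE_STATE = 0x10, 0x11, 0x12
VD_DIRTY_TRACK_START, VD_DIRTY_TRACK_STOP = 0x20, 0x21

def pd_execute(vds, opcode, vd_id, payload=None):
    """The parent device (PD) handles every command for its VDs here."""
    if opcode == VD_CREATE:
        vds[vd_id] = {"quiesced": False, "tracking": False, "state": None}
    elif opcode == VD_DESTROY:
        del vds[vd_id]
    elif opcode == VD_QUIESCE:
        vds[vd_id]["quiesced"] = True
    elif opcode == VD_SAVE_STATE:
        return vds[vd_id]["state"]   # opaque, device-defined blob
    elif opcode == VD_RESTORE_STATE:
        vds[vd_id]["state"] = payload
    elif opcode == VD_DIRTY_TRACK_START:
        vds[vd_id]["tracking"] = True
    elif opcode == VD_DIRTY_TRACK_STOP:
        vds[vd_id]["tracking"] = False

vds = {}
pd_execute(vds, VD_CREATE, 5)
pd_execute(vds, VD_DIRTY_TRACK_START, 5)
pd_execute(vds, VD_QUIESCE, 5)
pd_execute(vds, VD_RESTORE_STATE, 5, payload=b"opaque-device-state")
assert pd_execute(vds, VD_SAVE_STATE, 5) == b"opaque-device-state"
assert vds[5]["quiesced"] and vds[5]["tracking"]
```

One entry point for provisioning and migration is exactly the "why create different interfaces for different management tasks" argument made later in the thread.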
> They are two different sets of functions:
>
> - provisioning/creation/destruction: requires privilege and we don't
> have any plan to expose them to the guest. It should be done via the PF
> or PD for security, as you mentioned above.
> - migration: doesn't require privilege, and it can be exposed to the
> guest; it can be done in either the PF or the VF. To me using the VF is
> much more natural, but using the PF is also fine.

Migration exposed to the guest? No. This is a basic assumption, really.

I think this is the problem in the whole discussion.

I think all the community agrees that the guest shouldn't be aware of
migration.

You must understand this.

Once you do, all this process will be easier and we'll progress instead
of running in circles.

>
> An exception for the migration is the dirty page tracking; without DMA
> isolation, we may end up with a security issue if we do that in the VF.

Let's start with basic migration first.

In my model the hypervisor kernel controls this. No security issue,
since the kernel is a secured entity.

This is what we already do in our solution for NIC devices. I don't want
virtio to be behind.

>
>> Please don't answer that this is not a "must". This is my proposal. If
>> you have another proposal, please propose.
> Well, you are asking for comments instead of enforcing things, right?
>
> And it's as simple as:
>
> 1) introduce the admin virtqueue, and bind migration features to the
> admin virtqueue
>
> or
>
> 2) introduce migration features and the admin virtqueue independently
>
> What's the problem with doing trivial modifications like 2)? Does that
> conflict with your proposal?

I did #2 already and then you asked me to do #1. If I do #1 you'll ask
for #2.

I'm progressing towards the final solution. I got the feedback I need.

>
>>>> We do it in mlx5 and we didn't see any issues with that design.
>>>>
>>> If we separate things as I suggested, I'm totally fine.
>> Separate what?
>>
>> Why should I create different interfaces for different management
>> tasks?
> I'm not saying you need to create different interfaces. It's for future
> extensions:
>
> 1) When VIRTIO_F_ADMIN_VQ is negotiated, the interface is the admin
> virtqueue
> 2) When another feature is negotiated, the interface is another one.
>
> In order to make 2) work, we need to introduce the migration features
> and the admin virtqueue separately.
>
> Migration is not a management task, and it doesn't require any
> privilege.

You need to control the operational state of a device, track its dirty
pages, and save/restore its internal HW state.

If you think that anyone can do that to a virtio device, let's see this
magic work (I believe that only the parent/management device can do it
on behalf of the migration software).

>
>> I have a virtual/scalable device that I want to refer to from the
>> physical/parent device using some interface.
>>
>> This interface is the adminq. This interface will be used for dirty
>> page tracking and operational state changing and get/set internal
>> state as well. And more (create/destroy an SF for example).
>>
>> You can think of this in some other way, I'm fine with it. As long as
>> the final conclusion is the same.
>>
>>>> I don't think you can say that we "go that way".
>>> By "go that way" I meant the method of using vfio_virtio_pci; it has
>>> nothing to do with the discussion of "using the PF to control the VF"
>>> on the spec.
>> This was an example. Please leave it as an example for David.
>>
>>
>>>> You're trying to build a complementary solution for creating scalable
>>>> functions and for some reason trying to sabotage NVIDIA's efforts to
>>>> add new important functionality to virtio.
>>> Well, it's a completely different topic. And it doesn't conflict with
>>> anything that is proposed here by you. I think I've stated this
>>> several times. I don't think we block each other; it's just some
>>> unification work if one of the proposals is merged first. I sent them
>>> recently because they will be used as material for my talk at the KVM
>>> Forum, which is really near.
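[Editor's note: the decoupling argued in the quoted reply, define the migration facility once and let the negotiated feature bit only select the interface that carries it, can be sketched as follows. The bit position used here is a placeholder, not the spec's value for VIRTIO_F_ADMIN_VQ.]

```python
VIRTIO_F_ADMIN_VQ = 1 << 41  # placeholder bit position, for illustration

def migration_interface(negotiated_features):
    """Return which interface carries the (single) migration facility."""
    if negotiated_features & VIRTIO_F_ADMIN_VQ:
        return "admin_virtqueue"
    return "transport_specific"  # e.g. a future PCIe capability

assert migration_interface(VIRTIO_F_ADMIN_VQ) == "admin_virtqueue"
assert migration_interface(0) == "transport_specific"
```

Under this split, the dirty-page-tracking and state save/restore commands are defined independently of whichever interface the feature negotiation selects.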
>> In theory you're right. We shouldn't block each other, and I don't
>> block you. But for some reason I see that you do try to block my
>> proposal and I don't understand why.
> I don't want to block your proposal; let's decouple the migration
> feature out of the admin virtqueue. Then it's fine.
>
> The problem I see is that you tend to refuse such a trivial but
> beneficial change. That's what I don't understand.

I thought I explained it. Nothing keeps you happy. If we do A, you ask
for B. If we do B, you ask for A.

I'll continue with the feedback I get from MST.

>
>> I feel like I wasted 2 months on a discussion instead of progressing.
> Well, I'm not sure 2 months is short, but it usually takes more than a
> year for a huge project in Linux.

But if you go in circles it will never end, right?

>
> Patience may help us to understand each other's points better.

First I want to agree on the migration concepts I wrote above.

If we don't agree on that, the discussion is useless.

>
>> But now I do see progress. A PF to manage VF migration is the way to
>> go forward.
>>
>> And the following RFC will take this into consideration.
>>
>>>> This also sabotages the evolution of virtio as a standard.
>>>>
>>>> You're trying to enforce some unfinished idea that should work on
>>>> some future specific HW platform instead of helping define a good
>>>> spec for virtio.
>>> Let's open another thread for this if you wish; it has nothing to do
>>> with the spec but with how it is implemented in Linux. If you search
>>> the archive, something similar to "vfio_virtio_pci" was proposed
>>> several years ago by Intel. The idea was rejected, and we have
>>> leveraged the Linux vDPA bus for virtio-pci devices.
>> I don't know this history. And I will be happy to hear about it one
>> day.
>>
>> But for our discussion in Linux, virtio_vfio_pci will happen. And it
>> will implement the migration logic of a virtio device with PCI
>> transport for VFs using the PF admin queue.
>>
>> We at NVIDIA are currently upstreaming (together with AlexW and
>> Cornelia) a vfio-pci separation that will enable easy creation of
>> vfio-pci vendor/protocol drivers to do some specific tasks.
>>
>> New drivers such as mlx5_vfio_pci, hns_vfio_pci, virtio_vfio_pci and
>> nvme_vfio_pci should be implemented in the near future in Linux to
>> enable migration of these devices.
>>
>> This is just an example. And it's not related to the spec nor the
>> proposal at all.
> Let's move those discussions to the right list. I'm pretty sure there
> will be a long debate there. Please prepare for that.

We already discussed this with AlexW, Cornelia, JasonG, ChristophH and
others.

And before we have a virtio spec for LM we can't discuss it on the Linux
mailing list. It will waste everyone's time.

>
>>>> And all is for having users choose the vDPA framework instead of
>>>> using plain virtio.
>>>>
>>>> We believe in our solution and we have a working prototype. We'll
>>>> continue with our discussion to convince the community of it.
>>> Again, it looks like there's a lot of misunderstanding. Let's open a
>>> thread on the suitable list instead of talking about any specific
>>> software solution or architecture here. This will speed things up.
>> I prefer to finish the specification first. The SW arch is clear for
>> us in Linux. We did it already for mlx5 devices and it will be the
>> same for virtio if the spec changes are accepted.
> I disagree, but let's separate the software discussion from the spec
> discussion here.
>
> Thanks
>
>> Thanks.
>>
>>
>>> Thanks
>>>
>>>> Thanks.
>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>> The guest is running as usual. It isn't aware of the migration at
>>>>>> all.
>>>>>>
>>>>>> This is the point I try to make here. I don't (and I can't) change
>>>>>> even 1 line of code in the guest.
>>>>>>
>>>>>> e.g.:
>>>>>>
>>>>>> QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor
>>>>>> (bound to VF5) --> send admin command on PF adminq to start
>>>>>> tracking dirty pages for VF5 --> PF device will do it
>>>>>>
>>>>>> QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor
>>>>>> (bound to VF5) --> send admin command on PF adminq to quiesce VF5
>>>>>> --> PF device will do it
>>>>>>
>>>>>> You can take a look at how we implement mlx5_vfio_pci in the link
>>>>>> I provided.
>>>>>>
>>>>>>> Dave
>>>>>>>
>>>>>>>
>>>>>>>> We already do this in mlx5 NIC migration. The kernel is secured
>>>>>>>> and the QEMU interface is the VF.
>>>>>>>>
>>>>>>>>> Dave
>>>>>>>>>
>>>>>>>>>>>>>>>> An example of this approach can be seen in the way NVIDIA
>>>>>>>>>>>>>>>> performs live migration of a ConnectX NIC function:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> NVIDIA's SNAP technology enables hardware-accelerated,
>>>>>>>>>>>>>>>> software defined PCIe devices.
>>>>>>>>>>>>>>>> virtio-blk/virtio-net/virtio-fs SNAP is used for storage
>>>>>>>>>>>>>>>> and networking solutions. The host OS/hypervisor uses its
>>>>>>>>>>>>>>>> standard drivers that are implemented according to the
>>>>>>>>>>>>>>>> well-known VIRTIO specifications.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In order to implement Live Migration for these virtual
>>>>>>>>>>>>>>>> function devices, which use standard drivers as
>>>>>>>>>>>>>>>> mentioned, the specification should define how HW vendors
>>>>>>>>>>>>>>>> should build their devices and how SW developers should
>>>>>>>>>>>>>>>> adjust the drivers.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This will enable a specification compliant, vendor
>>>>>>>>>>>>>>>> agnostic solution.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is exactly how we built the migration driver for
>>>>>>>>>>>>>>>> ConnectX (internal HW design doc) and I guess that this
>>>>>>>>>>>>>>>> is the way other vendors work.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For that, I would like to know if the approach of "PF
>>>>>>>>>>>>>>>> that controls the VF live migration process" is
>>>>>>>>>>>>>>>> acceptable by the VIRTIO technical group?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm not sure, but I think it's better to start from the
>>>>>>>>>>>>>>> general facility for all transports, then develop features
>>>>>>>>>>>>>>> for a specific transport.
>>>>>>>>>>>>>> Can a general facility for all transports be a generic
>>>>>>>>>>>>>> admin queue?
>>>>>>>>>>>>> It could be a virtqueue or a transport specific method (a
>>>>>>>>>>>>> PCIe capability).
>>>>>>>>>>>> No. You said a general facility for all transports.
>>>>>>>>>>> By general facility, I mean chapter 2 of the spec, which is
>>>>>>>>>>> general:
>>>>>>>>>>>
>>>>>>>>>>> "
>>>>>>>>>>> 2 Basic Facilities of a Virtio Device
>>>>>>>>>>> "
>>>>>>>>>>>
>>>>>>>>>> It will be in chapter 2. Right after "2.11 Exporting Objects" I
>>>>>>>>>> can add "2.12 Admin Virtqueues", and this is what I did in the
>>>>>>>>>> RFC.
>>>>>>>>>>
>>>>>>>>>>>> Transport specific is not general.
>>>>>>>>>>> The transport is in charge of implementing the interface for
>>>>>>>>>>> those facilities.
>>>>>>>>>> Transport specific is not general.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> E.g. we can define what needs to be migrated for virtio-blk
>>>>>>>>>>>>> first (the device state). Then we can define the interface
>>>>>>>>>>>>> to get and set those states via the admin virtqueue. Such
>>>>>>>>>>>>> decoupling may ease the future development of the transport
>>>>>>>>>>>>> specific migration interface.
>>>>>>>>>>>> I asked a simple question here.
>>>>>>>>>>>>
>>>>>>>>>>>> Let's stick to this.
>>>>>>>>>>> I answered this question.
>>>>>>>>>> No, you didn't answer.
>>>>>>>>>>
>>>>>>>>>> I asked if the approach of "PF that controls the VF live
>>>>>>>>>> migration process" is acceptable by the VIRTIO technical group.
>>>>>>>>>>
>>>>>>>>>> And you take the discussion in your direction instead of
>>>>>>>>>> answering a Yes/No question.
>>>>>>>>>>
>>>>>>>>>>> The virtqueue could be one of the approaches. And it's your
>>>>>>>>>>> responsibility to convince the community about that approach.
>>>>>>>>>>> Having an example may help people to understand your proposal.
>>>>>>>>>>>
>>>>>>>>>>>> I'm not referring to internal state definitions.
>>>>>>>>>>> Without an example, how do we know if it can work well?
>>>>>>>>>>>
>>>>>>>>>>>> Can you please not change the subject of my initial intent in
>>>>>>>>>>>> the email?
>>>>>>>>>>> Did I? Basically, I'm asking how a virtio-blk can be migrated
>>>>>>>>>>> with your proposal.
>>>>>>>>>> The virtio-blk PF admin queue will be used to manage the
>>>>>>>>>> virtio-blk VF migration.
>>>>>>>>>>
>>>>>>>>>> This is the whole discussion. I don't want to get into the
>>>>>>>>>> resolution.
>>>>>>>>>>
>>>>>>>>>> Since you already know the answer, as I published 4 RFCs
>>>>>>>>>> already with all the flow.
>>>>>>>>>>
>>>>>>>>>> Let's stick to my question.
>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Max.
>>>>>>>>>>>>>>>>
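[Editor's note: one way to ground the "define what needs to be migrated for virtio-blk first" suggestion from the exchange above is a save/restore round trip over a minimal device-state record. The field names here are a guess for illustration; the spec defines no such structure today.]

```python
def save_state(dev):
    """Snapshot the migratable state of a (modeled) virtio-blk VF."""
    return {
        "device_features": dev["device_features"],
        "device_status": dev["device_status"],
        "vqs": [dict(q) for q in dev["vqs"]],  # avail_idx/used_idx per queue
    }

def restore_state(dev, state):
    """Load a snapshot into a freshly created (modeled) VF."""
    dev["device_features"] = state["device_features"]
    dev["device_status"] = state["device_status"]
    dev["vqs"] = [dict(q) for q in state["vqs"]]
    return dev

src = {"device_features": 0x30000000, "device_status": 0x0F,
       "vqs": [{"avail_idx": 42, "used_idx": 40}]}
dst = restore_state({}, save_state(src))
assert dst == src and dst is not src
```

In the proposal under discussion, the blob produced by such a save would travel as an opaque payload over the PF admin queue rather than being interpreted by the migration software.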
>>>>>>>>>>>>>> >>>>>>>>>>>>>> In order to verify user consent to the Feedback License term= s >>>>>>>>>>>>>> and >>>>>>>>>>>>>> to minimize spam in the list archive, subscription is requir= ed >>>>>>>>>>>>>> before posting. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Subscribe: virtio-comment-subscribe@lists.oasis-open.org >>>>>>>>>>>>>> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org >>>>>>>>>>>>>> List help: virtio-comment-help@lists.oasis-open.org >>>>>>>>>>>>>> List archive: >>>>>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=3Dhttps%= 3A%2F%2Flists.oasis-open.org%2Farchives%2Fvirtio-comment%2F&data=3D04%7= C01%7Cmgurtovoy%40nvidia.com%7C0110223ec8a341665c2c08d965e38ef4%7C43083d157= 27340c1b7db39efd9ccc17a%7C0%7C0%7C637652851190365210%7CUnknown%7CTWFpbGZsb3= d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000= &sdata=3D1%2FZMUIr0f%2FfEgDjB8MQ9a2lHiXr4SkLqSG44r6kgJeQ%3D&reserve= d=3D0 >>>>>>>>>>>>>> Feedback License: >>>>>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=3Dhttps%= 3A%2F%2Fwww.oasis-open.org%2Fwho%2Fipr%2Ffeedback_license.pdf&data=3D04= %7C01%7Cmgurtovoy%40nvidia.com%7C0110223ec8a341665c2c08d965e38ef4%7C43083d1= 5727340c1b7db39efd9ccc17a%7C0%7C0%7C637652851190365210%7CUnknown%7CTWFpbGZs= b3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C10= 00&sdata=3Dm%2BnMdp9ZA%2BqBU8PzC8HWJB1ouzyUx35VQApAFV8HeSg%3D&reser= ved=3D0 >>>>>>>>>>>>>> List Guidelines: >>>>>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=3Dhttps%= 3A%2F%2Fwww.oasis-open.org%2Fpolicies-guidelines%2Fmailing-lists&data= =3D04%7C01%7Cmgurtovoy%40nvidia.com%7C0110223ec8a341665c2c08d965e38ef4%7C43= 083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637652851190365210%7CUnknown%7CTWF= pbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D= %7C1000&sdata=3D57RGeurJjgZWxGajJPJh6BeR1vQ2OYLYdjTNba2HsPM%3D&rese= rved=3D0 >>>>>>>>>>>>>> Committee: >>>>>>>>>>>>>> 
>>>>>>>>>>>>>> https://www.oasis-open.org/committees/virtio/
>>>>>>>>>>>>>> Join OASIS: https://www.oasis-open.org/join/
>>>>>>>>>>>>>>

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
MIME-Version: 1.0
References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> <755ff192-33ac-9f6a-a7ad-b44b14afd5d2@nvidia.com> <39536c3c-e455-5602-9391-0b21add7e22f@nvidia.com> <0d06c26e-f1e7-3cac-a017-059e8985bb44@redhat.com> <74151019-6f78-2bff-5b0a-b5a4da814787@nvidia.com> <41fbd78a-f1d8-9056-3929-1e7b6b57a49b@nvidia.com> <0252a058-f3d2-db34-08a0-02c3cdd0e0bb@nvidia.com>
In-Reply-To: <0252a058-f3d2-db34-08a0-02c3cdd0e0bb@nvidia.com>
From: Jason Wang 
Date: Tue, 24 Aug 2021 10:41:54 +0800
Message-ID: 
Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
To: Max Gurtovoy 
Cc: "Dr. David Alan Gilbert" , "virtio-comment@lists.oasis-open.org" , "Michael S.
Tsirkin" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer
List-ID:

On Mon, Aug 23, 2021 at 4:55 PM Max Gurtovoy wrote:
>
>
> On 8/23/2021 6:10 AM, Jason Wang wrote:
> > On Sun, Aug 22, 2021 at 6:05 PM Max Gurtovoy wrote:
> >>
> >> On 8/20/2021 2:16 PM, Jason Wang wrote:
> >>> On Fri, Aug 20, 2021 at 6:26 PM Max Gurtovoy wrote:
> >>>> On 8/20/2021 5:24 AM, Jason Wang wrote:
> >>>>> On 8/19/2021 11:20 PM, Max Gurtovoy wrote:
> >>>>>> On 8/19/2021 5:24 PM, Dr. David Alan Gilbert wrote:
> >>>>>>> * Max Gurtovoy (mgurtovoy@nvidia.com) wrote:
> >>>>>>>> On 8/19/2021 2:12 PM, Dr. David Alan Gilbert wrote:
> >>>>>>>>> * Max Gurtovoy (mgurtovoy@nvidia.com) wrote:
> >>>>>>>>>> On 8/18/2021 1:46 PM, Jason Wang wrote:
> >>>>>>>>>>> On Wed, Aug 18, 2021 at 5:16 PM Max Gurtovoy wrote:
> >>>>>>>>>>>> On 8/17/2021 12:44 PM, Jason Wang wrote:
> >>>>>>>>>>>>> On Tue, Aug 17, 2021 at 5:11 PM Max Gurtovoy wrote:
> >>>>>>>>>>>>>> On 8/17/2021 11:51 AM, Jason Wang wrote:
> >>>>>>>>>>>>>>> On 8/12/2021 8:08 PM, Max Gurtovoy wrote:
> >>>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Live migration is one of the most important features of
> >>>>>>>>>>>>>>>> virtualization, and virtio devices are often found in virtual
> >>>>>>>>>>>>>>>> environments.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The migration process is managed by migration SW that is running on
> >>>>>>>>>>>>>>>> the hypervisor, and the VM is not aware of the process at all.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Unlike the vDPA case, a real PCI Virtual Function's state resides in
> >>>>>>>>>>>>>>>> the HW.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> vDPA doesn't prevent you from having HW states.
> >>>>>>>>>>>>>>> Actually, from the view of the VMM (Qemu), it doesn't care whether
> >>>>>>>>>>>>>>> a state is stored in software or hardware. A well designed VMM
> >>>>>>>>>>>>>>> should be able to hide the virtio device implementation from the
> >>>>>>>>>>>>>>> migration layer; that is how Qemu is written: it doesn't care
> >>>>>>>>>>>>>>> whether or not it's a software virtio/vDPA device.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> In our vision, in order to fulfil the Live migration requirements
> >>>>>>>>>>>>>>>> for virtual functions, each physical function device must
> >>>>>>>>>>>>>>>> implement migration operations. Using these operations, it will be
> >>>>>>>>>>>>>>>> able to master the migration process for the virtual function
> >>>>>>>>>>>>>>>> devices. Each capable physical function device has supervisor
> >>>>>>>>>>>>>>>> permissions to change the virtual function operational states,
> >>>>>>>>>>>>>>>> save/restore its internal state and start/stop dirty page tracking.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> For "supervisor permissions", is this from the software point of
> >>>>>>>>>>>>>>> view? Maybe it's better to give an example for this.
> >>>>>>>>>>>>>> A permission for a PF device to quiesce and freeze a VF device, for
> >>>>>>>>>>>>>> example.
> >>>>>>>>>>>>> Note that for safety, the VMM (e.g. Qemu) is usually running without
> >>>>>>>>>>>>> any privileges.
> >>>>>>>>>>>> You're mixing layers here.
> >>>>>>>>>>>>
> >>>>>>>>>>>> QEMU is not involved here. It's only sending IOCTLs to the migration
> >>>>>>>>>>>> driver. The migration driver will control the migration process of the
> >>>>>>>>>>>> VF using the PF communication channel.
> >>>>>>>>>>> So who will be granted the "permission" you mentioned here?
> >>>>>>>>>> This is just an expression.
> >>>>>>>>>>
> >>>>>>>>>> What is not clear?
> >>>>>>>>>>
> >>>>>>>>>> The PF device will have an option to quiesce/freeze the VF device.
> >>>>>>>>>>
> >>>>>>>>>> This is simple. Why are you looking for some sophisticated problems?
> >>>>>>>>> I'm trying to follow along here and have not completely; but I think
> >>>>>>>>> the issue is a security separation one.
> >>>>>>>>> The VMM (e.g. qemu) that has been given access to one of the VFs is
> >>>>>>>>> isolated and shouldn't be able to go poking at other devices; so it
> >>>>>>>>> can't go poking at the PF (it probably doesn't even have the PF device
> >>>>>>>>> node accessible) - so then the question is who has access to the
> >>>>>>>>> migration driver and how do you make sure it can only deal with VFs
> >>>>>>>>> that it's supposed to be able to migrate.
> >>>>>>>> The QEMU/userspace doesn't know or care about the PF connection and
> >>>>>>>> internal virtio_vfio_pci driver implementation.
> >>>>>>> OK
> >>>>>>>
> >>>>>>>> You shouldn't change one line of code in the VM driver nor in QEMU.
> >>>>>>> Hmm OK.
> >>>>>>>
> >>>>>>>> QEMU does not have access to the PF. Only the kernel driver that has
> >>>>>>>> access to the VF will have access to the PF communication channel.
> >>>>>>>> There is no permission problem here.
> >>>>>>>>
> >>>>>>>> The kernel driver of the VF will do this internally, and make sure
> >>>>>>>> that the commands it builds will only impact the VF originating them.
> >>>>>>>>
> >>>>>>> Now that confuses me; isn't the kernel driver that has access to the
> >>>>>>> VF running inside the guest? If it's inside the guest we can't trust
> >>>>>>> it to do anything about stopping impact to other devices.
> >>>>>> No. The driver is in the hypervisor (virtio_vfio_pci).
> >>>>>> This is the migration driver, right?
> >>>>> Well, talking about things like virtio_vfio_pci that were not mentioned
> >>>>> before and not justified on the list may easily confuse people. As
> >>>>> pointed out in another thread, it has too many disadvantages over the
> >>>>> existing virtio-pci vDPA driver, and it just duplicates a partial
> >>>>> function of what the virtio-pci vDPA driver can do. I don't think we
> >>>>> will go that way.
> >>>> This was just an example for David, to help with understanding the
> >>>> solution, since he thought that the guest drivers somehow should be
> >>>> changed.
> >>>>
> >>>> David, I'm sorry if I confused you.
> >>>>
> >>>> Again Jason, you try to propose your vDPA solution, which is not what
> >>>> we're trying to achieve in this work. Think of a world without vDPA.
> >>> Well, I'd say, let's think of vDPA as a superset of virtio, not just the
> >>> acceleration technologies.
> >> I'm sorry, but vDPA is not relevant to this discussion.
> > Well, it's you that mentioned software things like VFIO first.
> >
> >> Anyhow, I don't see any problem for a vDPA driver to work on top of the
> >> design proposed here.
> >>
> >>>> Also, I don't understand how vDPA is related to virtio specification
> >>>> decisions?
> >>> So how is VFIO related to virtio spec decisions? That's why I think we
> >>> should avoid talking about software architecture here. It's the wrong
> >>> community.
> >> VFIO is not related to the virtio spec.
> > Of course.
> >
> >> It was an example for David. What is the problem with giving examples to
> >> make it easier for people to understand the solution?
> > I don't think your example eases the understanding.
> >
> >> Where did you see that the design is referring to VFIO?
> >>
> >>>> Make vDPA into virtio and then we can open a discussion.
> >>>>
> >>>> I'm interested in virtio migration of HW devices.
> >>>>
> >>>> The proposal in this thread actually got support from Michael AFAIU,
> >>>> and others were happy with it too.
> >>>> All except you.
> >>> So I think I've clarified myself several times :(
> >>>
> >>> - I'm fairly ok with the proposal
> >> It doesn't seem like that.
> >>
> >>> - but we decouple the basic facility out of the admin virtqueue, and
> >>> this seems agreed by Michael:
> >>>
> >>> Let's take dirty page tracking as an example:
> >>>
> >>> 1) let's first define that as one of the basic facilities
> >>> 2) then we can introduce the admin virtqueue or other stuff as an
> >>> interface for that facility
> >>>
> >>> Does this work for you?
> >> What I really want is to agree on the right way to manage the migration
> >> process of a virtio VF. My proposal is doing so by creating a
> >> communication channel in its parent PF.
> > It looks to me you never answered the question "why it must be done by
> > the PF".
>
> This is not a relevant question. In our profession you can solve a problem
> in more than one way.
>
> We need to find the robust one.

Then you need to prove how robust your proposal is.

> >
> > All the functions provided by the PF so far for software are not expected
> > to be used by a VMM like Qemu. Those functions usually require
> > capabilities or privileges for the management software to use. You
> > mentioned things like "supervisor" and "permission", but it looks to
> > me you are still unaware how it connects to the security stuff.
>
> I now see that you don't understand at all what I'm proposing here.
>
> Maybe you can go back to the questions David asked and read my answers
> to get a better understanding of the solution.
>
> >> I think I got a confirmation here.
> >>
> >> This communication channel is not introduced in this thread, but
> >> obviously it should be an admin queue.
> > Let me clarify. What I want to say is that the admin queue should be one
> > of the possible channels.
>
> If you want to fork and create more than one way to do things, we can
> check other options.
> BTW, in the 2019 conference I saw that MST talked about adding LM to the
> spec and hinted that the PF should manage the VF.
>
> Adding considerations of non-ready HW platforms, future technologies
> and hypervisor hacks to the design of virtio LM sounds weird to me.

What do you mean by "non-ready"? I don't think I suggested you add
anything; it's just about a restructure of your current proposal.

>
> I still don't understand why you can't do all the things you wish to do
> with simple commands sent via the admin queue, and insist on splitting
> devices and splitting config spaces and a bunch of other hacks.
>
> Don't you prefer a robust solution that works with any existing platform
> today? Or do you aim for a future solution?

You never explain why it is robust. That's why I ask why it must be done
in that way.

>
> >> For your future scalable functions, the Parent Device (let's call it
> >> PD) will manage the creation/migration/destruction process for its
> >> Virtual Devices (let's call them VDs) using the PD admin queue.
> >>
> >> Agreed?
> > They are two different sets of functions:
> >
> > - provisioning/creation/destruction: requires privilege, and we don't
> > have any plan to expose them to the guest. It should be done via the PF
> > or PD for security, as you mentioned above.
> > - migration: doesn't require privilege, and it can be exposed to the
> > guest; it can be done in either the PF or the VF. To me using the VF is
> > much more natural, but using the PF is also fine.
>
> Migration exposed to the guest? No.

Can you explain why?

>
> This is a basic assumption, really.

That's just your assumption. Nested virt has been supported by some cloud
vendors.

>
> I think this is the problem in the whole discussion.

No, if you tie any feature to the admin virtqueue, it can't be used by
the guest. Migration is just an example.

>
> I think all the community agrees that the guest shouldn't be aware of
> migration. You must understand this.
I just make a minimal effort so we can enable this capability in the
future; why not?

>
> Once you do, all this process will be easier and we'll progress instead
> of running in circles.

I gave you a simple suggestion to make nested migration work.

>
> >
> > An exception for the migration is dirty page tracking; without DMA
> > isolation, we may end up with a security issue if we do that in the VF.
>
> Let's start with basic migration first.
>
> In my model the hypervisor kernel controls this. No security issue,
> since the kernel is a secured entity.

Well, if you do things in the VF, is it unsafe?

>
> This is what we do already in our solution for NIC devices.
>
> I don't want virtio to be behind.
>
> >
> >> Please don't answer that this is not a "must". This is my proposal. If
> >> you have another proposal, please propose.
> > Well, you are asking for comments instead of enforcing things, right?
> >
> > And it's as simple as:
> >
> > 1) introduce the admin virtqueue, and bind migration features to the
> > admin virtqueue
> >
> > or
> >
> > 2) introduce migration features and the admin virtqueue independently
> >
> > What's the problem with doing trivial modifications like 2)? Does that
> > conflict with your proposal?
>
> I did #2 already and then you asked me to do #1.

Where? I don't think you decoupled migration out of the admin virtqueue
in any of the previous versions. If you think I did that, I would like
to clarify once again then.

>
> If I do #1 you'll ask for #2.

How do you know that?

>
> I'm progressing towards a final solution. I got the feedback I need.

>
> >
> >>>> We do it in mlx5 and we didn't see any issues with that design.
> >>>>
> >>> If we separate things as I suggested, I'm totally fine.
> >> Separate what?
> >>
> >> Why should I create different interfaces for different management
> >> tasks?
> > I don't say you need to create different interfaces.
> > It's for future extensions:
> >
> > 1) When VIRTIO_F_ADMIN_VQ is negotiated, the interface is the admin
> > virtqueue
> > 2) When another feature is negotiated, the interface is that other one.
> >
> > In order to make 2) work, we need to introduce migration and the admin
> > virtqueue separately.
> >
> > Migration is not a management task, and it doesn't require any privilege.
>
> You need to control the operational state of a device, track its dirty
> pages, save/restore internal HW state.
>
> If you think that anyone can do that to a virtio device, let's see this
> magic work (I believe that only the parent/management device can do it
> on behalf of the migration software).

Well, I think both of us want to make progress, so let's do:

1) decouple the migration features out of the admin virtqueue; this has
been agreed by Michael
2) introduce the admin virtqueue as the interface for this

Then that's all fine.

>
> >
> >> I have a virtual/scalable device that I want to refer to from the
> >> physical/parent device using some interface.
> >>
> >> This interface is the admin queue. This interface will be used for
> >> dirty page tracking and operational state changes and get/set of
> >> internal state as well. And more (create/destroy an SF, for example).
> >>
> >> You can think of this in some other way, I'm fine with it. As long as
> >> the final conclusion is the same.
> >>
> >>>> I don't think you can say that we "go that way".
> >>> By "go that way" I meant the method of using vfio_virtio_pci; it has
> >>> nothing to do with the discussion of "using PF to control VF" in the
> >>> spec.
> >> This was an example. Please leave it as an example for David.
> >>
> >>
> >>>> You're trying to build a complementary solution for creating scalable
> >>>> functions and for some reason trying to sabotage NVIDIA's efforts to
> >>>> add new important functionality to virtio.
> >>> Well, it's a completely different topic. And it doesn't conflict with
> >>> anything that is proposed here by you.
> >>> I think I've stated this several times. I don't think we block each
> >>> other; it's just some unification work if one of the proposals is
> >>> merged first. I sent them recently because it will be used as material
> >>> for my talk at the KVM Forum, which is really near.
> >> In theory you're right. We shouldn't block each other, and I don't
> >> block you. But for some reason I see that you do try to block my
> >> proposal, and I don't understand why.
> > I don't want to block your proposal; let's decouple the migration
> > feature out of the admin virtqueue. Then it's fine.
> >
> > The problem I see is that you tend to refuse such a trivial but
> > beneficial change. That's what I don't understand.
>
> I thought I explained it. Nothing keeps you happy. If we do A, you ask
> for B. If we do B, you ask for A.

Firstly, I never do that; as mentioned, I can clarify things if you give
a pointer to the previous discussion that can prove this.

Secondly, for technical discussion, it's not rare:

1) we start from A, and get comments asking whether we can go to B
2) when we propose B, people think it's too complicated and ask us to go
back to A
3) a new version goes back to A

That's pretty natural, and it's not an endless circle.

>
> I continue with the feedback I get from MST.

Michael agreed to decouple the basic function out of the admin virtqueue.

>
> >
> >> I feel like I wasted 2 months on a discussion instead of progressing.
> > Well, I'm not sure 2 months is short, but it usually takes more than
> > a year for a huge project in Linux.
>
> But if you go in circles it will never end, right?

See above.

>
> >
> > Patience may help us to understand each other's points better.
>
> First I want to agree on the migration concepts I wrote above.
>
> If we don't agree on that, the discussion is useless.

>
> >> But now I do see progress. A PF to manage VF migration is the way to
> >> go forward.
> >>
> >> And the following RFC will take this into consideration.
> >>
> >>>> This also sabotages the evolution of virtio as a standard.
> >>>>
> >>>> You're trying to enforce some unfinished idea that should work on
> >>>> some future specific HW platform instead of helping to define a good
> >>>> spec for virtio.
> >>> Let's open another thread for this if you wish; it has nothing to do
> >>> with the spec, only with how it is implemented in Linux. If you search
> >>> the archive, something similar to "vfio_virtio_pci" was proposed
> >>> several years ago by Intel. The idea was rejected, and we have
> >>> leveraged the Linux vDPA bus for virtio-pci devices.
> >> I don't know this history, and I will be happy to hear about it one day.
> >>
> >> But for our discussion in Linux, virtio_vfio_pci will happen. And it
> >> will implement the migration logic of a virtio device with PCI
> >> transport for VFs using the PF admin queue.
> >>
> >> We at NVIDIA are currently upstreaming (alongside AlexW and Cornelia)
> >> a vfio-pci separation that will enable easy creation of vfio-pci
> >> vendor/protocol drivers to do some specific tasks.
> >>
> >> New drivers such as mlx5_vfio_pci, hns_vfio_pci, virtio_vfio_pci and
> >> nvme_vfio_pci should be implemented in the near future in Linux to
> >> enable migration of these devices.
> >>
> >> This is just an example. And it's not related to the spec nor the
> >> proposal at all.
> > Let's move those discussions to the right list. I'm pretty sure there
> > will be a long debate there. Please prepare for that.
>
> We already discussed this with AlexW, Cornelia, JasonG, ChristophH and
> others. Vendor specific drivers are not contested here.

And googling "nvme_vfio_pci" gives me nothing. Where is the discussion?

For virtio, I need to make sure the design is generic, with sufficient
ability to be extended in the future, instead of a feature that can only
work for some specific vendor or platform.

Your proposal works only for PCI with SR-IOV.
And I want to leverage it to be useful for other platforms or transports.

That's all my motivation.

Thanks

>
> And before we have a virtio spec for LM we can't discuss it on the
> Linux mailing list.
>
> It will waste everyone's time.
>
> >
> >>>> And all of this is about having users choose the vDPA framework
> >>>> instead of using plain virtio.
> >>>>
> >>>> We believe in our solution and we have a working prototype. We'll
> >>>> continue with our discussion to convince the community of it.
> >>> Again, it looks like there's a lot of misunderstanding. Let's open a
> >>> thread on the suitable list instead of talking about any specific
> >>> software solution or architecture here. This will speed things up.
> >> I prefer to finish the specification first. The SW arch is clear for
> >> us in Linux. We did it already for mlx5 devices and it will be the
> >> same for virtio if the spec changes are accepted.
> > I disagree, but let's separate the software discussion out of the spec
> > discussion here.
> >
> > Thanks
> >
> >> Thanks.
> >>
> >>
> >>> Thanks
> >>>
> >>>> Thanks.
> >>>>
> >>>>> Thanks
> >>>>>
> >>>>>
> >>>>>> The guest is running as usual. It isn't aware of the migration at
> >>>>>> all.
> >>>>>>
> >>>>>> This is the point I try to make here. I don't (and I can't) change
> >>>>>> even one line of code in the guest.
> >>>>>>
> >>>>>> e.g:
> >>>>>>
> >>>>>> QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor
> >>>>>> (bound to VF5) --> send admin command on PF adminq to start
> >>>>>> tracking dirty pages for VF5 --> PF device will do it
> >>>>>>
> >>>>>> QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor
> >>>>>> (bound to VF5) --> send admin command on PF adminq to quiesce VF5
> >>>>>> --> PF device will do it
> >>>>>>
> >>>>>> You can take a look at how we implement mlx5_vfio_pci in the link
> >>>>>> I provided.
> >>>>>>
> >>>>>>> Dave
> >>>>>>>
> >>>>>>>
> >>>>>>>> We already do this in mlx5 NIC migration.
> >>>>>>>> The kernel is secured and the QEMU interface is the VF.
> >>>>>>>>
> >>>>>>>>> Dave
> >>>>>>>>>
> >>>>>>>>>>>>>>>> An example of this approach can be seen in the way NVIDIA
> >>>>>>>>>>>>>>>> performs live migration of a ConnectX NIC function:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> NVIDIA's SNAP technology enables hardware-accelerated,
> >>>>>>>>>>>>>>>> software-defined PCIe devices. virtio-blk/virtio-net/
> >>>>>>>>>>>>>>>> virtio-fs SNAP is used for storage and networking
> >>>>>>>>>>>>>>>> solutions. The host OS/hypervisor uses its standard
> >>>>>>>>>>>>>>>> drivers that are implemented according to the well-known
> >>>>>>>>>>>>>>>> VIRTIO specifications.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> In order to implement Live Migration for these virtual
> >>>>>>>>>>>>>>>> function devices, which use standard drivers as
> >>>>>>>>>>>>>>>> mentioned, the specification should define how HW vendors
> >>>>>>>>>>>>>>>> should build their devices and how SW developers should
> >>>>>>>>>>>>>>>> adjust the drivers.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> This will enable a specification-compliant, vendor-
> >>>>>>>>>>>>>>>> agnostic solution.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> This is exactly how we built the migration driver for
> >>>>>>>>>>>>>>>> ConnectX (internal HW design doc) and I guess that this
> >>>>>>>>>>>>>>>> is the way other vendors work.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> For that, I would like to know if the approach of "PF
> >>>>>>>>>>>>>>>> that controls the VF live migration process" is
> >>>>>>>>>>>>>>>> acceptable to the VIRTIO technical group?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I'm not sure, but I think it's better to start from the general
> >>>>>>>>>>>>>>> facility for all transports, then develop features for a
> >>>>>>>>>>>>>>> specific
> >>>>>>>>>>>>>>> transport.
> >>>>>>>>>>>>>> Can a general facility for all transports be a generic admin
> >>>>>>>>>>>>>> queue?
> >>>>>>>>>>>>> It could be a virtqueue or a transport-specific method (pcie
> >>>>>>>>>>>>> capability).
> >>>>>>>>>>>> No. You said a general facility for all transports.
> >>>>>>>>>>> By a general facility, I mean chapter 2 of the spec, which is
> >>>>>>>>>>> general:
> >>>>>>>>>>>
> >>>>>>>>>>> "
> >>>>>>>>>>> 2 Basic Facilities of a Virtio Device
> >>>>>>>>>>> "
> >>>>>>>>>>>
> >>>>>>>>>> It will be in chapter 2. Right after "2.11 Exporting Object" I
> >>>>>>>>>> can add "2.12
> >>>>>>>>>> Admin Virtqueues", and this is what I did in the RFC.
> >>>>>>>>>>
> >>>>>>>>>>>> Transport specific is not general.
> >>>>>>>>>>> The transport is in charge of implementing the interface for
> >>>>>>>>>>> those facilities.
> >>>>>>>>>> Transport specific is not general.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>>> E.g. we can define what needs to be migrated for virtio-blk
> >>>>>>>>>>>>> first
> >>>>>>>>>>>>> (the device state). Then we can define the interface to get
> >>>>>>>>>>>>> and set
> >>>>>>>>>>>>> those states via the admin virtqueue. Such decoupling may ease the
> >>>>>>>>>>>>> future
> >>>>>>>>>>>>> development of the transport-specific migration interface.
> >>>>>>>>>>>> I asked a simple question here.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Let's stick to this.
> >>>>>>>>>>> I answered this question.
> >>>>>>>>>> No, you didn't answer.
> >>>>>>>>>>
> >>>>>>>>>> I asked if the approach of "PF that controls the VF live
> >>>>>>>>>> migration process"
> >>>>>>>>>> is acceptable to the VIRTIO technical group.
> >>>>>>>>>>
> >>>>>>>>>> And you take the discussion in your own direction instead of
> >>>>>>>>>> answering a yes/no
> >>>>>>>>>> question.
> >>>>>>>>>>
> >>>>>>>>>>> The virtqueue could be one of the
> >>>>>>>>>>> approaches. And it's your responsibility to convince the community
> >>>>>>>>>>> about that approach. Having an example may help people
> >>>>>>>>>>> understand
> >>>>>>>>>>> your proposal.
> >>>>>>>>>>>
> >>>>>>>>>>>> I'm not referring to internal state definitions.
> >>>>>>>>>>> Without an example, how do we know if it can work well?
> >>>>>>>>>>>
> >>>>>>>>>>>> Can you please not change the subject of my initial intent in
> >>>>>>>>>>>> the email?
> >>>>>>>>>>> Did I? Basically, I'm asking how a virtio-blk device can be migrated with
> >>>>>>>>>>> your proposal.
> >>>>>>>>>> The virtio-blk PF admin queue will be used to manage the
> >>>>>>>>>> virtio-blk VF
> >>>>>>>>>> migration.
> >>>>>>>>>>
> >>>>>>>>>> This is the whole discussion. I don't want to get into that level of resolution.
> >>>>>>>>>>
> >>>>>>>>>> You already know the answer, since I have published 4 RFCs
> >>>>>>>>>> with the whole
> >>>>>>>>>> flow.
> >>>>>>>>>>
> >>>>>>>>>> Let's stick to my question.
> >>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -Max.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This publicly archived list offers a means to provide input
> >>>>>>>>>>>>>> to the
> >>>>>>>>>>>>>> OASIS Virtual I/O Device (VIRTIO) TC.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> In order to verify user consent to the Feedback License terms
> >>>>>>>>>>>>>> and
> >>>>>>>>>>>>>> to minimize spam in the list archive, subscription is required
> >>>>>>>>>>>>>> before posting.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> >>>>>>>>>>>>>> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> >>>>>>>>>>>>>> List help: virtio-comment-help@lists.oasis-open.org
> >>>>>>>>>>>>>> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> >>>>>>>>>>>>>> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> >>>>>>>>>>>>>> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> >>>>>>>>>>>>>> Committee: https://www.oasis-open.org/committees/virtio/
> >>>>>>>>>>>>>> Join OASIS: https://www.oasis-open.org/join/
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Date: Tue, 24 Aug 2021 10:10:07 -0300
From: Jason Gunthorpe 
Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function
Message-ID: <20210824131007.GT1721383@nvidia.com>
References: <755ff192-33ac-9f6a-a7ad-b44b14afd5d2@nvidia.com>
 <39536c3c-e455-5602-9391-0b21add7e22f@nvidia.com>
 <0d06c26e-f1e7-3cac-a017-059e8985bb44@redhat.com>
 <74151019-6f78-2bff-5b0a-b5a4da814787@nvidia.com>
 <41fbd78a-f1d8-9056-3929-1e7b6b57a49b@nvidia.com>
 <0252a058-f3d2-db34-08a0-02c3cdd0e0bb@nvidia.com>
In-Reply-To: 
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
To: Jason Wang 
Cc: Max Gurtovoy , "Dr. David Alan Gilbert" ,
 "virtio-comment@lists.oasis-open.org" , "Michael S. Tsirkin" ,
 "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam ,
 Amnon Ilan , Bodong Wang , Stefan Hajnoczi , Eugenio Perez Martin ,
 Liran Liss , Oren Duer 
List-ID: 

On Tue, Aug 24, 2021 at 10:41:54AM +0800, Jason Wang wrote:

> > migration exposed to the guest ? No.
>
> Can you explain why?

For the SRIOV case, migration is a privileged operation of the
hypervisor. The guest must not be allowed to interact with it in any
way; otherwise the hypervisor migration could be attacked from the
guest, and this has definite security implications.

In practice this means that nothing related to migration can be
located on the MMIO pages/queues/etc of the VF.
The reasons for this
are a bit complicated and have to do with the limitations of IO
isolation with VFIO - e.g. you can't reliably split a single PCI BDF
into hypervisor/guest security domains without PASID.

We recently revisited this concept again with a HNS vfio driver. IIRC
Intel messed it up in their mdev driver too.

> > >>> Let's open another thread for this if you wish; it has nothing related
> > >>> to the spec, only to how it is implemented in Linux. If you search the
> > >>> archive, something similar to "vfio_virtio_pci" was proposed
> > >>> several years ago by Intel. The idea was rejected, and we have
> > >>> leveraged the Linux vDPA bus for virtio-pci devices.

That was largely because Intel was proposing to use mdevs to create an
entire VDPA subsystem hidden inside VFIO.

We've invested in a pure VFIO solution which should be merged soon:

https://lore.kernel.org/kvm/20210819161914.7ad2e80e.alex.williamson@redhat.com/

It does not rely on mdevs. It is not trying to recreate VDPA. Instead
the HW provides a fully functional virtio VF, and the solution uses
normal SRIOV approaches.

You can contrast this with the two virtio-net solutions mlx5 will
support:

- One is the existing hypervisor-assisted VDPA solution where the mlx5
  driver does HW-accelerated queue processing.

- The other is a full PCI VF that provides a virtio-net function
  without any hypervisor assistance. In this case we will have a VFIO
  migration driver, as above, to provide SRIOV VF live migration.

I see in this thread that these two things are becoming quite
confused. They are very different, have different security postures,
use different parts of the hypervisor stack, and are intended for
quite different use cases.

> Your proposal works only for PCI with SR-IOV. And I want to leverage
> it to be useful for other platforms or transports. That's all my
> motivation.

I've read most of the emails here and I still don't see what the use case
is for this beyond PCI SRIOV.
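For concreteness, the PF-driven flow Max described earlier in the thread (QEMU ioctl -> vfio -> admin command on the PF adminq) could be sketched roughly as below. Every opcode, structure, and field name here is invented for illustration; nothing like this exists in the virtio spec or in any driver today:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical admin-queue command layout for PF-driven VF migration.
 * Only illustrates the "PF controls the VF" model under discussion. */
enum vf_mig_op {
    VF_MIG_SUSPEND       = 1, /* quiesce/freeze the VF               */
    VF_MIG_RESUME        = 2,
    VF_MIG_SAVE_STATE    = 3, /* DMA internal VF state into a buffer */
    VF_MIG_RESTORE_STATE = 4,
    VF_MIG_DIRTY_TRACK   = 5, /* start/stop dirty page tracking      */
};

struct vf_mig_cmd {
    uint16_t opcode;    /* one of enum vf_mig_op          */
    uint16_t vf_number; /* which VF of this PF            */
    uint32_t flags;
    uint64_t data_addr; /* DMA address for state buffers  */
    uint64_t data_len;
};

/* Build a command the hypervisor's PF driver would place on the admin
 * queue after receiving the corresponding VFIO ioctl from QEMU. */
struct vf_mig_cmd vf_mig_make_cmd(uint16_t op, uint16_t vf,
                                  uint64_t addr, uint64_t len)
{
    struct vf_mig_cmd c;
    memset(&c, 0, sizeof(c));
    c.opcode = op;
    c.vf_number = vf;
    c.data_addr = addr;
    c.data_len = len;
    return c;
}
```

The point of the sketch is only that the command targets a VF by number while traveling over a queue the guest never sees.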
In a general sense it requires virtio to specify how PASID works. No
matter what, we must create a split secure/guest world where DMAs from
each world are uniquely tagged. In the pure PCI world this means
either using PF/VF or VF/PASID.

In general, PASID still has a long road to go before it is working in
Linux:

https://lore.kernel.org/kvm/BN9PR11MB5433B1E4AE5B0480369F97178C189@BN9PR11MB5433.namprd11.prod.outlook.com/

So, IMHO, it makes sense to focus on the PF/VF definition for spec
purposes.

I agree it would be good spec design to have a general concept of a
secure world and a guest world, and specific sections that define how they
work for the different scenarios, but that seems like a language remark and
not one about the design. For instance, the admin queue Max is adding is
clearly part of the secure world, and putting it on the PF is the only
option for the SRIOV mode.

Jason

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
MIME-Version: 1.0
References: <755ff192-33ac-9f6a-a7ad-b44b14afd5d2@nvidia.com>
 <39536c3c-e455-5602-9391-0b21add7e22f@nvidia.com>
 <0d06c26e-f1e7-3cac-a017-059e8985bb44@redhat.com>
 <74151019-6f78-2bff-5b0a-b5a4da814787@nvidia.com>
 <41fbd78a-f1d8-9056-3929-1e7b6b57a49b@nvidia.com>
 <0252a058-f3d2-db34-08a0-02c3cdd0e0bb@nvidia.com>
 <20210824131007.GT1721383@nvidia.com>
In-Reply-To: <20210824131007.GT1721383@nvidia.com>
From: Jason Wang 
Date: Wed, 25 Aug 2021 12:58:01 +0800
Message-ID: 
Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function
Content-Type: text/plain; charset="UTF-8"
To: Jason Gunthorpe 
Cc: Max Gurtovoy , "Dr. David Alan Gilbert" ,
 "virtio-comment@lists.oasis-open.org" , "Michael S. Tsirkin" ,
 "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam ,
 Amnon Ilan , Bodong Wang , Stefan Hajnoczi , Eugenio Perez Martin ,
 Liran Liss , Oren Duer 
List-ID: 

On Tue, Aug 24, 2021 at 9:10 PM Jason Gunthorpe wrote:
>
> On Tue, Aug 24, 2021 at 10:41:54AM +0800, Jason Wang wrote:
> > >
> > > migration exposed to the guest ? No.
> > >
> > > Can you explain why?
>
> For the SRIOV case, migration is a privileged operation of the
> hypervisor. The guest must not be allowed to interact with it in any
> way; otherwise the hypervisor migration could be attacked from the
> guest, and this has definite security implications.
>
> In practice this means that nothing related to migration can be
> located on the MMIO pages/queues/etc of the VF. The reasons for this
> are a bit complicated and have to do with the limitations of IO
> isolation with VFIO - e.g. you can't reliably split a single PCI BDF
> into hypervisor/guest security domains without PASID.

So exposing the migration function can be done indirectly:

In L0, the hardware implements the function via the PF. Qemu will present
an emulated PCI device, and then Qemu can expose those functions via a
capability for L1 guests. When the L1 driver tries to use those functions,
it goes:

L1 virtio-net driver -(emulated PCI-E BAR)-> Qemu -(ioctl)-> L0 kernel
VF driver -> L0 kernel PF driver -(virtio interface)-> virtio PF

In this approach, there's no way for the L1 driver to control or
see what is implemented in the hardware (PF). The details are hidden
by Qemu. This works even if DMA is required for the L0 kernel PF
driver to talk with the hardware, since for L1 we didn't present a DMA
interface. With future PASID support, we can even present a DMA
interface to L1.

>
> We recently revisited this concept again with a HNS vfio driver. IIRC
> Intel messed it up in their mdev driver too.
>
> > > >>> Let's open another thread for this if you wish; it has nothing related
> > > >>> to the spec, only to how it is implemented in Linux. If you search the
> > > >>> archive, something similar to "vfio_virtio_pci" was proposed
> > > >>> several years ago by Intel. The idea was rejected, and we have
> > > >>> leveraged the Linux vDPA bus for virtio-pci devices.
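The L0/L1 indirection chain above can be modeled as a call chain where each layer only sees the layer directly below it. This is a toy model; every function name is invented and corresponds to no real QEMU or kernel API:

```c
#include <assert.h>
#include <stdint.h>

/* Bottom layer: the PF hardware "implements" the migration function.
 * Here it just reports a fake device-state blob size for a given VF. */
int64_t virtio_pf_query_state_size(int vf)
{
    return vf >= 0 ? 4096 : -1;
}

/* L0 kernel PF driver: talks to the PF (e.g. over an admin queue). */
int64_t l0_pf_driver_query(int vf)
{
    return virtio_pf_query_state_size(vf);
}

/* L0 kernel VF driver: forwards the request for the VF it is bound to. */
int64_t l0_vf_driver_ioctl(int vf)
{
    return l0_pf_driver_query(vf);
}

/* QEMU: backs an emulated capability presented to the L1 guest. */
int64_t qemu_emulated_cap_read(int vf)
{
    return l0_vf_driver_ioctl(vf);
}

/* L1 guest driver: only ever touches the emulated capability; it cannot
 * observe or reach the PF that really implements the function. */
int64_t l1_guest_read_state_size(int vf)
{
    return qemu_emulated_cap_read(vf);
}
```

The security property being argued for is visible in the shape of the chain: the L1 entry point has no path to the PF except through QEMU and the L0 kernel.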
>
> That was largely because Intel was proposing to use mdevs to create an
> entire VDPA subsystem hidden inside VFIO.
>
> We've invested in a pure VFIO solution which should be merged soon:
>
> https://lore.kernel.org/kvm/20210819161914.7ad2e80e.alex.williamson@redhat.com/
>
> It does not rely on mdevs. It is not trying to recreate VDPA. Instead
> the HW provides a fully functional virtio VF, and the solution uses
> normal SRIOV approaches.
>
> You can contrast this with the two virtio-net solutions mlx5 will
> support:
>
> - One is the existing hypervisor-assisted VDPA solution where the mlx5
>   driver does HW-accelerated queue processing.
>
> - The other is a full PCI VF that provides a virtio-net function
>   without any hypervisor assistance. In this case we will have a VFIO
>   migration driver, as above, to provide SRIOV VF live migration.

This part I understand.

> I see in this thread that these two things are becoming quite
> confused. They are very different, have different security postures,
> use different parts of the hypervisor stack, and are intended for
> quite different use cases.

It looks like the full PCI VF could go via the virtio-pci vDPA driver
as well (drivers/vdpa/virtio-pci). So what are the advantages of
exposing the migration of virtio via vfio instead of vhost-vDPA?

With vhost, we get a lot of benefits:

1) migration compatibility with the existing software virtio and
vhost/vDPA implementations

2) presenting a virtio device instead of a virtio-pci device, which
makes it possible to use it in cases where the guest doesn't need PCI
at all (firecracker or a micro VM)

3) the management infrastructure is almost ready (what Parav did)

> > Your proposal works only for PCI with SR-IOV. And I want to leverage
> > it to be useful for other platforms or transports. That's all my
> > motivation.
>
> I've read most of the emails here and I still don't see what the use case
> is for this beyond PCI SRIOV.

So we have transports other than PCI.
The basic functions for migration are common:

- device freeze/stop
- device states
- dirty page tracking (not a must)

>
> In a general sense it requires virtio to specify how PASID works. No
> matter what, we must create a split secure/guest world where DMAs from
> each world are uniquely tagged. In the pure PCI world this means
> either using PF/VF or VF/PASID.
>
> In general, PASID still has a long road to go before it is working in
> Linux:
>
> https://lore.kernel.org/kvm/BN9PR11MB5433B1E4AE5B0480369F97178C189@BN9PR11MB5433.namprd11.prod.outlook.com/
>

Yes, I think we have agreed that it is something we want, and vDPA will
support it for sure.

> So, IMHO, it makes sense to focus on the PF/VF definition for spec
> purposes.

That's fine.

>
> I agree it would be good spec design to have a general concept of a
> secure and guest world and specific sections that define how it works
> for different scenarios, but that seems like a language remark and not
> one about the design. For instance, the admin queue Max is adding is
> clearly part of the secure world, and putting it on the PF is the only
> option for the SRIOV mode.

Yes, but let's move common functionality that is required for all
transports to the "basic device facility" chapter. We don't need to
define how it works in other scenarios now.
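The three common facilities listed above could be sketched, transport-agnostically, as a tiny state machine for freeze/stop plus a per-page dirty bitmap. All names here are illustrative only and are not taken from any spec draft:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical transport-agnostic migration states for a virtio device. */
enum mig_state {
    MIG_RUNNING, /* normal operation                        */
    MIG_FROZEN,  /* stops processing virtqueue requests     */
    MIG_STOPPED, /* additionally stops touching guest memory */
};

/* A device may only move one step down (freeze, then stop) or resume
 * back to running from either quiesced state. */
bool mig_transition_valid(enum mig_state from, enum mig_state to)
{
    switch (from) {
    case MIG_RUNNING: return to == MIG_FROZEN;
    case MIG_FROZEN:  return to == MIG_STOPPED || to == MIG_RUNNING;
    case MIG_STOPPED: return to == MIG_RUNNING;
    }
    return false;
}

/* Minimal dirty page tracking: one bit per 4 KiB page in a bitmap that
 * the device (or platform IOMMU) would set on writes to guest memory. */
#define MIG_PAGE_SHIFT 12

void dirty_mark(uint8_t *bitmap, uint64_t gpa)
{
    uint64_t pfn = gpa >> MIG_PAGE_SHIFT;
    bitmap[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
}

bool dirty_test(const uint8_t *bitmap, uint64_t gpa)
{
    uint64_t pfn = gpa >> MIG_PAGE_SHIFT;
    return bitmap[pfn / 8] & (1u << (pfn % 8));
}
```

Defining only this much in "basic device facilities" would leave each transport free to decide how the states are driven and where the bitmap lives.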
Thanks

>
> Jason
>

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Date: Wed, 25 Aug 2021 15:13:48 -0300
From: Jason Gunthorpe 
Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function
Message-ID: <20210825181348.GL1721383@nvidia.com>
References: <39536c3c-e455-5602-9391-0b21add7e22f@nvidia.com>
 <0d06c26e-f1e7-3cac-a017-059e8985bb44@redhat.com>
 <74151019-6f78-2bff-5b0a-b5a4da814787@nvidia.com>
 <41fbd78a-f1d8-9056-3929-1e7b6b57a49b@nvidia.com>
 <0252a058-f3d2-db34-08a0-02c3cdd0e0bb@nvidia.com>
 <20210824131007.GT1721383@nvidia.com>
In-Reply-To: 
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
To: Jason Wang 
Cc: Max Gurtovoy , "Dr. David Alan Gilbert" ,
 "virtio-comment@lists.oasis-open.org" , "Michael S. Tsirkin" ,
 "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam ,
 Amnon Ilan , Bodong Wang , Stefan Hajnoczi , Eugenio Perez Martin ,
 Liran Liss , Oren Duer 
List-ID: 

On Wed, Aug 25, 2021 at 12:58:01PM +0800, Jason Wang wrote:
> On Tue, Aug 24, 2021 at 9:10 PM Jason Gunthorpe wrote:
> >
> > On Tue, Aug 24, 2021 at 10:41:54AM +0800, Jason Wang wrote:
> > > > migration exposed to the guest ? No.
> > > >
> > > > Can you explain why?
> >
> > For the SRIOV case, migration is a privileged operation of the
> > hypervisor. The guest must not be allowed to interact with it in any
> > way; otherwise the hypervisor migration could be attacked from the
> > guest, and this has definite security implications.
> >
> > In practice this means that nothing related to migration can be
> > located on the MMIO pages/queues/etc of the VF. The reasons for this
> > are a bit complicated and have to do with the limitations of IO
> > isolation with VFIO - e.g. you can't reliably split a single PCI BDF
> > into hypervisor/guest security domains without PASID.
>
> So exposing the migration function can be done indirectly:
>
> In L0, the hardware implements the function via the PF. Qemu will present
> an emulated PCI device, and then Qemu can expose those functions via a
> capability for L1 guests. When the L1 driver tries to use those functions,
> it goes:
>
> L1 virtio-net driver -(emulated PCI-E BAR)-> Qemu -(ioctl)-> L0 kernel
> VF driver -> L0 kernel PF driver -(virtio interface)-> virtio PF
>
> In this approach, there's no way for the L1 driver to control or
> see what is implemented in the hardware (PF). The details are hidden
> by Qemu. This works even if DMA is required for the L0 kernel PF
> driver to talk with the hardware, since for L1 we didn't present a DMA
> interface. With future PASID support, we can even present a DMA
> interface to L1.

Sure, you can do this, but that isn't what is being talked about here,
and it honestly seems like a highly contrived use case.

Further, in this mode I'd expect the hypervisor kernel driver to
provide the migration support without requiring any special HW
function.

> > I see in this thread that these two things are becoming quite
> > confused. They are very different, have different security postures,
> > use different parts of the hypervisor stack, and are intended for
> > quite different use cases.
>
> It looks like the full PCI VF could go via the virtio-pci vDPA driver
> as well (drivers/vdpa/virtio-pci). So what are the advantages of
> exposing the migration of virtio via vfio instead of vhost-vDPA?

Can't say; both are possibly valid approaches with different
trade-offs.

Offhand, I think it is just unneeded complexity to use VDPA if the
device is already exposing a fully functional virtio-pci interface. I
see VDPA as being useful to create a HW-accelerated virtio interface
from HW that does not natively speak full virtio.
> 1) migration compatibility with the existing software virtio and
> vhost/vDPA implementations

IMHO the virtio spec should define the format of the migration
state, and I'd expect interworking between all the different
implementations.

> > I agree it would be good spec design to have a general concept of a
> > secure and guest world and specific sections that define how it works
> > for different scenarios, but that seems like a language remark and not
> > one about the design. For instance, the admin queue Max is adding is
> > clearly part of the secure world, and putting it on the PF is the only
> > option for the SRIOV mode.
>
> Yes, but let's move common functionality that is required for all
> transports to the "basic device facility" chapter. We don't need to
> define how it works in other scenarios now.

It seems like a reasonable way to write the spec. I'd define a secure
admin queue and define how the ops on that queue work.

Then separately define how to instantiate the secure admin queue in
all the relevant scenarios.

Jason

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
MIME-Version: 1.0
References: <39536c3c-e455-5602-9391-0b21add7e22f@nvidia.com>
 <0d06c26e-f1e7-3cac-a017-059e8985bb44@redhat.com>
 <74151019-6f78-2bff-5b0a-b5a4da814787@nvidia.com>
 <41fbd78a-f1d8-9056-3929-1e7b6b57a49b@nvidia.com>
 <0252a058-f3d2-db34-08a0-02c3cdd0e0bb@nvidia.com>
 <20210824131007.GT1721383@nvidia.com>
 <20210825181348.GL1721383@nvidia.com>
In-Reply-To: <20210825181348.GL1721383@nvidia.com>
From: Jason Wang 
Date: Thu, 26 Aug 2021 11:15:25 +0800
Message-ID: 
Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function
Content-Type: text/plain; charset="UTF-8"
To: Jason Gunthorpe 
Cc: Max Gurtovoy , "Dr. David Alan Gilbert" ,
 "virtio-comment@lists.oasis-open.org" , "Michael S. Tsirkin" ,
 "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam ,
 Amnon Ilan , Bodong Wang , Stefan Hajnoczi , Eugenio Perez Martin ,
 Liran Liss , Oren Duer 
List-ID: 

On Thu, Aug 26, 2021 at 2:13 AM Jason Gunthorpe wrote:
>
> On Wed, Aug 25, 2021 at 12:58:01PM +0800, Jason Wang wrote:
> > On Tue, Aug 24, 2021 at 9:10 PM Jason Gunthorpe wrote:
> > >
> > > On Tue, Aug 24, 2021 at 10:41:54AM +0800, Jason Wang wrote:
> > > > > migration exposed to the guest ? No.
> > > > >
> > > > > Can you explain why?
> > >
> > > For the SRIOV case, migration is a privileged operation of the
> > > hypervisor. The guest must not be allowed to interact with it in any
> > > way; otherwise the hypervisor migration could be attacked from the
> > > guest, and this has definite security implications.
> > >
> > > In practice this means that nothing related to migration can be
> > > located on the MMIO pages/queues/etc of the VF. The reasons for this
> > > are a bit complicated and have to do with the limitations of IO
> > > isolation with VFIO - e.g. you can't reliably split a single PCI BDF
> > > into hypervisor/guest security domains without PASID.
> >
> > So exposing the migration function can be done indirectly:
> >
> > In L0, the hardware implements the function via the PF. Qemu will present
> > an emulated PCI device, and then Qemu can expose those functions via a
> > capability for L1 guests. When the L1 driver tries to use those functions,
> > it goes:
> >
> > L1 virtio-net driver -(emulated PCI-E BAR)-> Qemu -(ioctl)-> L0 kernel
> > VF driver -> L0 kernel PF driver -(virtio interface)-> virtio PF
> >
> > In this approach, there's no way for the L1 driver to control or
> > see what is implemented in the hardware (PF). The details are hidden
> > by Qemu. This works even if DMA is required for the L0 kernel PF
> > driver to talk with the hardware, since for L1 we didn't present a DMA
> > interface. With future PASID support, we can even present a DMA
> > interface to L1.
> > Sure, you can do this, but that isn't what is being talked about here, > and honestly seems like a highly contrived use case. It's basically how virtio-net / vhost is implemented so far in Qemu. And if we want to do this sometime in the future, we need another interface (e.g. a BAR or capability) in the spec for the emulated device to allow the L1 to access those functions. That's another reason I think we need to describe the migration in the chapter "basic device facility". It eases the future extension of the spec. > > Further, in this mode I'd expect the hypervisor kernel driver to > provide the migration support without requiring any special HW > function. For 'special HW function' do you mean PASID? If yes, I agree. But I think we know that the PASID will be ready in the near future. > > > > I see in this thread that these two things are becoming quite > > > confused. They are very different, have different security postures > > > and use different parts of the hypervisor stack, and are intended for > > > quite different use cases. > > > > It looks like the full PCI VF could go via the virtio-pci vDPA driver > > as well (drivers/vdpa/virtio-pci). So what are the advantages of > > exposing the migration of virtio via vfio instead of vhost-vDPA? > > Can't say, both are possibly valid approaches with different > trade-offs. > > Off hand I think it is just unneeded complexity to use VDPA if the > device is already exposing a fully functional virtio-pci interface. I > see VDPA as being useful to create a HW accelerated virtio interface > from HW that does not natively speak full virtio. I think it depends on how we view vDPA. If we treat vDPA as a vendor specific control path and think the virtio spec is a "vendor" then virtio can go within vDPA. For the complexity, it's true that we need to build everything from scratch. But the virtio/vhost model has been implemented in Qemu for more than 10 years, and the kernel has already supported vhost-vDPA.
So it's not a lot of engineering effort. Hiding the hardware details via vhost may have broader use cases. > > > 1) migration compatibility with the existing software virtio and > > vhost/vDPA implementations > > IMHO the virtio spec should define the format of the migration > state and I'd expect interworking between all the different > implementations. Yes, so assuming the spec has defined the device state, the hypervisor still can choose to convert them into another byte stream. Qemu has already defined the migration stream format for the virtio-pci device, and it works seamlessly with vhost(vDPA). For the vfio way, this means it requires extra work in Qemu (a dedicated migration module or other) to make the migration work to convert the state to the existing virtio-pci, and it needs to care about the migration compatibility among different Qemu machine types and versions. And it needs to teach the management to know that a migration between "-device vfio-pci" and "-device virtio-net-pci" can work, which is not easy. > > > > I agree it would be good spec design to have a general concept of a > > > secure and guest world and specific sections that define how it works > > > for different scenarios, but that seems like a language remark and not > > > one about the design. For instance the admin queue Max is adding is > > > clearly part of the secure world and putting it on the PF is the only > > > option for the SRIOV mode. > > > > Yes, but let's move common functionality that is required for all > > transports to the chapter of "basic device facility". We don't need to > > define how it works in other different scenarios now. > > It seems like a reasonable way to write the spec. I'd define a secure > admin queue and define how the ops on that queue work. > Yes. > Then separately define how to instantiate the secure admin queue in > all the relevant scenarios. I don't object to this.
So just to clarify, what I meant is: 1) having one subsection in the "basic device facility" to describe migration related functions: the dirty page tracking, device states. 2) having another subsection in the "basic device facility" to describe the admin virtqueue and the ops for the migration functions mentioned above I think it doesn't conflict with what Max and you propose here. And it eases the future extensions and makes sure the core migration facility is stable. Thanks > > Jason > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Thu, 26 Aug 2021 09:27:50 -0300 From: Jason Gunthorpe Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function Message-ID: <20210826122750.GO1721383@nvidia.com> References: <74151019-6f78-2bff-5b0a-b5a4da814787@nvidia.com> <41fbd78a-f1d8-9056-3929-1e7b6b57a49b@nvidia.com> <0252a058-f3d2-db34-08a0-02c3cdd0e0bb@nvidia.com> <20210824131007.GT1721383@nvidia.com> <20210825181348.GL1721383@nvidia.com> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline To: Jason Wang Cc: Max Gurtovoy , "Dr. David Alan Gilbert" , "virtio-comment@lists.oasis-open.org" , "Michael S. Tsirkin" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID: On Thu, Aug 26, 2021 at 11:15:25AM +0800, Jason Wang wrote: > On Thu, Aug 26, 2021 at 2:13 AM Jason Gunthorpe wrote: > > > > On Wed, Aug 25, 2021 at 12:58:01PM +0800, Jason Wang wrote: > > > On Tue, Aug 24, 2021 at 9:10 PM Jason Gunthorpe wrote: > > > > > > > > On Tue, Aug 24, 2021 at 10:41:54AM +0800, Jason Wang wrote: > > > > > > > > > > migration exposed to the guest ? No. > > > > > > > > > > Can you explain why? > > > > > > > > For the SRIOV case migration is a privileged operation of the > > > > hypervisor. 
The guest must not be allowed to interact with it in any > > > > way otherwise the hypervisor migration could be attacked from the > > > > guest and this has definite security implications. > > > > > > > > In practice this means that nothing related to migration can be > > > > located on the MMIO pages/queues/etc of the VF. The reasons for this > > > > are a bit complicated and has to do with the limitations of IO > > > > isolation with VFIO - eg you can't reliably split a single PCI BDF > > > > into hypervisor/guest security domains without PASID. > > > > > > So exposing the migration function can be done indirectly: > > > > > > In L0, the hardware implements the function via PF, Qemu will present > > > an emulated PCI device then Qemu can expose those functions via a > > > capability for L1 guests. When L1 driver tries to use those functions, > > > it goes: > > > > > > L1 virtio-net driver -(emulated PCI-E BAR)-> Qemu -(ioctl)-> L0 kernel > > > VF driver -> L0 kernel PF driver -(virtio interface)-> virtio PF > > > > > > In this approach, there's no way for the L1 driver to control the or > > > see what is implemented in the hardware (PF). The details were hidden > > > by Qemu. This works even if DMA is required for the L0 kernel PF > > > driver to talk with the hardware since for L1 we didn't present a DMA > > > interface. With the future PASID support, we can even present a DMA > > > interface to L1. > > > > Sure, you can do this, but that isn't what is being talked about here, > > and honestly seems like a highly contrived use case. > > It's basically how virtio-net / vhost is implemented so far in Qemu. Well, a "L1 no DMA interface" is completely not interesting for this work. People that want a "no DMA" workflow can use the existing netdev mechanisms and don't need HW assisted migration. 
> And if we want to do this sometime in the future, we need another > interface (e.g. a BAR or capability) in the spec for the emulated device > to allow the L1 to access those functions. That's another reason I > think we need to describe the migration in the chapter "basic device > facility". It eases the future extension of the spec. The L1 has the same issue as the bare metal: the migration function is secure, and how the two security domains are exposed and interact with the vIOMMU must be defined. The L0/L1 scenario above doesn't change anything: you still cannot expose the migration function in the bar or capability block of the virtio function because it becomes bundled with the security domain of the function and rendered useless for its purpose. > > Further, in this mode I'd expect the hypervisor kernel driver to > > provide the migration support without requiring any special HW > > function. > > For 'special HW function' do you mean PASID? If yes, I agree. But I > think we know that the PASID will be ready in the near future. I mean the HW support to execute virtio suspend/resume/dirty page tracking. If you have no DMA and a SW layer in the middle, the hypervisor driver can just do this directly in SW. > I think it depends on how we view vDPA. If we treat vDPA as a vendor > specific control path and think the virtio spec is a "vendor" then > virtio can go within vDPA. It can, but why? The whole point of vDPA is to create a virtio interface; if I already have a perfectly functional virtio interface, why would I want to wrap more software around it just to get back to where I started? This can only create problems in the long run. Jason From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Mon, 23 Aug 2021 13:18:31 +0100 From: "Dr.
David Alan Gilbert" Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function Message-ID: References: <62bd1c8d-c56e-fc98-f833-61d9c999f814@redhat.com> <5eb7d5b4-a715-5ef2-81f7-9721d865d6ac@nvidia.com> <755ff192-33ac-9f6a-a7ad-b44b14afd5d2@nvidia.com> <39536c3c-e455-5602-9391-0b21add7e22f@nvidia.com> MIME-Version: 1.0 In-Reply-To: <39536c3c-e455-5602-9391-0b21add7e22f@nvidia.com> Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit To: Max Gurtovoy Cc: Jason Wang , "virtio-comment@lists.oasis-open.org" , "Michael S. Tsirkin" , "cohuck@redhat.com" , Parav Pandit , Shahaf Shuler , Ariel Adam , Amnon Ilan , Bodong Wang , Jason Gunthorpe , Stefan Hajnoczi , Eugenio Perez Martin , Liran Liss , Oren Duer List-ID: * Max Gurtovoy (mgurtovoy@nvidia.com) wrote: > > On 8/19/2021 5:24 PM, Dr. David Alan Gilbert wrote: > > * Max Gurtovoy (mgurtovoy@nvidia.com) wrote: > > > On 8/19/2021 2:12 PM, Dr. David Alan Gilbert wrote: > > > > * Max Gurtovoy (mgurtovoy@nvidia.com) wrote: > > > > > On 8/18/2021 1:46 PM, Jason Wang wrote: > > > > > > On Wed, Aug 18, 2021 at 5:16 PM Max Gurtovoy wrote: > > > > > > > On 8/17/2021 12:44 PM, Jason Wang wrote: > > > > > > > > On Tue, Aug 17, 2021 at 5:11 PM Max Gurtovoy wrote: > > > > > > > > > On 8/17/2021 11:51 AM, Jason Wang wrote: > > > > > > > > > > 在 2021/8/12 下午8:08, Max Gurtovoy 写道: > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > > > Live migration is one of the most important features of > > > > > > > > > > > virtualization and virtio devices are oftenly found in virtual > > > > > > > > > > > environments. > > > > > > > > > > > > > > > > > > > > > > The migration process is managed by a migration SW that is running on > > > > > > > > > > > the hypervisor and the VM is not aware of the process at all. > > > > > > > > > > > > > > > > > > > > > > Unlike the vDPA case, a real pci Virtual Function state resides in > > > > > > > > > > > the HW. 
> > > > > > > > > > > > > > > > > > > > > vDPA doesn't prevent you from having HW states. Actually from the view > > > > > > > > > > of the VMM(Qemu), it doesn't care whether or not a state is stored in > > > > > > > > > > the software or hardware. A well designed VMM should be able to hide > > > > > > > > > > the virtio device implementation from the migration layer, that is how > > > > > > > > > > Qemu is wrote who doesn't care about whether or not it's a software > > > > > > > > > > virtio/vDPA device or not. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > In our vision, in order to fulfil the Live migration requirements for > > > > > > > > > > > virtual functions, each physical function device must implement > > > > > > > > > > > migration operations. Using these operations, it will be able to > > > > > > > > > > > master the migration process for the virtual function devices. Each > > > > > > > > > > > capable physical function device has a supervisor permissions to > > > > > > > > > > > change the virtual function operational states, save/restore its > > > > > > > > > > > internal state and start/stop dirty pages tracking. > > > > > > > > > > > > > > > > > > > > > For "supervisor permissions", is this from the software point of view? > > > > > > > > > > Maybe it's better to give an example for this. > > > > > > > > > A permission to a PF device for quiesce and freeze a VF device for example. > > > > > > > > Note that for safety, VMM (e.g Qemu) is usually running without any privileges. > > > > > > > You're mixing layers here. > > > > > > > > > > > > > > QEMU is not involved here. It's only sending IOCTLs to migration driver. > > > > > > > The migration driver will control the migration process of the VF using > > > > > > > the PF communication channel. > > > > > > So who will be granted the "permission" you mentioned here? > > > > > This is just an expression. > > > > > > > > > > What is not clear ? 
> > > > > The PF device will have an option to quiesce/freeze the VF device. > > > > > This is simple. Why are you looking for some sophisticated problems? > > > > I'm trying to follow along here and have not completely; but I think the issue is a > > > > security separation one. > > > > The VMM (e.g. qemu) that has been given access to one of the VF's is > > > > isolated and shouldn't be able to go poking at other devices; so it > > > > can't go poking at the PF (it probably doesn't even have the PF device > > > > node accessible) - so then the question is who has access to the > > > > migration driver and how do you make sure it can only deal with VF's > > > > that it's supposed to be able to migrate. > > > The QEMU/userspace doesn't know or care about the PF connection and internal > > > virtio_vfio_pci driver implementation. > > OK > > > > > You shouldn't change 1 line of code in the VM driver nor in QEMU. > > Hmm OK. > > > > > QEMU does not have access to the PF. Only the kernel driver that has access > > > to the VF will have access to the PF communication channel. There is no > > > permission problem here. > > > > > > The kernel driver of the VF will do this internally, and make sure that the > > > commands it builds will only impact the VF originating them. > > > > > Now that confuses me; isn't the kernel driver that has access to the VF > > running inside the guest? If it's inside the guest we can't trust it to > > do anything about stopping impact to other devices. > > No. The driver is in the hypervisor (virtio_vfio_pci). This is the migration > driver, right? Ah OK, the '*host* kernel driver of the VF' - that makes more sense to me, especially with that just being VFIO. > The guest is running as usual. It isn't aware of the migration at all. > > This is the point I'm trying to make here. I don't (and I can't) change even 1 > line of code in the guest.
> > e.g: > > QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor (bounded > to VF5) --> send admin command on PF adminq to start tracking dirty pages > for VF5 --> PF device will do it > > QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor (bounded > to VF5) --> send admin command on PF adminq to quiesce VF5 --> PF device > will do it Yeh that makes more sense. Dave > You can take a look how we implement mlx5_vfio_pci in the link I provided. > > > > > Dave > > > > > > > We already do this in mlx5 NIC migration. The kernel is secured and QEMU > > > interface is the VF. > > > > > > > Dave > > > > > > > > > > > > > > > An example of this approach can be seen in the way NVIDIA performs > > > > > > > > > > > live migration of a ConnectX NIC function: > > > > > > > > > > > > > > > > > > > > > > https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > NVIDIAs SNAP technology enables hardware-accelerated software defined > > > > > > > > > > > PCIe devices. virtio-blk/virtio-net/virtio-fs SNAP used for storage > > > > > > > > > > > and networking solutions. The host OS/hypervisor uses its standard > > > > > > > > > > > drivers that are implemented according to a well-known VIRTIO > > > > > > > > > > > specifications. > > > > > > > > > > > > > > > > > > > > > > In order to implement Live Migration for these virtual function > > > > > > > > > > > devices, that use a standard drivers as mentioned, the specification > > > > > > > > > > > should define how HW vendor should build their devices and for SW > > > > > > > > > > > developers to adjust the drivers. > > > > > > > > > > > > > > > > > > > > > > This will enable specification compliant vendor agnostic solution. 
> > > > > > > > > > > > > > > > > > > > > > This is exactly how we built the migration driver for ConnectX > > > > > > > > > > > (internal HW design doc) and I guess that this is the way other > > > > > > > > > > > vendors work. > > > > > > > > > > > > > > > > > > > > > > For that, I would like to know if the approach of “PF that controls > > > > > > > > > > > the VF live migration process” is acceptable by the VIRTIO technical > > > > > > > > > > > group ? > > > > > > > > > > > > > > > > > > > > > I'm not sure but I think it's better to start from the general > > > > > > > > > > facility for all transports, then develop features for a specific > > > > > > > > > > transport. > > > > > > > > > a general facility for all transports can be a generic admin queue ? > > > > > > > > It could be a virtqueue or a transport specific method (pcie capability). > > > > > > > No. You said a general facility for all transports. > > > > > > For general facility, I mean the chapter 2 of the spec which is general > > > > > > > > > > > > " > > > > > > 2 Basic Facilities of a Virtio Device > > > > > > " > > > > > > > > > > > It will be in chapter 2. Right after "2.11 Exporting Object" I can add "2.12 > > > > > Admin Virtqueues" and this is what I did in the RFC. > > > > > > > > > > > > Transport specific is not general. > > > > > > The transport is in charge of implementing the interface for those facilities. > > > > > Transport specific is not general. > > > > > > > > > > > > > > > > > > E.g we can define what needs to be migrated for the virtio-blk first > > > > > > > > (the device state). Then we can define the interface to get and set > > > > > > > > those states via admin virtqueue. Such decoupling may ease the future > > > > > > > > development of the transport specific migration interface. > > > > > > > I asked a simple question here. > > > > > > > > > > > > > > Lets stick to this. > > > > > > I answered this question. > > > > > No you didn't answer. 
> > > > > > > > > > I asked  if the approach of “PF that controls the VF live migration process” > > > > > is acceptable by the VIRTIO technical group ? > > > > > > > > > > And you take the discussion to your direction instead of answering a Yes/No > > > > > question. > > > > > > > > > > > The virtqueue could be one of the > > > > > > approaches. And it's your responsibility to convince the community > > > > > > about that approach. Having an example may help people to understand > > > > > > your proposal. > > > > > > > > > > > > > I'm not referring to internal state definitions. > > > > > > Without an example, how do we know if it can work well? > > > > > > > > > > > > > Can you please not change the subject of my initial intent in the email ? > > > > > > Did I? Basically, I'm asking how a virtio-blk can be migrated with > > > > > > your proposal. > > > > > The virtio-blk PF admin queue will be used to manage the virtio-blk VF > > > > > migration. > > > > > > > > > > This is the whole discussion. I don't want to get into resolution. > > > > > > > > > > Since you already know the answer as I published 4 RFCs already with all the > > > > > flow. > > > > > > > > > > Lets stick to my question. > > > > > > > > > > > Thanks > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > > > > > > > > > > > -Max. > > > > > > > > > > > > > > > > > > > > This publicly archived list offers a means to provide input to the > > > > > > > > > OASIS Virtual I/O Device (VIRTIO) TC. > > > > > > > > > > > > > > > > > > In order to verify user consent to the Feedback License terms and > > > > > > > > > to minimize spam in the list archive, subscription is required > > > > > > > > > before posting. 
> > > > > > > > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org > > > > > > > > > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org > > > > > > > > > List help: virtio-comment-help@lists.oasis-open.org > > > > > > > > > List archive: https://lists.oasis-open.org/archives/virtio-comment/ > > > > > > > > > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf > > > > > > > > > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists > > > > > > > > > Committee: https://www.oasis-open.org/committees/virtio/ > > > > > > > > > Join OASIS: https://www.oasis-open.org/join/
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK