From: "Tian, Kevin" <kevin.tian@intel.com>
To: Yishai Hadas <yishaih@nvidia.com>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"jgg@nvidia.com" <jgg@nvidia.com>,
	"saeedm@nvidia.com" <saeedm@nvidia.com>
Cc: "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"leonro@nvidia.com" <leonro@nvidia.com>,
	"kwankhede@nvidia.com" <kwankhede@nvidia.com>,
	"mgurtovoy@nvidia.com" <mgurtovoy@nvidia.com>,
	"maorg@nvidia.com" <maorg@nvidia.com>,
	"Raj, Ashok" <ashok.raj@intel.com>,
	"shameerali.kolothum.thodi@huawei.com" 
	<shameerali.kolothum.thodi@huawei.com>
Subject: RE: [PATCH V7 mlx5-next 09/15] vfio: Extend the device migration protocol with RUNNING_P2P
Date: Tue, 15 Feb 2022 10:18:11 +0000	[thread overview]
Message-ID: <BN9PR11MB5276D169554630B8345DB7598C349@BN9PR11MB5276.namprd11.prod.outlook.com> (raw)
In-Reply-To: <20220207172216.206415-10-yishaih@nvidia.com>

> From: Yishai Hadas <yishaih@nvidia.com>
> Sent: Tuesday, February 8, 2022 1:22 AM
> 
> From: Jason Gunthorpe <jgg@nvidia.com>
> 
> The RUNNING_P2P state is designed to support multiple devices in the same
> VM that are doing P2P transactions between themselves. When in
> RUNNING_P2P
> the device must be able to accept incoming P2P transactions but should not
> generate outgoing transactions.

Nit: s/outgoing transactions/outgoing 'P2P' transactions/

> 
> As an optional extension to the mandatory states it is defined as
> inbetween STOP and RUNNING:
>    STOP -> RUNNING_P2P -> RUNNING -> RUNNING_P2P -> STOP
> 
> For drivers that are unable to support RUNNING_P2P the core code silently
> merges RUNNING_P2P and RUNNING together. Drivers that support this will

It would be clearer if the following message were also reflected here:

  + * The optional states cannot be used with SET_STATE if the device does not
  + * support them. The user can discover if these states are supported by using
  + * VFIO_DEVICE_FEATURE_MIGRATION. 

Otherwise the original context reads as if RUNNING_P2P could be used as
an end state even when the underlying driver doesn't support it, which
makes me wonder what the point of the new capability bit is.
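
For reference, with the uAPI proposed in patches 08-09 I'd expect the
discovery to look roughly like below from userspace (just a sketch to
illustrate the point; error handling and cleanup are omitted):

    #include <stdbool.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /*
     * Query VFIO_DEVICE_FEATURE_MIGRATION and only ever request
     * RUNNING_P2P as a SET_STATE target when VFIO_MIGRATION_P2P is set.
     */
    static bool device_supports_p2p(int device_fd)
    {
            size_t sz = sizeof(struct vfio_device_feature) +
                        sizeof(struct vfio_device_feature_migration);
            struct vfio_device_feature *feat = calloc(1, sz);
            struct vfio_device_feature_migration *mig = (void *)feat->data;
            bool p2p = false;

            feat->argsz = sz;
            feat->flags = VFIO_DEVICE_FEATURE_GET |
                          VFIO_DEVICE_FEATURE_MIGRATION;
            if (!ioctl(device_fd, VFIO_DEVICE_FEATURE, feat))
                    p2p = mig->flags & VFIO_MIGRATION_P2P;

            free(feat);
            return p2p;
    }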

> be
> required to implement 4 FSM arcs beyond the basic FSM. 2 of the basic FSM
> arcs become combination transitions.
> 
> Compared to the v1 clarification, NDMA is redefined into FSM states and is
> described in terms of the desired P2P quiescent behavior, noting that
> halting all DMA is an acceptable implementation.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
>  drivers/vfio/vfio.c       | 79 ++++++++++++++++++++++++++++++---------
>  include/linux/vfio.h      |  1 +
>  include/uapi/linux/vfio.h | 34 ++++++++++++++++-
>  3 files changed, 95 insertions(+), 19 deletions(-)
> 
> @@ -1631,17 +1657,36 @@ int vfio_mig_get_next_state(struct vfio_device

[...]

>  	*next_fsm = vfio_from_fsm_table[cur_fsm][new_fsm];
> +	while ((state_flags_table[*next_fsm] & device->migration_flags) !=
> +			state_flags_table[*next_fsm])
> +		*next_fsm = vfio_from_fsm_table[*next_fsm][new_fsm];
> +

A comment highlighting the silent merging of unsupported states would
be informative here.
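e.g. something along these lines, just reusing the hunk above:

	*next_fsm = vfio_from_fsm_table[cur_fsm][new_fsm];
	/*
	 * Silently skip over any optional state the driver doesn't
	 * support: e.g. without VFIO_MIGRATION_P2P in
	 * device->migration_flags an arc landing on RUNNING_P2P is
	 * continued on toward new_fsm, so the driver only ever sees
	 * states it supports.
	 */
	while ((state_flags_table[*next_fsm] & device->migration_flags) !=
			state_flags_table[*next_fsm])
		*next_fsm = vfio_from_fsm_table[*next_fsm][new_fsm];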

and I'm puzzled by the following messages:

>   *
> + * And 1 optional state to support VFIO_MIGRATION_P2P:
> + *  RUNNING_P2P - RUNNING, except the device cannot do peer to peer
> DMA
>   *

and

> + * RUNNING_P2P -> RUNNING
>   *   While in RUNNING the device is fully operational, the device may
> generate
>   *   interrupts, DMA, respond to MMIO, all vfio device regions are functional,
>   *   and the device may advance its internal state.
>   *

and below

> + * The optional peer to peer (P2P) quiescent state is intended to be a
> quiescent
> + * state for the device for the purposes of managing multiple devices within
> a
> + * user context where peer-to-peer DMA between devices may be active.
> The
> + * RUNNING_P2P states must prevent the device from initiating
> + * any new P2P DMA transactions. If the device can identify P2P transactions
> + * then it can stop only P2P DMA, otherwise it must stop all DMA. The
> migration
> + * driver must complete any such outstanding operations prior to
> completing the
> + * FSM arc into a P2P state. For the purpose of specification the states
> + * behave as though the device was fully running if not supported.

Defining RUNNING_P2P in the above way implies that RUNNING_P2P inherits
all behaviors of RUNNING except blocking outbound P2P:
	* generate interrupts and DMAs
	* respond to MMIO
	* all vfio regions are functional
	* device may advance its internal state
	* drain and block outstanding P2P requests

I think this is not the behavior intended when NDMA was discussed in
previous threads, as the above definition suggests that the user could
continue to submit new requests once the outstanding P2P requests are
completed, given that all vfio regions remain functional while the
device is in RUNNING_P2P.

Though it may be just a naming thing, possibly what we really require is
a STOPPING_P2P state, which indicates the device is moving to the STOP
(or STOPPED) state. In this state the device is functional but the vfio
regions are not, so the user still needs to restrict device access:
	* generate interrupts and DMAs
	* respond to MMIO
	* all vfio regions are NOT functional (no user access)
	* device may advance its internal state
	* drain and block outstanding P2P requests

In virtualization this means QEMU must stop the vCPUs first, before
entering STOPPING_P2P for a device.
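
To make the ordering concrete, the VMM-side sequence I have in mind is
roughly the below (a sketch against the SET_STATE uAPI proposed in patch
08, with the same includes as the earlier sketch; pause_all_vcpus() is a
hypothetical VMM helper and error handling is omitted):

    static int set_mig_state(int device_fd, __u32 new_state)
    {
            size_t sz = sizeof(struct vfio_device_feature) +
                        sizeof(struct vfio_device_feature_mig_state);
            struct vfio_device_feature *feat = calloc(1, sz);
            struct vfio_device_feature_mig_state *mig = (void *)feat->data;
            int ret;

            feat->argsz = sz;
            feat->flags = VFIO_DEVICE_FEATURE_SET |
                          VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE;
            mig->device_state = new_state;
            /* the data_fd returned when entering STOP_COPY is ignored here */
            ret = ioctl(device_fd, VFIO_DEVICE_FEATURE, feat);
            free(feat);
            return ret;
    }

    static void quiesce_and_stop_devices(int *device_fds, int nr_devs)
    {
            int i;

            /* 1. stop the vCPUs first so no new MMIO reaches any device */
            pause_all_vcpus();

            /* 2. quiesce outbound P2P on *all* devices before stopping any */
            for (i = 0; i < nr_devs; i++)
                    set_mig_state(device_fds[i],
                                  VFIO_DEVICE_STATE_RUNNING_P2P);

            /* 3. only then fully stop each device and collect its state */
            for (i = 0; i < nr_devs; i++)
                    set_mig_state(device_fds[i],
                                  VFIO_DEVICE_STATE_STOP_COPY);
    }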

Back to your earlier suggestion on reusing RUNNING_P2P to cover vPRI 
usage via a new capability bit [1]:

    "A cap like "running_p2p returns an event fd, doesn't finish until the
    VCPU does stuff, and stops pri as well as p2p" might be all that is
    required here (and not an actual new state)"

vPRI requires RUNNING semantics. A new capability bit could change the
behaviors listed above for STOPPING_P2P to the following:
	* both P2P and vPRI requests should be drained and blocked;
	* all vfio regions are functional (with a RUNNING behavior) so
	  vCPUs can continue running to help drain vPRI requests;
	* an eventfd is returned for the user to poll-wait for the
	  completion of the state transition;

and in this regard it possibly makes more sense to call this state
STOPPING, encapsulating any optional preparation work before the device
can be transitioned to STOP (with the default behavior as defined for
STOPPING_P2P above and the actual behavior changeable by future
capability bits)?
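
The flow I'm imagining for that capability would be something like the
below (purely illustrative, nothing here exists in this series;
set_mig_state_async(), VFIO_DEVICE_STATE_STOPPING and pause_all_vcpus()
are made-up placeholders, and set_mig_state() is from the earlier
sketch):

    #include <poll.h>

    /* hypothetical: request STOPPING, get an eventfd signalling completion */
    int efd = set_mig_state_async(device_fd, VFIO_DEVICE_STATE_STOPPING);
    struct pollfd pfd = { .fd = efd, .events = POLLIN };

    /*
     * vCPUs keep running here (vfio regions stay functional) so the
     * guest can help drain outstanding vPRI/P2P requests.
     */
    poll(&pfd, 1, -1);      /* returns once the transition has completed */

    pause_all_vcpus();      /* hypothetical VMM helper */
    set_mig_state(device_fd, VFIO_DEVICE_STATE_STOP_COPY);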

One additional requirement on the driver side is to dynamically mediate
the fast path and queue any new request which may trigger vPRI or P2P
before moving out of RUNNING_P2P. When moving to STOP_COPY, the queued
requests would also be included in the device state to be replayed on
the resuming path.

Does the above sound like a reasonable understanding of this FSM
mechanism?

> + *
> + * The optional states cannot be used with SET_STATE if the device does not
> + * support them. The user can disocver if these states are supported by

'disocver' -> 'discover'

Thanks
Kevin

Thread overview: 50+ messages
2022-02-07 17:22 [PATCH V7 mlx5-next 00/15] Add mlx5 live migration driver and v2 migration protocol Yishai Hadas
2022-02-07 17:22 ` [PATCH V7 mlx5-next 01/15] PCI/IOV: Add pci_iov_vf_id() to get VF index Yishai Hadas
2022-02-07 17:22 ` [PATCH V7 mlx5-next 02/15] net/mlx5: Reuse exported virtfn index function call Yishai Hadas
2022-02-07 17:22 ` [PATCH V7 mlx5-next 03/15] net/mlx5: Disable SRIOV before PF removal Yishai Hadas
2022-02-07 17:22 ` [PATCH V7 mlx5-next 04/15] PCI/IOV: Add pci_iov_get_pf_drvdata() to allow VF reaching the drvdata of a PF Yishai Hadas
2022-02-07 17:22 ` [PATCH V7 mlx5-next 05/15] net/mlx5: Expose APIs to get/put the mlx5 core device Yishai Hadas
2022-02-07 17:22 ` [PATCH V7 mlx5-next 06/15] net/mlx5: Introduce migration bits and structures Yishai Hadas
2022-02-07 17:22 ` [PATCH V7 mlx5-next 07/15] vfio: Have the core code decode the VFIO_DEVICE_FEATURE ioctl Yishai Hadas
2022-02-07 17:22 ` [PATCH V7 mlx5-next 08/15] vfio: Define device migration protocol v2 Yishai Hadas
2022-02-09  0:07   ` Alex Williamson
2022-02-09  2:36     ` Jason Gunthorpe
2022-02-15 10:41       ` Tian, Kevin
2022-02-15 16:04         ` Jason Gunthorpe
2022-02-15 23:32           ` Alex Williamson
2022-02-16  1:17             ` Jason Gunthorpe
2022-02-16  3:17           ` Tian, Kevin
2022-02-16 12:14             ` Jason Gunthorpe
2022-02-17  2:29               ` Tian, Kevin
2022-02-15 10:58       ` Tian, Kevin
2022-02-15 13:13         ` Jason Gunthorpe
2022-02-15  8:04   ` Tian, Kevin
2022-02-15 15:33     ` Jason Gunthorpe
2022-02-16  3:04       ` Tian, Kevin
2022-02-07 17:22 ` [PATCH V7 mlx5-next 09/15] vfio: Extend the device migration protocol with RUNNING_P2P Yishai Hadas
2022-02-15 10:18   ` Tian, Kevin [this message]
2022-02-15 15:56     ` Jason Gunthorpe
2022-02-16  2:52       ` Tian, Kevin
2022-02-16 12:11         ` Jason Gunthorpe
2022-02-07 17:22 ` [PATCH V7 mlx5-next 10/15] vfio: Remove migration protocol v1 documentation Yishai Hadas
2022-02-11 11:03   ` Cornelia Huck
2022-02-07 17:22 ` [PATCH V7 mlx5-next 11/15] vfio/mlx5: Expose migration commands over mlx5 device Yishai Hadas
2022-02-07 17:22 ` [PATCH V7 mlx5-next 12/15] vfio/mlx5: Implement vfio_pci driver for mlx5 devices Yishai Hadas
2022-02-09  0:07   ` Alex Williamson
2022-02-07 17:22 ` [PATCH V7 mlx5-next 13/15] vfio/pci: Expose vfio_pci_core_aer_err_detected() Yishai Hadas
2022-02-07 17:22 ` [PATCH V7 mlx5-next 14/15] vfio/mlx5: Use its own PCI reset_done error handler Yishai Hadas
2022-02-09  0:08   ` Alex Williamson
2022-02-09  2:39     ` Jason Gunthorpe
2022-02-10 16:48       ` Alex Williamson
2022-02-10 17:27         ` Jason Gunthorpe
2022-02-07 17:22 ` [PATCH V7 mlx5-next 15/15] vfio: Extend the device migration protocol with PRE_COPY Yishai Hadas
2022-02-17 17:15   ` Alex Williamson
2022-02-18  0:03     ` Jason Gunthorpe
2022-02-18  8:01   ` Tian, Kevin
2022-02-18 14:06     ` Jason Gunthorpe
2022-02-22  1:43       ` Tian, Kevin
2022-02-22 15:50         ` Jason Gunthorpe
2022-02-23  0:40           ` Tian, Kevin
2022-02-23  0:44             ` Jason Gunthorpe
2022-02-23  1:46               ` Tian, Kevin
2022-02-18  8:11 ` [PATCH V7 mlx5-next 00/15] Add mlx5 live migration driver and v2 migration protocol Tarun Gupta (SW-GPU)
