linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: Max Gurtovoy <mgurtovoy@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>, Leon Romanovsky <leon@kernel.org>,
	"Doug Ledford" <dledford@redhat.com>,
	Yishai Hadas <yishaih@nvidia.com>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	Kirti Wankhede <kwankhede@nvidia.com>, <kvm@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <linux-pci@vger.kernel.org>,
	<linux-rdma@vger.kernel.org>, <netdev@vger.kernel.org>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Cornelia Huck <cohuck@redhat.com>
Subject: Re: [PATCH mlx5-next 2/7] vfio: Add an API to check migration state transition validity
Date: Thu, 30 Sep 2021 06:41:39 -0600	[thread overview]
Message-ID: <20210930064139.57bb74c0.alex.williamson@redhat.com> (raw)
In-Reply-To: <bad28179-cbca-9337-8e6b-d730f06c6c58@nvidia.com>

On Thu, 30 Sep 2021 12:25:23 +0300
Max Gurtovoy <mgurtovoy@nvidia.com> wrote:

> On 9/30/2021 1:44 AM, Alex Williamson wrote:
> > On Thu, 30 Sep 2021 00:48:55 +0300
> > Max Gurtovoy <mgurtovoy@nvidia.com> wrote:
> >  
> >> On 9/29/2021 7:14 PM, Jason Gunthorpe wrote:  
> >>> On Wed, Sep 29, 2021 at 06:28:44PM +0300, Max Gurtovoy wrote:
> >>>     
> >>>>> So you have a device that's actively modifying its internal state,
> >>>>> performing I/O, including DMA (thereby dirtying VM memory), all while
> >>>>> in the _STOP state?  And you don't see this as a problem?  
> >>>> I don't see how is it different from vfio-pci situation.  
> >>> vfio-pci provides no way to observe the migration state. It isn't
> >>> "000b"  
> >> Alex said that there is a problem of compatibility.
> >>
> >> I migration SW is not involved, nobody will read this migration state.  
> > The _STOP state has a specific meaning regardless of whether userspace
> > reads the device state value.  I think what you're suggesting is that
> > the device reports itself as _STOP'd but it's actually _RUNNING.  Is
> > that the compatibility workaround, create a self inconsistency?  
> 
>  From migration point of view the device is stopped.

The _RESUMING and _SAVING bits control the migration activity, the
_RUNNING bit controls the ability of the device to modify its internal
state and affect external state.  The initial state of the device is
absolutely not stopped.

> > We cannot impose on userspace to move a device from _STOP to _RUNNING
> > simply because the device supports the migration region, nor should we
> > report a device state that is inconsistent with the actual device state.  
> 
> In this case we can think maybe moving to running during enabling the 
> bus master..

There are no spontaneous state transitions, device_state changes only
via user manipulation of the register.

> >>>> Maybe we need to rename STOP state. We can call it READY or LIVE or
> >>>> NON_MIGRATION_STATE.  
> >>> It was a poor choice to use 000b as stop, but it doesn't really
> >>> matter. The mlx5 driver should just pre-init this readable to running.  
> >> I guess we can do it for this reason. There is no functional problem nor
> >> compatibility issue here as was mentioned.
> >>
> >> But still we need the kernel to track transitions. We don't want to
> >> allow moving from RESUMING to SAVING state for example. How this
> >> transition can be allowed ?
> >>
> >> In this case we need to fail the request from the migration SW...  
> > _RESUMING to _SAVING seems like a good way to test round trip migration
> > without running the device to modify the state.  Potentially it's a
> > means to update a saved device migration data stream to a newer format
> > using an intermediate driver version.  
> 
> what do you mean by "without running the device to modify the state." ?

If a device is !_RUNNING it should not be advancing its internal state,
therefore state-in == state-out.
 
> did you describe a case where you migrate from source to dst and then 
> back to source with a new migration data format ?

I'm speculating that as the driver evolves, the migration data stream
generated from the device's migration region can change.  Hopefully in
compatible ways.  The above sequence of restoring and extracting state
without the complication of the device running could help to validate
compatibility.  Thanks,

Alex


  reply	other threads:[~2021-09-30 12:41 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.1632305919.git.leonro@nvidia.com>
2021-09-22 10:38 ` [PATCH mlx5-next 1/7] PCI/IOV: Provide internal VF index Leon Romanovsky
2021-09-22 21:59   ` Bjorn Helgaas
2021-09-23  6:35     ` Leon Romanovsky
2021-09-24 13:08       ` Bjorn Helgaas
2021-09-25 10:10         ` Leon Romanovsky
2021-09-25 17:41           ` Bjorn Helgaas
2021-09-26  6:36             ` Leon Romanovsky
2021-09-26 20:23               ` Bjorn Helgaas
2021-09-27 11:55                 ` Leon Romanovsky
2021-09-27 14:47                   ` Bjorn Helgaas
2021-09-22 10:38 ` [PATCH mlx5-next 2/7] vfio: Add an API to check migration state transition validity Leon Romanovsky
2021-09-23 10:33   ` Shameerali Kolothum Thodi
2021-09-23 11:17     ` Leon Romanovsky
2021-09-23 13:55       ` Max Gurtovoy
2021-09-24  7:44         ` Shameerali Kolothum Thodi
2021-09-24  9:37           ` Kirti Wankhede
2021-09-26  9:09           ` Max Gurtovoy
2021-09-26 16:17             ` Shameerali Kolothum Thodi
2021-09-27 18:24               ` Max Gurtovoy
2021-09-27 18:29                 ` Shameerali Kolothum Thodi
2021-09-27 22:46   ` Alex Williamson
2021-09-27 23:12     ` Jason Gunthorpe
2021-09-28 19:19       ` Alex Williamson
2021-09-28 19:35         ` Jason Gunthorpe
2021-09-28 20:18           ` Alex Williamson
2021-09-29 16:16             ` Jason Gunthorpe
2021-09-29 18:06               ` Alex Williamson
2021-09-29 18:26                 ` Jason Gunthorpe
2021-09-29 10:57         ` Max Gurtovoy
2021-09-29 10:44       ` Max Gurtovoy
2021-09-29 12:35         ` Alex Williamson
2021-09-29 13:26           ` Max Gurtovoy
2021-09-29 13:50             ` Alex Williamson
2021-09-29 14:36               ` Max Gurtovoy
2021-09-29 15:17                 ` Alex Williamson
2021-09-29 15:28                   ` Max Gurtovoy
2021-09-29 16:14                     ` Jason Gunthorpe
2021-09-29 21:48                       ` Max Gurtovoy
2021-09-29 22:44                         ` Alex Williamson
2021-09-30  9:25                           ` Max Gurtovoy
2021-09-30 12:41                             ` Alex Williamson [this message]
2021-09-29 23:21                         ` Jason Gunthorpe
2021-09-30  9:34                           ` Max Gurtovoy
2021-09-30 14:47                             ` Jason Gunthorpe
2021-09-30 15:32                               ` Max Gurtovoy
2021-09-30 16:24                                 ` Jason Gunthorpe
2021-09-30 16:51                                   ` Max Gurtovoy
2021-09-30 17:01                                     ` Jason Gunthorpe
2021-09-22 10:38 ` [PATCH mlx5-next 3/7] vfio/pci_core: Make the region->release() function optional Leon Romanovsky
2021-09-23 13:57   ` Max Gurtovoy
2021-09-22 10:38 ` [PATCH mlx5-next 4/7] net/mlx5: Introduce migration bits and structures Leon Romanovsky
2021-09-24  5:48   ` Mark Zhang
2021-09-22 10:38 ` [PATCH mlx5-next 5/7] net/mlx5: Expose APIs to get/put the mlx5 core device Leon Romanovsky
2021-09-22 10:38 ` [PATCH mlx5-next 6/7] mlx5_vfio_pci: Expose migration commands over mlx5 device Leon Romanovsky
2021-09-28 20:22   ` Alex Williamson
2021-09-29  5:36     ` Leon Romanovsky
2021-09-22 10:38 ` [PATCH mlx5-next 7/7] mlx5_vfio_pci: Implement vfio_pci driver for mlx5 devices Leon Romanovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210930064139.57bb74c0.alex.williamson@redhat.com \
    --to=alex.williamson@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=cohuck@redhat.com \
    --cc=davem@davemloft.net \
    --cc=dledford@redhat.com \
    --cc=jgg@ziepe.ca \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=mgurtovoy@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@nvidia.com \
    --cc=yishaih@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).