linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Max Gurtovoy <mgurtovoy@nvidia.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	Leon Romanovsky <leon@kernel.org>,
	Doug Ledford <dledford@redhat.com>,
	Yishai Hadas <yishaih@nvidia.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Kirti Wankhede <kwankhede@nvidia.com>, <kvm@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <linux-pci@vger.kernel.org>,
	<linux-rdma@vger.kernel.org>, <netdev@vger.kernel.org>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Cornelia Huck <cohuck@redhat.com>
Subject: Re: [PATCH mlx5-next 2/7] vfio: Add an API to check migration state transition validity
Date: Thu, 30 Sep 2021 18:32:07 +0300	[thread overview]
Message-ID: <d5b68bb7-d4d3-e9d8-1834-dba505bb8595@nvidia.com> (raw)
In-Reply-To: <20210930144752.GA67618@ziepe.ca>


On 9/30/2021 5:47 PM, Jason Gunthorpe wrote:
> On Thu, Sep 30, 2021 at 12:34:19PM +0300, Max Gurtovoy wrote:
>
>>> When we add the migration extension this cannot change, so after
>>> open_device() the device should be operational.
>> if it's waiting for incoming migration blob, it is not running.
> It cannot be waiting for a migration blob after open_device, that is
> not backwards compatible.
>
> Just prior to open device the vfio pci layer will generate a FLR to
> the function so we expect that post open_device has a fresh from reset
> fully running device state.

running also mean that the device doesn't have a clue on its internal 
state ? or running means unfreezed and unquiesced ?

>
>>> The reported state in the migration region should accurately reflect
>>> what the device is currently doing. If the device is operational then
>>> it must report running, not stopped.
>> STOP in migration meaning.
> As Alex and I have said several times STOP means the internal state is
> not allowed to change.
>
>>> driver will see RESUMING toggle off so it will trigger a
>>> de-serialization
>> You mean stop serialization ?
> No, I mean it will take all the migration data that has been uploaded
> through the migration region and de-serialize it into active device
> state.

you should feed the device way before that.

>
>>> driver will see SAVING toggled on so it will serialize the new state
>>> (either the pre-copy state or the post-copy state dpending on the
>>> running bit)
>> lets leave the bits and how you implement the state numbering aside.
> You've missed the point. This isn't a FSM. It is a series of three
> control bits that we have assigned logical meaning their combinatoins.
>
> The algorithm I gave is a control centric algorithm not a state
> centric algorithm and matches the direction Alex thought this was
> being designed for.
>   
>> If you finish resuming you can move to a new state (that we should add) =>
>> RESUMED.
> It is not a state machine. Once you stop prentending this is
> implementing a FSM Alex's position makes perfect sense.

You can look on it anyway you want. Three control bits or FSM. And I can 
look on it anyway I want.

The point is what bits/state you set during the resume phase:

1. you initialize at  _RUNNING bit == 001b. No problem.

2. state stream arrives, migration SW raise _RESUMING bit. should it be 
101b or 100b ? for now it's 100b. But according to your statement is 
should be 101b (invalid today) since device state can change. right ?

3. Then you should indicate that all the state was serialized to the 
device (actually to all the pci devices). 100b mean RESUMING and not 
RUNNING so maybe this can say RESUMED and state can't change now ?

4. all devices move to running 001b only after all devices moved to 100b.

Otherwise, devices will start changing each other internal states.

-Max.

>
> Jason

  reply	other threads:[~2021-09-30 15:32 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.1632305919.git.leonro@nvidia.com>
2021-09-22 10:38 ` [PATCH mlx5-next 1/7] PCI/IOV: Provide internal VF index Leon Romanovsky
2021-09-22 21:59   ` Bjorn Helgaas
2021-09-23  6:35     ` Leon Romanovsky
2021-09-24 13:08       ` Bjorn Helgaas
2021-09-25 10:10         ` Leon Romanovsky
2021-09-25 17:41           ` Bjorn Helgaas
2021-09-26  6:36             ` Leon Romanovsky
2021-09-26 20:23               ` Bjorn Helgaas
2021-09-27 11:55                 ` Leon Romanovsky
2021-09-27 14:47                   ` Bjorn Helgaas
2021-09-22 10:38 ` [PATCH mlx5-next 2/7] vfio: Add an API to check migration state transition validity Leon Romanovsky
2021-09-23 10:33   ` Shameerali Kolothum Thodi
2021-09-23 11:17     ` Leon Romanovsky
2021-09-23 13:55       ` Max Gurtovoy
2021-09-24  7:44         ` Shameerali Kolothum Thodi
2021-09-24  9:37           ` Kirti Wankhede
2021-09-26  9:09           ` Max Gurtovoy
2021-09-26 16:17             ` Shameerali Kolothum Thodi
2021-09-27 18:24               ` Max Gurtovoy
2021-09-27 18:29                 ` Shameerali Kolothum Thodi
2021-09-27 22:46   ` Alex Williamson
2021-09-27 23:12     ` Jason Gunthorpe
2021-09-28 19:19       ` Alex Williamson
2021-09-28 19:35         ` Jason Gunthorpe
2021-09-28 20:18           ` Alex Williamson
2021-09-29 16:16             ` Jason Gunthorpe
2021-09-29 18:06               ` Alex Williamson
2021-09-29 18:26                 ` Jason Gunthorpe
2021-09-29 10:57         ` Max Gurtovoy
2021-09-29 10:44       ` Max Gurtovoy
2021-09-29 12:35         ` Alex Williamson
2021-09-29 13:26           ` Max Gurtovoy
2021-09-29 13:50             ` Alex Williamson
2021-09-29 14:36               ` Max Gurtovoy
2021-09-29 15:17                 ` Alex Williamson
2021-09-29 15:28                   ` Max Gurtovoy
2021-09-29 16:14                     ` Jason Gunthorpe
2021-09-29 21:48                       ` Max Gurtovoy
2021-09-29 22:44                         ` Alex Williamson
2021-09-30  9:25                           ` Max Gurtovoy
2021-09-30 12:41                             ` Alex Williamson
2021-09-29 23:21                         ` Jason Gunthorpe
2021-09-30  9:34                           ` Max Gurtovoy
2021-09-30 14:47                             ` Jason Gunthorpe
2021-09-30 15:32                               ` Max Gurtovoy [this message]
2021-09-30 16:24                                 ` Jason Gunthorpe
2021-09-30 16:51                                   ` Max Gurtovoy
2021-09-30 17:01                                     ` Jason Gunthorpe
2021-09-22 10:38 ` [PATCH mlx5-next 3/7] vfio/pci_core: Make the region->release() function optional Leon Romanovsky
2021-09-23 13:57   ` Max Gurtovoy
2021-09-22 10:38 ` [PATCH mlx5-next 4/7] net/mlx5: Introduce migration bits and structures Leon Romanovsky
2021-09-24  5:48   ` Mark Zhang
2021-09-22 10:38 ` [PATCH mlx5-next 5/7] net/mlx5: Expose APIs to get/put the mlx5 core device Leon Romanovsky
2021-09-22 10:38 ` [PATCH mlx5-next 6/7] mlx5_vfio_pci: Expose migration commands over mlx5 device Leon Romanovsky
2021-09-28 20:22   ` Alex Williamson
2021-09-29  5:36     ` Leon Romanovsky
2021-09-22 10:38 ` [PATCH mlx5-next 7/7] mlx5_vfio_pci: Implement vfio_pci driver for mlx5 devices Leon Romanovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d5b68bb7-d4d3-e9d8-1834-dba505bb8595@nvidia.com \
    --to=mgurtovoy@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=cohuck@redhat.com \
    --cc=davem@davemloft.net \
    --cc=dledford@redhat.com \
    --cc=jgg@ziepe.ca \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@nvidia.com \
    --cc=yishaih@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).